JACCARD_DISTANCE

JACCARD_DISTANCE(vector1, vector2);

Description

Computes the Jaccard distance between two vectors. The Jaccard distance is defined as 1 minus the Jaccard similarity coefficient and measures how dissimilar two sets are. For binary vectors, Jaccard distance = 1 - |A∩B| / |A∪B|, where A and B are the sets of positions that hold non-zero elements in vector1 and vector2, respectively.
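
The formula above can be sketched in a few lines of Python. This is an illustrative reference only, not the engine's implementation: the name jaccard_distance, the rejection of mismatched lengths, and the result for two all-zero vectors are assumptions that this page does not specify.

```python
def jaccard_distance(v1, v2):
    """Reference sketch of 1 - |A∩B| / |A∪B| over non-zero positions."""
    if len(v1) != len(v2):
        # Assumption: vectors of different lengths are rejected.
        raise ValueError("vectors must have the same length")

    # A and B are the sets of positions holding non-zero elements.
    a = {i for i, x in enumerate(v1) if x != 0}
    b = {i for i, x in enumerate(v2) if x != 0}

    union = a | b
    if not union:
        # Assumption: two all-zero vectors are treated as identical.
        return 0.0

    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # One shared non-zero position out of three in the union: about 2/3.
    print(jaccard_distance([1, 0, 1], [1, 1, 0]))
    # Identical non-zero positions: distance 0.
    print(jaccard_distance([1, 1, 0], [1, 1, 0]))
```

The sketch works in double precision, so its result for the first pair is close to, but not digit-for-digit equal to, the 0.6666666269302368 shown in the examples on this page; that value is consistent with single-precision rounding of 2/3.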

Parameter Description

  • vector1: The first vector. Supported type: vector<tinyint>
  • vector2: The second vector. Supported type: vector<tinyint>

Return Result

Returns a double value in the range [0, 1]. 0 indicates that the two vectors have identical sets of non-zero positions; 1 indicates that they share no non-zero positions.

Examples

  • Compute the Jaccard distance between two vector<tinyint> values (one shared non-zero position out of three in the union, so the result is approximately 1 - 1/3)
SELECT JACCARD_DISTANCE(VECTOR(1y, 0y, 1y), VECTOR(1y, 1y, 0y)) as jaccard_dis;
+----------------------+
|     jaccard_dis      |
+----------------------+
| 0.6666666269302368   |
+----------------------+
  • Compute the Jaccard distance between longer tinyint vectors (again one shared non-zero position out of three in the union)
SELECT JACCARD_DISTANCE(VECTOR(1y, 0y, 1y, 0y), VECTOR(1y, 0y, 0y, 1y)) as jaccard_dis;
+----------------------+
|     jaccard_dis      |
+----------------------+
| 0.6666666269302368   |
+----------------------+
  • Compute the Jaccard distance between identical vectors (result is 0)
SELECT JACCARD_DISTANCE(VECTOR(1y, 1y, 0y), VECTOR(1y, 1y, 0y)) as jaccard_dis;
+-------------+
| jaccard_dis |
+-------------+
| 0           |
+-------------+