# Similarity

`algo(jaccard)`

**Basic** **Real-time**

Jaccard Similarity, also known as Intersection over Union, measures the similarity between two finite sample sets. It is defined as the size of the intersection divided by the size of the union of the two sets. The coefficient is a value between 0 and 1; the larger the coefficient, the higher the similarity.

Jaccard Similarity in Ultipa Graph calculates the similarity between two nodes in terms of their neighborhoods. In the formula below, *A* and *B* represent the neighbor sets of node *a* and node *b* respectively (deduplicated, and excluding the subject nodes *a* and *b*); the similarity is the number of their common neighbors divided by the number of all their neighbors:

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$
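The neighbor-based definition can be illustrated with a minimal Python sketch. It is not Ultipa's implementation: the adjacency dictionary `graph`, the helper names, and the choice to return 0 when neither node has any neighbors are assumptions made for illustration. It also reads "excluding the subject node *a* and *b*" as removing both nodes from both neighbor sets.

```python
def neighbor_set(adjacency, node):
    """Deduplicated neighbors of `node`, excluding the node itself."""
    return set(adjacency.get(node, ())) - {node}


def jaccard_similarity(adjacency, a, b):
    """Common neighbors of a and b divided by all their neighbors."""
    # Assumption: both subject nodes are removed from both neighbor sets.
    set_a = neighbor_set(adjacency, a) - {b}
    set_b = neighbor_set(adjacency, b) - {a}
    union = set_a | set_b
    if not union:
        # Assumption: treat the similarity as 0 when there are no neighbors.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical undirected graph; the repeated 4 is removed by deduplication.
graph = {
    1: [2, 3, 4, 4],
    2: [1, 3, 5],
    3: [1, 2],
    4: [1],
    5: [2],
}

# Nodes 1 and 2 share neighbor 3 out of the combined neighbors {3, 4, 5}.
print(jaccard_similarity(graph, 1, 2))
```

Using Python sets makes the deduplication implicit, so parallel edges between the same two nodes do not inflate the count.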

Configuration items for Jaccard Similarity operation:

Item | Data Type | Specification | Description |
---|---|---|---|
`<ids1>` | []int | Ultipa ID | ID of node a; input multiple nodes for batch computing |
`<ids2>` | []int | Ultipa ID | (Optional) ID of node b; input multiple nodes for batch computing. Nodes from `<ids1>` and `<ids2>` are paired and calculated; if not configured, nodes from `<ids1>` are paired with every other node in the graph |
`<limit>` | int | >0; -1 | If `<ids2>` is configured: the maximum number of results to return (-1 returns all results). If `<ids2>` is not configured: the maximum number of similar nodes to return for each node in group a (-1 returns all similar nodes for each node in group a) |
`<order>` | string | 'ASC' or 'DESC' | (Optional) Arrange the results in ascending or descending order; results are left unordered if not configured |

Calculation results:

Item | Data Type | Range |
---|---|---|
The Jaccard similarity of node pairs | float | [0, 1] |

Validity of `write_back()`: Not supported.

Example 1: Calculate the Jaccard Similarity between each pair of nodes from [1,2,3] and [4,5,6], and return the top 5 results:

```
algo(jaccard).params({ ids1: [1,2,3], ids2: [4,5,6], limit: 5, order: 'DESC' })
```

Example 2: For each node in [1,2,3], find the top 3 most similar nodes:

```
algo(jaccard).params({ ids1: [1,2,3], limit: 3 })
```
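The two calling modes in the examples above can be sketched in Python as follows. This is one reading of the configuration table, not Ultipa's actual behavior: the function names, the sample graph `adj`, the skipping of self-pairs, and the default descending ranking in the per-node mode (inferred from the phrase "most similar" in Example 2) are all assumptions.

```python
from itertools import product


def jaccard(adj, a, b):
    """Jaccard similarity of the neighbor sets of a and b (a and b excluded)."""
    set_a = set(adj[a]) - {a, b}
    set_b = set(adj[b]) - {a, b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def arrange(rows, limit, order):
    """Apply the optional ordering, then the limit (-1 keeps everything)."""
    if order in ("ASC", "DESC"):
        rows = sorted(rows, key=lambda row: row[2], reverse=(order == "DESC"))
    return rows if limit == -1 else rows[:limit]


def batch_jaccard(adj, ids1, ids2=None, limit=-1, order=None):
    if ids2 is not None:
        # Pairing mode: every node of ids1 against every node of ids2;
        # the limit caps the total number of returned pairs.
        rows = [(a, b, jaccard(adj, a, b))
                for a, b in product(ids1, ids2) if a != b]
        return arrange(rows, limit, order)
    # Per-node mode: each node of ids1 against every other node in the graph;
    # the limit applies to each node of ids1 separately.
    # Assumption: rank by descending similarity when no order is given.
    results = []
    for a in ids1:
        rows = [(a, b, jaccard(adj, a, b)) for b in adj if b != a]
        results.extend(arrange(rows, limit, order or "DESC"))
    return results


# Hypothetical undirected graph with six nodes.
adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4, 6], 4: [2, 3, 5], 5: [4], 6: [3]}

pairs = batch_jaccard(adj, [1, 2, 3], [4, 5, 6], limit=5, order="DESC")
top = batch_jaccard(adj, [1, 2, 3], limit=3)
print(pairs)
print(top)
```

Each row is an `(a, b, similarity)` tuple. In the pairing mode the limit truncates the combined result list, while in the per-node mode it is applied once per node of `ids1`.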

`algo(cosine_similarity)`

**Basic** **Real-time**

Cosine similarity is, by definition, a measure of similarity between two non-zero vectors of an inner product space. In the context of a graph data set, a non-zero vector is a node represented by its property values. Given two nodes *a* and *b* represented by properties *(a1,a2,a3...)* and *(b1,b2,b3...)*, their cosine similarity is:

$$ \cos(a, b) = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^2} \, \sqrt{\sum_{i} b_i^2}} $$

For non-negative property values, the computed cosine similarity ranges from 0 to 1: 1 means the two nodes are 100% similar, and 0 means no similarity at all.
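The definition can be illustrated with a short Python sketch. It is not Ultipa's implementation: the function name, the property dictionaries, and the sample values are hypothetical and chosen only to show the arithmetic.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Dot product of the two vectors divided by the product of their norms."""
    if len(vec_a) != len(vec_b):
        raise ValueError("both vectors must have the same number of properties")
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical property values for two nodes.
node_12 = {"salary": 3000.0, "age": 30.0}
node_21 = {"salary": 6000.0, "age": 60.0}
property_names = ["salary", "age"]

vec_12 = [node_12[name] for name in property_names]
vec_21 = [node_21[name] for name in property_names]

# The two vectors are proportional, so the similarity is 1 up to rounding.
print(cosine_similarity(vec_12, vec_21))
```

Because the measure depends only on the direction of the vectors, a property with a large scale (such as salary) dominates one with a small scale (such as age) unless the values are normalized beforehand.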

Configuration items for Cosine Similarity operation:

Item | Data Type | Specification | Description |
---|---|---|---|
`<node_id1>` | int | Ultipa ID | The first node |
`<node_id2>` | int | Ultipa ID | The second node |
`<node_property_names>` | []string | comma (,) separated, at least two node properties of numeric type | The node properties (must be LTE first) to be used for calculation |

Calculation results:

Item | Data Type | Range |
---|---|---|
The cosine similarity of the node pair | float | [0, 1] |

Validity of `write_back()`: Not supported.

Example: By using the 'salary' and 'age' properties of node 12 and node 21, calculate their cosine similarity:

```
algo(cosine_similarity).params({
node_id1: 12,
node_id2: 21,
node_property_names: ["salary", "age"]
})
```

To launch the Cosine Similarity algorithm with Ultipa Manager, go to the Algos module and enter the IDs of the two nodes plus the node properties to be used for the computation. The results are returned instantly, as shown in the screenshot below:

*Figure: Cosine Similarity (Ultipa Manager)*
