The Tanimoto similarity ST is computed according to the definition for bit vectors (see Jaccard index on Wikipedia). For weights \(W_i \in \{0, 1\}\) it is identical to the Jaccard similarity. The Tanimoto similarity can be computed for any term vectors, but for 1 - ST to be a proper distance metric that satisfies the triangle inequality, \(M_{i,j} \in \{0, W_i\}\) must hold.
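
As an illustration of the bit-vector definition, the following is a minimal sketch in base R, not the package's implementation. It assumes the usual Tanimoto formula, the dot product of two term vectors divided by the sum of their squared norms minus that dot product. The function name tanimoto_sketch and the toy matrix are hypothetical.

```r
# Toy subsumer matrix (hypothetical data): rows are classes, columns are terms.
M <- matrix(c(1, 1, 0, 1,    # termA
              1, 0, 1, 1,    # termB
              1, 1, 1, 0),   # termC
            nrow = 4,
            dimnames = list(paste0("class", 1:4),
                            c("termA", "termB", "termC")))

# Hypothetical helper: pairwise Tanimoto similarity between the columns of M.
tanimoto_sketch <- function(M) {
  M <- as.matrix(M)
  dot <- crossprod(M)   # dot[i, j] = sum over classes of M[, i] * M[, j]
  sq <- diag(dot)       # squared norm of each term vector
  # a.b / (|a|^2 + |b|^2 - a.b); an all-zero column would yield NaN here
  dot / (outer(sq, sq, "+") - dot)
}

st <- tanimoto_sketch(M)
st
```

For a binary matrix such as this one, each entry equals the size of the intersection of two subsumer sets divided by the size of their union, which is the Jaccard similarity.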

The Jaccard similarity is computed using the Tanimoto similarity definition for bit vectors (see Jaccard index on Wikipedia). For the result to be a valid Jaccard similarity, every weight must be either zero or one. If any weight has a different value, a warning is issued.
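
The kind of check described above could look roughly like the following sketch. It only illustrates the idea; the helper name check_binary and the warning text are assumptions, not the package's actual code or message.

```r
# Hypothetical helper: TRUE if every entry is 0 or 1, otherwise warn.
check_binary <- function(M) {
  M <- as.matrix(M)
  ok <- all(M %in% c(0, 1))
  if (!ok) {
    warning("Matrix has values other than 0 and 1; ",
            "the result is not a valid Jaccard similarity.")
  }
  invisible(ok)
}

# A binary toy matrix passes silently.
binary_ok <- check_binary(matrix(c(0, 1, 1, 0), nrow = 2))
```

Calling check_binary() on a matrix that contains, say, a weight of 2 would return FALSE and raise the warning.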

The Cosine similarity SC is computed using the Euclidean dot product formula (see Cosine similarity on Wikipedia). The metric is valid for any term vectors (columns of the subsumer matrix), i.e., \(M_{i,j} \in \{0, W_i\}\) is not required. Note that 1 - SC is not a proper distance metric, because it violates the triangle inequality. To obtain a proper distance metric, first convert the similarity to an angle (the arccosine of SC).
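
The conversion to an angle can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the package's implementation: the function name cosine_sketch and the toy matrix are hypothetical, and the values are clamped to [-1, 1] before acos() to guard against floating-point rounding.

```r
# Toy subsumer matrix (hypothetical data): rows are classes, columns are terms.
M <- matrix(c(1, 1, 0, 1,    # termA
              1, 0, 1, 1,    # termB
              1, 1, 1, 0),   # termC
            nrow = 4,
            dimnames = list(paste0("class", 1:4),
                            c("termA", "termB", "termC")))

# Hypothetical helper: pairwise cosine similarity between the columns of M.
cosine_sketch <- function(M) {
  M <- as.matrix(M)
  dot <- crossprod(M)          # pairwise dot products of term vectors
  norms <- sqrt(diag(dot))     # Euclidean norm of each term vector
  dot / outer(norms, norms)    # a.b / (|a| * |b|)
}

sc <- cosine_sketch(M)

# Angular distance in radians; clamp first so rounding cannot push
# a value slightly above 1 and make acos() return NaN.
ang <- acos(pmin(pmax(sc, -1), 1))
angular_dist <- as.dist(ang)
angular_dist
```

The resulting dist object can be passed to hclust() in place of as.dist(1 - sc) when a proper metric is needed.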

tanimoto_similarity(subsumer_mat = NA, terms = NULL, ...)

jaccard_similarity(subsumer_mat = NA, terms = NULL, ...)

cosine_similarity(subsumer_mat = NA, terms = NULL, ...)

Arguments

subsumer_mat

matrix or data.frame, the vector-encoded matrix M of subsumers, for which \(M_{i,j} = W_i\) with \(W_i > 0\) (W = weights) if class i subsumes term j, and \(M_{i,j} = 0\) otherwise. A binary encoding (\(M_{i,j} \in \{0, 1\}\), i.e., W[i] = 1 for all i) can be obtained from subsumer_matrix().

terms

character, optionally the list of terms (as IRIs and/or labels) for which to generate a properly encoded subsumer matrix on the fly.

...

parameters to be passed on to subsumer_matrix() if a subsumer matrix is to be generated on the fly.
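
To make the encoding expected for subsumer_mat concrete, the sketch below derives a weighted matrix from a binary one by scaling each row by its class weight. The binary matrix and the weights are hypothetical values chosen for illustration; how weights are actually obtained is outside the scope of this sketch.

```r
# Hypothetical binary subsumer matrix: rows are classes, columns are terms.
B <- matrix(c(1, 1, 0,    # t1 is subsumed by c1 and c2
              1, 0, 1),   # t2 is subsumed by c1 and c3
            nrow = 3,
            dimnames = list(c("c1", "c2", "c3"), c("t1", "t2")))

# Hypothetical positive weights, one per class (row).
w <- c(c1 = 0.5, c2 = 2, c3 = 1)

# R recycles w down the columns, so row i is multiplied by w[i].
M <- B * w
M

# Every entry in row i is now either 0 or w[i].
encoding_ok <- all(M == 0 | M == w[row(M)])
```

A matrix built this way satisfies the condition \(M_{i,j} \in \{0, W_i\}\) stated in the description.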

Value

A matrix in which the element at [i,j] is the similarity of terms i and j.

Examples

# NOT RUN {
sm <- jaccard_similarity(terms = c("pelvic fin", "pectoral fin",
                                   "forelimb", "hindlimb",
                                   "dorsal fin", "caudal fin"),
                         .colnames = "label")
sm

# e.g., turn into distance matrix, cluster, and plot
plot(hclust(as.dist(1-sm)))
# }