Semantic similarity between profiles — profile

Calculates the semantic similarity between profiles (groups) of terms.

bestPairs aggregates pairwise scores by "best pairs" between two profiles. That is, for profiles P1 and P2 with terms T1[i] (i = 1,...n) and T2[j] (j = 1,...,m), respectively, for S(P1, P2) bestPairs will determine for each i the score s(T1[i], T2[j]) that is "best" (see parameter best), followed by aggregating the resulting n best-pair scores to a single value (see parameter aggregate). The resulting scores are necessarily asymmetric, i.e., S(P1, P2) != S(P2, P1) for P1 != P2.

reduce.ignoringDiag is a convenience function for aggregating pairwise similarity scores such that the diagonal of the pair-wise similarity matrix is ignored. This may be desirable, for example, when average with-in group similarity should exclude the similarity of terms to themselves.

profile_similarity(pairwise, subsumer_mat, ..., f, reduce = NA)

bestPairs(X, best = max, aggregate = mean)

reduce.ignoringDiag(X, aggregate = mean)

Arguments

pairwise: the function for calculating (pairwise) semantic similarity between term vectors. This function must accept the subsumer matrix as its first argument. Additional named arguments (see ...) may be passed to it as well. See similarity for semantic similarity metrics available in this package.
subsumer_mat: the subsumer matrix as data.frame. See subsumer_matrix() for more detail and the usual way for obtaining it.
...: additional named arguments to be passed to the pairwise function.
f: a factor (or an object coercible to factor) defining the group (profile) membership of the terms (= columns) in the subsumer matrix. Columns and rows of the resulting profile similarity matrix will take their names from the levels of the factor. The factor may have 2 or more levels.
reduce: in pairwise mode, the function for reducing pairwise scores to a single between-group score. In each invocation, the function will be passed the submatrix of the pairwise similarity matrix that corresponds to the pairwise scores of terms from the two profiles to be compared. A function may aggregate the scores irrespective of the column and row structure of the submatrix (for example, by taking the mean), or aggregate first by columns or rows, followed by aggregating the results (for example, see bestPairs()). In the latter case, the result may not be symmetric (i.e., s(p1,p2) != s(p2,p1)).
X: array or matrix, the submatrix of pairwise scores to aggregate
best: the function for determing the best score. The default is max().
aggregate: the function for aggregating scores (for reduce.ignoringDiag) or best pairwise scores (for bestPairs). For reduce.ignoringDiag, the function must also accept parameter na.rm, and if the value is TRUE, remove NA values before calculating the aggregate. The default is mean().

Details

A profile refers to a group of terms, and profile similarity to the similarity between these groups. Profile similarity can be calculated in two principal ways, often referred to as group-wise and pair-wise. Both are supported here.

In group-wise mode, the terms and their subsumers in a group are first combined into a single vector representing their union (also sometimes called the subgraph corresponding to a group). Then the pairwise algorithm is applied to the unions.
In pairwise mode, the pairwise similarities between all terms are calculated, and then the pairwise scores between the terms in two groups are reduced to a single score giving the between-group score.

Note

In pairwise mode, if the reduce function is asymmetric (as will typically be the case for functions aggregating in two steps), the upper and lower triangle of the profile similarity matrix will not be symmetric. If a symmetric similarity matrix is desired, this can be achieved by computing (X + t(X)) / 2, if X is the profile similarity matrix.

Examples

tt <- sapply(c("pelvic fin", "pectoral fin",
               "forelimb", "hindlimb", "dorsal fin", "caudal fin"),
             get_term_iri, as = "anatomy")

# define groups (profiles) as factors:
pairedUnpaired <- c(rep("paired", times = 4), rep("unpaired", times = 2))
finsLimbs <- c("fins", "fins", "limbs", "limbs", "fins", "fins")
pairedFinLimb <- interaction(as.factor(pairedUnpaired), as.factor(finsLimbs))

# compute subsumer matrix
subs.mat <- subsumer_matrix(tt, .colnames = "label", .labels = names(tt),
                            preserveOrder = TRUE)
# group-wise profile similarity:
profile_similarity(jaccard_similarity, subs.mat, f = pairedUnpaired)
#>             paired  unpaired
#> paired   1.0000000 0.4444444
#> unpaired 0.4444444 1.0000000
profile_similarity(jaccard_similarity, subs.mat, f = finsLimbs)
#>            fins     limbs
#> fins  1.0000000 0.7301587
#> limbs 0.7301587 1.0000000
profile_similarity(jaccard_similarity, subs.mat, f = pairedFinLimb)
#>               paired.fins unpaired.fins paired.limbs
#> paired.fins     1.0000000     0.5090909    0.7796610
#> unpaired.fins   0.5090909     1.0000000    0.4576271
#> paired.limbs    0.7796610     0.4576271    1.0000000

# pairwise, using mean (average pairwise score); result is symmetric
profile_similarity(jaccard_similarity, subs.mat, f = pairedFinLimb,
                   reduce = mean)
#>               paired.fins unpaired.fins paired.limbs
#> paired.fins     0.9215686     0.5453061    0.7770979
#> unpaired.fins   0.5453061     0.9218750    0.5047134
#> paired.limbs    0.7770979     0.5047134    0.8888889
# the same, but excluding self-similarity of terms within groups
profile_similarity(jaccard_similarity, subs.mat, f = pairedFinLimb,
                   reduce = reduce.ignoringDiag)
#>               paired.fins unpaired.fins paired.limbs
#> paired.fins     0.8431373     0.5453061    0.7770979
#> unpaired.fins   0.5453061     0.8437500    0.5047134
#> paired.limbs    0.7770979     0.5047134    0.7777778
# pairwise, using max; result is symmetric
profile_similarity(jaccard_similarity, subs.mat, f = pairedFinLimb,
                   reduce = max)
#>               paired.fins unpaired.fins paired.limbs
#> paired.fins     1.0000000     0.5600000    0.8269231
#> unpaired.fins   0.5600000     1.0000000    0.5192308
#> paired.limbs    0.8269231     0.5192308    1.0000000
# pairwise, using average of best pairs; result is _not_ symmetric
sm <- profile_similarity(jaccard_similarity, subs.mat, f = pairedFinLimb,
                         reduce = bestPairs)
sm
#>               paired.fins unpaired.fins paired.limbs
#> paired.fins     1.0000000     0.5600000    0.8269231
#> unpaired.fins   0.5453061     1.0000000    0.5047134
#> paired.limbs    0.8269231     0.5192308    1.0000000
# make symmtric, for example by averaging:
(sm + t(sm)) / 2
#>               paired.fins unpaired.fins paired.limbs
#> paired.fins     1.0000000     0.5526531    0.8269231
#> unpaired.fins   0.5526531     1.0000000    0.5119721
#> paired.limbs    0.8269231     0.5119721    1.0000000