A subsumer matrix M for terms \(j \in \{1, \dots, n\}\) has value \(M_{i,j}=1\) iff class i (which can be an anonymous class expression) subsumes term j, and zero otherwise. Therefore, it will have n columns, one for each term.
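
As an illustration of the definition, the following base R sketch builds a subsumer matrix from a hypothetical toy hierarchy. The term and class names are made up, and the ancestor sets are hard-coded rather than retrieved from an ontology. Counting each term as one of its own subsumers is an assumption of the sketch, not necessarily what subsumer_matrix() does.

```r
# Hypothetical toy data: for each term, the classes that subsume it.
# Assumption of this sketch: each term is listed as its own subsumer.
subsumers <- list(
  forelimb = c("forelimb", "limb", "appendage"),
  hindlimb = c("hindlimb", "limb", "appendage"),
  fin      = c("fin", "appendage")
)

# Rows: every class that subsumes at least one term; columns: one per term.
classes <- sort(unique(unlist(subsumers)))
M <- sapply(subsumers, function(s) as.integer(classes %in% s))
rownames(M) <- classes

# M[i, j] is 1 iff class i subsumes term j, and 0 otherwise
print(M)
```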

subsumer_matrix(terms, .colnames = c("ID", "IRI", "label"),
  .labels = NULL, preserveOrder = FALSE, verbose = FALSE)

Arguments

terms

character, the list of terms for which to compute the subsumer matrix. Terms can be given as IRIs or as labels, and the list can mix both. Terms given as labels are first resolved to IRIs, assuming they are from an anatomy ontology.

.colnames

character, how to name the columns of the resulting matrix.

  • "ID" (the default): use the term IDs (the last component of the term IRIs).

  • "IRI": use the term IRIs.

  • "label": use the terms' labels (see .labels parameter).

.labels

character, the labels for terms where known. Only used if .colnames = "label". If NULL (the default), labels will be looked up for terms given as IRIs, and elements of terms that are not in IRI form are assumed to be labels already. If provided, must have the same length and ordering as terms; any NA elements will be looked up from the corresponding term IRI.

preserveOrder

logical, whether to return columns in the same order as terms. The default is not to preserve the order.

verbose

logical, whether to print informative messages about certain potentially time-consuming operations.

Value

A data.frame representing the subsumer matrix.

The matrix will have additional attributes depending on the choice of how to name the columns. If .colnames = "ID" (the default), the matrix will have an attribute prefixes giving the URL prefixes removed from the term IRIs to yield the IDs, in the order of the columns. If .colnames = "label", it will have an attribute term.iris giving the term IRIs for the columns. Note that these extra attributes will be lost upon subsetting the returned matrix.
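
The relationship between IDs, the prefixes attribute, and the full term IRIs can be sketched in base R as below. The splitting rule (the ID is whatever follows the last "/" or "#") is an assumption made for illustration; the package's own rule may differ. The example IRIs are taken from the Examples section.

```r
term_iris <- c("http://purl.obolibrary.org/obo/UBERON_0000981",
               "http://purl.obolibrary.org/obo/UBERON_0002103")

# Assumed rule: the ID is everything after the last "/" or "#" ...
ids <- sub("^.*[/#]", "", term_iris)
# ... and the prefix is everything up to and including that separator
prefixes <- sub("[^/#]*$", "", term_iris)

ids       # "UBERON_0000981" "UBERON_0002103"
prefixes  # 2x "http://purl.obolibrary.org/obo/"

# Pasting each prefix back onto its ID recovers the full IRI
stopifnot(identical(paste0(prefixes, ids), term_iris))
```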

Details

In this implementation, for each row i, \(\sum_{j=1}^{n}M_{i,j} > 0\). That is, each row has at least one non-zero value: the (usually very many) classes that subsume none of the terms are not included. Consequently, the set of classes not subsuming a given term is highly incomplete, and this subsumer matrix is only useful for similarity metrics for which non-subsuming classes can be ignored.
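
The following base R sketch illustrates why such a truncated matrix still suffices for some metrics. Jaccard similarity is used here only as one example of a metric that ignores classes subsuming neither term, and the toy matrix is hypothetical. By contrast, a metric that counts joint absences, such as the simple matching coefficient, would change if the omitted rows were added back.

```r
# Hypothetical toy subsumer matrix: rows are classes, columns are terms.
M <- matrix(c(1, 1, 1, 0,    # subsumers of forelimb
              1, 1, 0, 1),   # subsumers of hindlimb
            ncol = 2,
            dimnames = list(c("appendage", "limb", "forelimb", "hindlimb"),
                            c("forelimb", "hindlimb")))

# Jaccard similarity of two terms from their subsumer columns:
# classes subsuming both terms / classes subsuming either term.
jaccard <- function(M, a, b) {
  x <- M[, a] > 0
  y <- M[, b] > 0
  sum(x & y) / sum(x | y)
}

jaccard(M, "forelimb", "hindlimb")   # 2 / 4 = 0.5

# Adding a class that subsumes neither term leaves the result unchanged,
# because all-zero rows contribute to neither the numerator nor the denominator.
M2 <- rbind(M, unrelated = c(0, 0))
jaccard(M2, "forelimb", "hindlimb")  # still 0.5
```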

Examples

# NOT RUN {
tl <- c("http://purl.obolibrary.org/obo/UBERON_0000981",
        "http://purl.obolibrary.org/obo/UBERON_0002103",
        "http://purl.obolibrary.org/obo/UBERON_0000976",
        "http://purl.obolibrary.org/obo/UBERON_0002102")
m <- subsumer_matrix(tl)
m # term IDs as column names
id_prefixes <- attr(m, "prefixes")
id_prefixes # 4x "http://purl.obolibrary.org/obo/"

m <- subsumer_matrix(tl, .colnames = "label")
m # term labels as column names
mat_terms <- attr(m, "term.iris")
mat_terms # term IRIs in the same order as columns
# }