labelTransfer (source: R/clustering.R, labelTransfer.Rd)
When two sets of data share an embedding space, transfer the labels from one set to the other by k-nearest-neighbor (KNN) similarity voting in that space.
# S4 method for class 'giotto,giotto'
labelTransfer(
x,
y,
labels,
k = 10,
name = paste0("trnsfr_", labels),
integration_method = c("none", "harmony"),
prob = TRUE,
reduction = "cells",
reduction_method = "pca",
reduction_name = "pca",
dimensions_to_use = 1:10,
spat_unit = NULL,
feat_type = NULL,
return_gobject = TRUE,
...
)
# S4 method for class 'giotto,missing'
labelTransfer(
x,
spat_unit = NULL,
feat_type = NULL,
source_cell_ids,
target_cell_ids,
labels,
k = 10,
name = paste0("trnsfr_", labels),
prob = TRUE,
reduction = "cells",
reduction_method = "pca",
reduction_name = "pca",
dimensions_to_use = 1:10,
return_gobject = TRUE,
...
)
x - target object
y - source object
labels - metadata column in the source with the labels to transfer
k - number of nearest neighbors (k) used to train the KNN classifier
name - metadata column in the target to write the full set of transferred labels to
integration_method - character. Integration method to use when transferring labels. Options are "none" (default) and "harmony". See the integration section below for more info and params.
prob - whether to output KNN probabilities together with the label predictions
reduction - reduction on cells or features (default = "cells")
reduction_method - shared reduction method (default = "pca" space)
reduction_name - name of the shared reduction space (default = "pca")
dimensions_to_use - dimensions to use in the shared reduction space (default = 1:10)
spat_unit - spatial unit. A character vector of length 2 can also be passed for x (1) and y (2). Setting defaults with activeSpatUnit() may be easier.
feat_type - feature type. A character vector of length 2 can also be passed for x (1) and y (2). Setting defaults with activeFeatType() may be easier.
... - arguments passed on to FNN::knn():
    algorithm - nearest neighbor search algorithm.
source_cell_ids - cell/spatial IDs with the source labels to transfer
target_cell_ids - cell/spatial IDs to transfer the labels to. IDs from source_cell_ids are always included as well.
Object x with the new transferred labels added to its metadata. If running on x and y objects with integration_method = "harmony", plot_join_labels = TRUE, and return_plot = TRUE, the output will instead be a named list containing gobject (the updated x), label_source_plot, and label_target_plot, the last two being ggplot2 objects.
This function trains a KNN classifier with FNN::knn(). The training data comes from object y, or from the source_cell_ids subset in x, and uses existing annotations within the cell metadata. Cells in x without annotations/labels, or the target_cell_ids subset in x, will receive predicted labels (and, when prob = TRUE, the associated probabilities).
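The voting step can be illustrated with a dependency-free base R sketch. It assumes Euclidean distance and plain majority voting, with ties going to the first label in sorted order; labelTransfer() itself delegates to FNN::knn(), whose search algorithm and tie-breaking may differ. knn_vote() is a hypothetical helper, not part of Giotto, and its prob column mimics the idea behind prob = TRUE (the share of the k neighbors that voted for the winning label).

```r
# Hypothetical helper: majority-vote KNN label transfer in a shared embedding.
# train: source embedding (cells x dims); train_labels: one label per source cell
# test:  target embedding with the same columns (same reduction, same dims)
knn_vote <- function(train, train_labels, test, k = 10) {
  stopifnot(ncol(train) == ncol(test), nrow(train) == length(train_labels))
  k <- min(k, nrow(train))
  labels <- character(nrow(test))
  prob <- numeric(nrow(test))
  for (i in seq_len(nrow(test))) {
    # squared Euclidean distance from target cell i to every source cell
    d <- colSums((t(train) - test[i, ])^2)
    nn <- order(d)[seq_len(k)]
    votes <- table(train_labels[nn])
    winner <- which.max(votes)  # first maximum wins on ties
    labels[i] <- names(votes)[winner]
    # fraction of the k neighbors that voted for the winning label
    prob[i] <- as.numeric(votes[winner]) / k
  }
  data.frame(label = labels, prob = prob, row.names = rownames(test))
}

# usage: two well-separated source clusters in a 2-D "embedding"
set.seed(1)
src <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 5), ncol = 2))
src_labels <- rep(c("A", "B"), each = 10)
tgt <- rbind(c(0.2, -0.1), c(5.1, 4.8))
res <- knn_vote(src, src_labels, tgt, k = 5)
print(res)
```

Larger k smooths predictions at cluster boundaries at the cost of blurring small populations, which is one reason k is exposed as a parameter.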
IMPORTANT: This projection assumes that you are using the same dimension reduction space (e.g. PCA) and number of dimensions (e.g. the first 10 PCs) to train the KNN classifier as you used to create the initial annotations/labels in the source Giotto object.
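One way to see what "the same space" means is a small base R sketch with prcomp(): PCA is fitted on the source only, the target is projected onto the source's principal axes with predict(), and the same first 10 PCs are kept for both. The data are simulated, and the log1p() transform and centering choices are assumptions of this sketch; Giotto computes and stores its own PCA.

```r
set.seed(42)
# simulated expression: 50 features, 200 source cells and 100 target cells
n_feat <- 50
source_expr <- matrix(rpois(200 * n_feat, lambda = 3), nrow = 200)
target_expr <- matrix(rpois(100 * n_feat, lambda = 3), nrow = 100)

# fit PCA on the source only (prcomp expects cells as rows)
pca <- prcomp(log1p(source_expr), center = TRUE, scale. = FALSE)

dims_to_use <- 1:10
source_emb <- pca$x[, dims_to_use]
# project target cells into the *same* space using the source rotation
target_emb <- predict(pca, newdata = log1p(target_expr))[, dims_to_use]

# both embeddings now share identical, comparable columns (PC1..PC10)
stopifnot(identical(colnames(source_emb), colnames(target_emb)))
```

Fitting a separate PCA on the target would produce axes that are not aligned with the source, so neighbor distances between the two sets would be meaningless.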
This function lets you work with very large datasets: you can predict cell labels on a smaller, subsetted Giotto object and then project those labels onto the remaining cells in the target Giotto object. It can also transfer labels from one annotated dataset to another based on expression similarity after joining and integrating the two.
When running labelTransfer() on two giotto objects, an integration pipeline can also be run to align the two datasets before the transfer. integration_method = "harmony" will make a temporary joined object on shared features, filter to remove 0 values, run PCA, then run Harmony integration, before performing the label transfer from y to x in the integrated Harmony embedding space. Additional params that can be used with this method are:
source_cell_ids - character. Subset of y cells to use.
target_cell_ids - character. Subset of x cells to use.
expression_values - character. Expression values in x and y to use to generate the combined space. Default = "raw".
use_hvf - logical. Whether to calculate highly variable features to use for PCA calculation. Default = TRUE, but setting FALSE is recommended if either x or y has roughly 1000 features or fewer.
plot_join_labels - logical. Whether to plot the source labels and final labels in the joined object UMAP.
normalize_params - named list. Additional params to pass to normalizeGiotto() if desired.
pca_params - named list. Additional params to pass to runPCA() if desired.
integration_params - named list. Additional params to pass to runGiottoHarmony() if desired.
plot_params - named list. Additional params to pass to plotUMAP() if desired. Only relevant when plot_join_labels = TRUE.
verbose - verbosity
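The first steps of that pipeline (joining on shared features, then dropping zero values before PCA) can be sketched on plain matrices in base R. This is an illustration under stated assumptions: features are rows and cells are columns, and "remove 0 values" is read here as dropping features and cells whose total count is zero. Giotto's own joining and filtering, and the subsequent runPCA() and runGiottoHarmony() calls, may use different criteria.

```r
# hypothetical raw count matrices: features x cells
x_expr <- matrix(1:24, nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("x_cell", 1:4)))
y_expr <- matrix(1:15, nrow = 5,
                 dimnames = list(paste0("gene", 3:7), paste0("y_cell", 1:3)))
x_expr["gene4", ] <- 0  # a shared feature with no counts in either object
y_expr["gene4", ] <- 0

# 1. keep only features measured in both objects
shared <- intersect(rownames(x_expr), rownames(y_expr))
joined <- cbind(x_expr[shared, , drop = FALSE], y_expr[shared, , drop = FALSE])

# 2. drop features and cells with no counts at all
joined <- joined[rowSums(joined) > 0, , drop = FALSE]
joined <- joined[, colSums(joined) > 0, drop = FALSE]

# remember which object each cell came from; an integration method such as
# Harmony uses this kind of batch label to align the two datasets
batch <- ifelse(startsWith(colnames(joined), "x_"), "x", "y")
print(joined)
print(table(batch))
```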
g <- GiottoData::loadGiottoMini("visium")
#> 1. read Giotto object
#> 2. read Giotto feature information
#> 3. read Giotto spatial information
#> 3.1 read Giotto spatial shape information
#> 3.2 read Giotto spatial centroid information
#> 3.3 read Giotto spatial overlap information
#> 4. read Giotto image information
#> python already initialized in this session
#> active environment : '/usr/bin/python3'
#> python version : 3.10
#> checking default envname 'giotto_env'
#> a system default python environment was found
#> Using python path:
#> "/usr/bin/python3"
id_subset <- sample(spatIDs(g), 300)
n_pred <- nrow(pDataDT(g)) - 300
# transfer labels from one object to another ###################
g_small <- g[, id_subset]
# additional steps to get labels to transfer on smaller object...
g <- labelTransfer(g, g_small, labels = "leiden_clus")
# transfer labels between subsets of a single object ###########
g <- labelTransfer(g,
  labels = "leiden_clus", source_cell_ids = id_subset, name = "knn_leiden2"
)