11 Spatial proteomics: Multiplexed Immunofluorescence
Junxiang Xu
August 6th 2024
Before you start, this tutorial contains an optional part to run image segmentation using Giotto wrapper of Cellpose. If considering to use that function, please restart R session as we will need to activate a new Giotto python environment. The environment is also compatible with other Giotto functions. We will also need to install the Cellpose supported Giotto environment if haven’t done so.
#Install the Giotto Environment with Cellpose, note that we only need to do it once
reticulate::conda_create(envname = "giotto_cellpose",
python_version = 3.8)
#.re.restartR()
reticulate::use_condaenv("giotto_cellpose")
reticulate::py_install(
pip = TRUE,
envname = "giotto_cellpose",
packages = c(
"pandas",
"networkx",
"python-igraph",
"leidenalg",
"scikit-learn",
"cellpose",
"smfishhmrf",
"tifffile",
"scikit-image"
)
)
#.rs.restartR()
Now, activate the Giotto python environment.
#.rs.restartR()
# Activate the Giotto python environment of your choice
GiottoClass::set_giotto_python_path("giotto_cellpose")
# Check if cellpose was successfully installed
GiottoUtils::package_check("cellpose", repository = "pip")
11.1 Spatial Proteomics Technologies
This tutorial is aimed at analyzing spatially resolved multiplexed immunofluorescence data. It is compatible for different kinds of image based spatial proteomics data, such as Akoya(CODEX), CyCIF, IMC, MIBI, and Lunaphore(seqIF). Note that this tutorial will focus on starting directly with the intensity data(image), not the decoded count matrix.
This is the example Lunaphore dataset from Lunaphore the official website and we are using the cropped one small area as an example. This is an overview of a subset of how the data would look like.
11.2 Raw data type coming out of different technologies
11.2.1 Use ome.tiff as an example output data to begin with
OME-TIFF (Open Microscopy Environment Tagged Image File Format) is a file format designed for including detailed metadata and support for multi-dimensional image data. This is a common output file format for spatial proteomics platform such as lunaphore.
library(Giotto)
instrs <- createGiottoInstructions(save_dir = file.path(getwd(),"/img/02_session4/"),
save_plot = TRUE,
show_plot = TRUE,
python_path = "giotto_cellpose")
options(timeout = Inf)
data_dir <- "data/02_session4"
destfile <- file.path(data_dir, "Lunaphore.zip")
if (!dir.exists(data_dir)) { dir.create(data_dir, recursive = TRUE) }
download.file("https://zenodo.org/records/13175721/files/Lunaphore.zip?download=1",
destfile = destfile)
unzip(file.path(data_dir, "/Lunaphore.zip"), exdir = data_dir)
list.files(file.path(data_dir, "/Lunaphore"))
We provide a way to extract meta data information directly from ome.tiffs. Please note that different platforms may store the meta data such as channel information in a different format, we will probably need to change the node names of the ome-XML.
img_path <- file.path(data_dir, "/Lunaphore/Lunaphore_example.ome.tiff")
img_meta <- ometif_metadata(img_path,
node = "Channel",
output = "data.frame")
img_meta
However, sometimes a simple ometiff file manipulation like cropping could result in a loss of ome-XML information from the ome.tiff file. That way, we can use a different strategy to parse the xml information seperately and get channel information from it.
## Get channel information
Luna <- file.path(data_dir, "/Lunaphore/Lunaphore_example.ome.tiff")
xmldata <- xml2::read_xml(file.path(data_dir,"/Lunaphore/Lunaphore_sample_metadata.xml"))
node <- xml2::xml_find_all(xmldata, "//d1:Channel", ns = xml2::xml_ns(xmldata))
channel_df <- as.data.frame(Reduce("rbind", xml2::xml_attrs(node)))
channel_df
11.2.2 Use single channel images as an example output data to begin with
Some platforms may also deconvolute and output gray scale single channel images. And we can create single channel images from ome.tiffs, the single channel images will be of the same format if the platform provide single channel gray scale images. With the single channel images, we can create a GiottoLargeImage and see what it looks like.
# Create multichannel raster and extract each single channels
Luna_terra <- terra::rast(Luna)
names(Luna_terra) <- channel_df$Name
gimg_DAPI <- createGiottoLargeImage(Luna_terra[[1]],
negative_y = FALSE,
flip_vertical = TRUE)
plot(gimg_DAPI)
Extract and save the raster image for future use.
single_channel_dir <- file.path(data_dir, "/Lunaphore/single_channels/")
if (!dir.exists(single_channel_dir)) { dir.create(single_channel_dir, recursive = TRUE) }
for (i in 1:nrow(channel_df)){
single_channel <- terra::subset(Luna_terra, i)
terra::writeRaster(single_channel,
filename = paste0(single_channel_dir,names(single_channel),".tiff"),
overwrite = TRUE)
}
Create a list of GiottoLargeImages using single channel rasters.
file_names <- list.files(single_channel_dir,
full.names = TRUE)
image_names <- sub("\\.tiff$", "", list.files(single_channel_dir))
gimg_list <- createGiottoLargeImageList(raster_objects = file_names,
names = image_names,
negative_y = FALSE,
flip_vertical = TRUE)
names(gimg_list) <- image_names
plot(gimg_list[["Vimentin"]])
11.3 Cell Segmentation file to get single cell level protein expression
Cell segmentation is necessary to generate single cell level protein expression. Currently, there are multiple algorithms to generate segmentations from images and output could be different. For that purpose, Giotto provides createGiottoPolygonsFromMask(), createGiottoPolygonsFromDfr(), createGiottoPolygonsFromGeoJSON() to load different type of file to the giottoPolygon Class.
11.3.1 Using segmentation output file from DeepCell(mesmer) as an example.
We collapsed several different channels to created a pseudo memberane staining channel(“nuc_and_bound.tif” provided here), and use that as an input for the deepcell mesmer segmentation pipeline. We can load the output mask from to GiottoPolygon via a convenience function.
gpoly_mesmer <- createGiottoPolygonsFromMask(
file.path(data_dir, "/Lunaphore/whole_cell_mask.tif"),
shift_horizontal_step = FALSE,
shift_vertical_step = FALSE,
flip_vertical = TRUE,
calc_centroids = TRUE)
plot(gpoly_mesmer)
We can also zoom in to check how does the segmentation look.
11.3.2 Using Giotto wrapper of Cellpose to perform segmentation
Here, we create a mini example by cropping the image to a smaller area. Note that crop() is probably easier to use to directly crop image, unless cropping the image when the image is inside of a giotto object.
gimg_cropped <- cropGiottoLargeImage(giottoLargeImage = gimg_DAPI,
crop_extent = terra::ext(zoom))
writeGiottoLargeImage(gimg_cropped,
filename = file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"),
overwrite = TRUE)
#Create a giotto image to evaluate segmentation
gimg_for_cellpose <- createGiottoLargeImage(
file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"),
negative_y = FALSE)
Now we can run the cellpose segmentation. We can provide different parameters for cellpose inference model(flow_threshold,cellprob_threshold,etc), and practically, the batch size represents how many 224X224 images are calculated in parallel, increasing the amount will increase RAM/VRAM requirement, lowering the amount will increase the run time. For more information please refer to the cellpose website
doCellposeSegmentation(image_dir = file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"),
mask_output = file.path(data_dir, "/Lunaphore/giotto_cellpose_seg.tiff"),
channel_1 = 0,
channel_2 = 0,
model_name = "cyto3",
batch_size = 12)
cpoly <- createGiottoPolygonsFromMask(file.path(data_dir,"/Lunaphore/giotto_cellpose_seg.tiff"),
shift_horizontal_step = FALSE,
shift_vertical_step = FALSE,
flip_vertical = TRUE)
plot(gimg_for_cellpose)
plot(cpoly, add = TRUE, border = "red")
11.4 Create a Giotto Object using list of gitto large images and polygons
You will need to have:
- list of giotto images
- giottoPolygon created from segmentation
Lunaphore_giotto <- createGiottoObjectSubcellular(gpolygons = list("cell" = gpoly_mesmer),
images = gimg_list,
instructions = instrs)
Lunaphore_giotto
11.4.1 Overlap to matrix
calculateOverlap() and overlapToMatrix() are used to overlap the intensity values with
11.4.2 Manipulate Expression information
For IF data, DAPI staining is usually only used for stain nuclei, the intensity value of DAPI usually does not have meaningful result to drive difference between cell types. Similar things could happen when a platform uses some reference channel to adjust signal calling, such as TRITC or Cy5, These images will be loaded but need to be removed for expression profile.
Therefore, we could extract the feature expression matrix, filter the DAPI information and write it back to the Giotto Object
expr_mtx <- getExpression(Lunaphore_giotto,
values = "raw",
output = "matrix")
filtered_expr_mtx <- expr_mtx[rownames(expr_mtx) != "DAPI",]
Lunaphore_giotto <- setExpression(Lunaphore_giotto,
feat_type = "protein",
x = createExprObj(filtered_expr_mtx),
name = "raw")
showGiottoExpression(Lunaphore_giotto)
11.4.3 Rescale polygons
rescalePolygons() will provide a quick way to manipulate the polygon size and potentially affect the expression for each cell. redo the calculateOverlap() and overlapToMatrix() will potentially change the downstream analysis
Lunaphore_giotto <- rescalePolygons(gobject = Lunaphore_giotto,
poly_info = "cell",
name = "smallcell",
fx = 0.7, fy = 0.7,
calculate_centroids = TRUE)
smallpoly <- getPolygonInfo(Lunaphore_giotto,
polygon_name = "smallcell")
plot(gimg_DAPI, ext = zoom)
plot(gpoly_mesmer, add = TRUE, border = "white", ext = zoom)
plot(smallpoly, add = TRUE, border = "red", ext = zoom)
11.4.4 Perform clustering and differential expression
The Giotto Object can then go through standard analysis pipeline normalization, dimensional reduction and clustering
Lunaphore_giotto <- normalizeGiotto(Lunaphore_giotto,
spat_unit = "cell",
feat_type = "protein")
Lunaphore_giotto <- addStatistics(Lunaphore_giotto,
spat_unit = "cell",
feat_type = "protein")
Lunaphore_giotto <- runPCA(Lunaphore_giotto,
spat_unit = "cell",
feat_type = "protein",
scale_unit = FALSE,
center = FALSE,
ncp = 20,
feats_to_use = NULL,
set_seed = TRUE)
screePlot(Lunaphore_giotto,
spat_unit = "cell",
feat_type = "protein",
show_plot = TRUE)
Due to the limited number of total features we have, Leiden clustering generally does not work very well compared to Kmeans or hierarchical clustering. Here we can use hierarchical clustering to do a quick check.
Lunaphore_giotto <- runUMAP(gobject = Lunaphore_giotto,
spat_unit = "cell",
feat_type = "protein",
dimensions_to_use = 1:5,
set_seed = TRUE)
Lunaphore_giotto <- createNearestNetwork(Lunaphore_giotto,
spat_unit = "cell",
feat_type = "protein",
dimensions_to_use = 1:5)
Lunaphore_giotto <- doHclust(Lunaphore_giotto,
k = 8,
dim_reduction_to_use = "cells",
spat_unit = "cell",
feat_type = "protein")
spatInSituPlotPoints(gobject = Lunaphore_giotto,
spat_unit = "cell",
polygon_feat_type = "cell",
show_polygon = TRUE,
feat_type = "protein",
feats = NULL,
polygon_fill = "hclust",
polygon_fill_as_factor = TRUE,
polygon_line_size = 0,
image_name = "CD68",
show_image = TRUE,
return_plot = TRUE,
polygon_color = "black",
background_color = "white")
Then we can check the heatmap of protein expression and determine the first round of cluster annotation.
11.4.6 Spatial network
This is to create a cellular neighborhood based on nearest neighbor of physical distance.
11.4.7 Cell Neighborhood: Cell-Type/Cell-Type Interactions
This is using cellProximityEnrichment() to statistically identify cell type interactions.
cell_proximities <- cellProximityEnrichment(gobject = Lunaphore_giotto,
cluster_column = "cell_types",
spatial_network_name = "Delaunay_network",
adjust_method = "fdr",
number_of_simulations = 2000)
## barplot
cellProximityBarplot(gobject = Lunaphore_giotto,
CPscore = cell_proximities,
min_orig_ints = 5,
min_sim_ints = 5)
11.5 Session info
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Giotto_4.1.0 GiottoClass_0.3.4
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.16.0 jsonlite_1.8.8
[4] magrittr_2.0.3 magick_2.8.4 farver_2.1.2
[7] rmarkdown_2.27 zlibbioc_1.50.0 ragg_1.3.2
[10] vctrs_0.6.5 memoise_2.0.1 GiottoUtils_0.1.10
[13] terra_1.7-78 htmltools_0.5.8.1 S4Arrays_1.4.1
[16] raster_3.6-26 SparseArray_1.4.8 sass_0.4.9
[19] bslib_0.8.0 KernSmooth_2.23-24 htmlwidgets_1.6.4
[22] plyr_1.8.9 plotly_4.10.4 cachem_1.1.0
[25] igraph_2.0.3 lifecycle_1.0.4 pkgconfig_2.0.3
[28] rsvd_1.0.5 Matrix_1.7-0 R6_2.5.1
[31] fastmap_1.2.0 GenomeInfoDbData_1.2.12 MatrixGenerics_1.16.0
[34] digest_0.6.36 colorspace_2.1-1 S4Vectors_0.42.1
[37] irlba_2.3.5.1 textshaping_0.4.0 GenomicRanges_1.56.1
[40] beachmat_2.20.0 labeling_0.4.3 fansi_1.0.6
[43] httr_1.4.7 polyclip_1.10-7 abind_1.4-5
[46] compiler_4.4.1 proxy_0.4-27 withr_3.0.1
[49] backports_1.5.0 BiocParallel_1.38.0 viridis_0.6.5
[52] DBI_1.2.3 highr_0.11 ggforce_0.4.2
[55] MASS_7.3-61 DelayedArray_0.30.1 rjson_0.2.21
[58] classInt_0.4-10 gtools_3.9.5 GiottoVisuals_0.2.4
[61] tools_4.4.1 units_0.8-5 glue_1.7.0
[64] dbscan_1.2-0 grid_4.4.1 sf_1.0-16
[67] checkmate_2.3.2 reshape2_1.4.4 generics_0.1.3
[70] gtable_0.3.5 class_7.3-22 tidyr_1.3.1
[73] data.table_1.15.4 BiocSingular_1.20.0 tidygraph_1.3.1
[76] ScaledMatrix_1.12.0 sp_2.1-4 xml2_1.3.6
[79] utf8_1.2.4 XVector_0.44.0 BiocGenerics_0.50.0
[82] RcppAnnoy_0.0.22 ggrepel_0.9.5 pillar_1.9.0
[85] stringr_1.5.1 dplyr_1.1.4 tweenr_2.0.3
[88] lattice_0.22-6 deldir_2.0-4 tidyselect_1.2.1
[91] SingleCellExperiment_1.26.0 knitr_1.48 gridExtra_2.3
[94] bookdown_0.40 IRanges_2.38.1 SummarizedExperiment_1.34.0
[97] scattermore_1.2 stats4_4.4.1 xfun_0.46
[100] graphlayouts_1.1.1 Biobase_2.64.0 matrixStats_1.3.0
[103] stringi_1.8.4 UCSC.utils_1.0.0 lazyeval_0.2.2
[106] yaml_2.3.10 evaluate_0.24.0 codetools_0.2-20
[109] ggraph_2.2.1 tibble_3.2.1 BiocManager_1.30.23
[112] colorRamp2_0.1.0 cli_3.6.3 uwot_0.2.2
[115] reticulate_1.38.0 systemfonts_1.1.0 jquerylib_0.1.4
[118] munsell_0.5.1 Rcpp_1.0.13 GenomeInfoDb_1.40.1
[121] png_0.1-8 parallel_4.4.1 ggplot2_3.5.1
[124] exactextractr_0.10.0 SpatialExperiment_1.14.0 viridisLite_0.4.2
[127] scales_1.3.0 e1071_1.7-14 purrr_1.0.2
[130] crayon_1.5.3 rlang_1.1.4 cowplot_1.1.3