11 Spatial proteomics: Multiplexed Immunofluorescence

Junxiang Xu

August 6th 2024

Before you start, this tutorial contains an optional part to run image segmentation using Giotto wrapper of Cellpose. If considering to use that function, please restart R session as we will need to activate a new Giotto python environment. The environment is also compatible with other Giotto functions. We will also need to install the Cellpose supported Giotto environment if haven’t done so.

#Install the Giotto Environment with Cellpose, note that we only need to do it once
reticulate::conda_create(envname = "giotto_cellpose",
                         python_version = 3.8)

#.re.restartR()
reticulate::use_condaenv("giotto_cellpose")

reticulate::py_install(
  pip = TRUE,
  envname = "giotto_cellpose",
  packages = c(
    "pandas",
    "networkx",
    "python-igraph",
    "leidenalg",
    "scikit-learn",
    "cellpose",
    "smfishhmrf",
    "tifffile",
    "scikit-image"
  )
)
#.rs.restartR()

Now, activate the Giotto python environment.

#.rs.restartR()
# Activate the Giotto python environment of your choice
GiottoClass::set_giotto_python_path("giotto_cellpose")

# Check if cellpose was successfully installed
GiottoUtils::package_check("cellpose", repository = "pip")

11.1 Spatial Proteomics Technologies

This tutorial is aimed at analyzing spatially resolved multiplexed immunofluorescence data. It is compatible for different kinds of image based spatial proteomics data, such as Akoya(CODEX), CyCIF, IMC, MIBI, and Lunaphore(seqIF). Note that this tutorial will focus on starting directly with the intensity data(image), not the decoded count matrix.

This is the example Lunaphore dataset from Lunaphore the official website and we are using the cropped one small area as an example. This is an overview of a subset of how the data would look like.

11.2 Raw data type coming out of different technologies

11.2.1 Use ome.tiff as an example output data to begin with

OME-TIFF (Open Microscopy Environment Tagged Image File Format) is a file format designed for including detailed metadata and support for multi-dimensional image data. This is a common output file format for spatial proteomics platform such as lunaphore.

library(Giotto)

instrs <- createGiottoInstructions(save_dir = file.path(getwd(),"/img/02_session4/"),
                                   save_plot = TRUE,
                                   show_plot = TRUE,
                                   python_path = "giotto_cellpose")
options(timeout = Inf)

data_dir <- "data/02_session4"

destfile <- file.path(data_dir, "Lunaphore.zip")

if (!dir.exists(data_dir)) { dir.create(data_dir, recursive = TRUE) }

download.file("https://zenodo.org/records/13175721/files/Lunaphore.zip?download=1", 
              destfile = destfile)

unzip(file.path(data_dir, "/Lunaphore.zip"), exdir = data_dir)

list.files(file.path(data_dir, "/Lunaphore"))

We provide a way to extract meta data information directly from ome.tiffs. Please note that different platforms may store the meta data such as channel information in a different format, we will probably need to change the node names of the ome-XML.

img_path <- file.path(data_dir, "/Lunaphore/Lunaphore_example.ome.tiff")

img_meta <- ometif_metadata(img_path, 
                            node = "Channel", 
                            output = "data.frame")

img_meta

However, sometimes a simple ometiff file manipulation like cropping could result in a loss of ome-XML information from the ome.tiff file. That way, we can use a different strategy to parse the xml information seperately and get channel information from it.

## Get channel information
Luna <- file.path(data_dir, "/Lunaphore/Lunaphore_example.ome.tiff")

xmldata <- xml2::read_xml(file.path(data_dir,"/Lunaphore/Lunaphore_sample_metadata.xml"))

node <- xml2::xml_find_all(xmldata, "//d1:Channel", ns = xml2::xml_ns(xmldata))

channel_df <- as.data.frame(Reduce("rbind", xml2::xml_attrs(node)))

channel_df

11.2.2 Use single channel images as an example output data to begin with

Some platforms may also deconvolute and output gray scale single channel images. And we can create single channel images from ome.tiffs, the single channel images will be of the same format if the platform provide single channel gray scale images. With the single channel images, we can create a GiottoLargeImage and see what it looks like.

# Create multichannel raster and extract each single channels
Luna_terra <- terra::rast(Luna)

names(Luna_terra) <- channel_df$Name

gimg_DAPI <- createGiottoLargeImage(Luna_terra[[1]],
                                    negative_y = FALSE,
                                    flip_vertical = TRUE)

plot(gimg_DAPI)

Extract and save the raster image for future use.

single_channel_dir <- file.path(data_dir, "/Lunaphore/single_channels/")

if (!dir.exists(single_channel_dir)) { dir.create(single_channel_dir, recursive = TRUE) }

for (i in 1:nrow(channel_df)){
  single_channel <- terra::subset(Luna_terra, i)
  terra::writeRaster(single_channel,
                     filename = paste0(single_channel_dir,names(single_channel),".tiff"),
                     overwrite = TRUE)
}

Create a list of GiottoLargeImages using single channel rasters.

file_names <- list.files(single_channel_dir,
                         full.names = TRUE)

image_names <- sub("\\.tiff$", "", list.files(single_channel_dir))

gimg_list <- createGiottoLargeImageList(raster_objects = file_names,
                                        names = image_names,
                                        negative_y = FALSE,
                                        flip_vertical = TRUE)
names(gimg_list) <- image_names

plot(gimg_list[["Vimentin"]])

11.3 Cell Segmentation file to get single cell level protein expression

Cell segmentation is necessary to generate single cell level protein expression. Currently, there are multiple algorithms to generate segmentations from images and output could be different. For that purpose, Giotto provides createGiottoPolygonsFromMask(), createGiottoPolygonsFromDfr(), createGiottoPolygonsFromGeoJSON() to load different type of file to the giottoPolygon Class.

11.3.1 Using segmentation output file from DeepCell(mesmer) as an example.

We collapsed several different channels to created a pseudo memberane staining channel(“nuc_and_bound.tif” provided here), and use that as an input for the deepcell mesmer segmentation pipeline. We can load the output mask from to GiottoPolygon via a convenience function.

gpoly_mesmer <- createGiottoPolygonsFromMask(
  file.path(data_dir, "/Lunaphore/whole_cell_mask.tif"),
  shift_horizontal_step = FALSE,
  shift_vertical_step = FALSE,
  flip_vertical = TRUE,
  calc_centroids = TRUE)

plot(gpoly_mesmer)

We can also zoom in to check how does the segmentation look.

zoom <- c(2000,2500,2000,2500)

plot(gimg_DAPI, ext = zoom)

plot(gpoly_mesmer, add = TRUE, border = "white", ext = zoom)

11.3.2 Using Giotto wrapper of Cellpose to perform segmentation

Here, we create a mini example by cropping the image to a smaller area. Note that crop() is probably easier to use to directly crop image, unless cropping the image when the image is inside of a giotto object.

gimg_cropped <- cropGiottoLargeImage(giottoLargeImage = gimg_DAPI, 
                                     crop_extent = terra::ext(zoom))

writeGiottoLargeImage(gimg_cropped, 
                      filename = file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"),
                      overwrite =  TRUE)

#Create a giotto image to evaluate segmentation
gimg_for_cellpose <- createGiottoLargeImage(
  file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"),
  negative_y = FALSE)

Now we can run the cellpose segmentation. We can provide different parameters for cellpose inference model(flow_threshold,cellprob_threshold,etc), and practically, the batch size represents how many 224X224 images are calculated in parallel, increasing the amount will increase RAM/VRAM requirement, lowering the amount will increase the run time. For more information please refer to the cellpose website

doCellposeSegmentation(image_dir = file.path(data_dir, "/Lunaphore/DAPI_forcellpose.tiff"),
                       mask_output = file.path(data_dir, "/Lunaphore/giotto_cellpose_seg.tiff"),
                       channel_1 = 0,
                       channel_2 = 0,
                       model_name = "cyto3",
                       batch_size = 12)

cpoly <- createGiottoPolygonsFromMask(file.path(data_dir,"/Lunaphore/giotto_cellpose_seg.tiff"),
                                      shift_horizontal_step = FALSE,
                                      shift_vertical_step = FALSE,
                                      flip_vertical = TRUE)

plot(gimg_for_cellpose)

plot(cpoly, add = TRUE, border = "red")

11.4 Create a Giotto Object using list of gitto large images and polygons

You will need to have:

  • list of giotto images
  • giottoPolygon created from segmentation
Lunaphore_giotto <- createGiottoObjectSubcellular(gpolygons = list("cell" = gpoly_mesmer),
                                                  images = gimg_list, 
                                                  instructions = instrs)
Lunaphore_giotto

11.4.1 Overlap to matrix

calculateOverlap() and overlapToMatrix() are used to overlap the intensity values with

Lunaphore_giotto <- calculateOverlap(Lunaphore_giotto,
                                     spatial_info = "cell",
                                     image_names = names(gimg_list))

Lunaphore_giotto <- overlapToMatrix(x = Lunaphore_giotto,
                                    type ="intensity",
                                    poly_info = "cell", 
                                    feat_info = "protein",
                                    aggr_function = "sum")

showGiottoExpression(Lunaphore_giotto)

11.4.2 Manipulate Expression information

For IF data, DAPI staining is usually only used for stain nuclei, the intensity value of DAPI usually does not have meaningful result to drive difference between cell types. Similar things could happen when a platform uses some reference channel to adjust signal calling, such as TRITC or Cy5, These images will be loaded but need to be removed for expression profile.

Therefore, we could extract the feature expression matrix, filter the DAPI information and write it back to the Giotto Object

expr_mtx <- getExpression(Lunaphore_giotto, 
                          values = "raw", 
                          output = "matrix")

filtered_expr_mtx <- expr_mtx[rownames(expr_mtx) != "DAPI",]

Lunaphore_giotto <- setExpression(Lunaphore_giotto,
                                  feat_type = "protein",
                                  x = createExprObj(filtered_expr_mtx),
                                  name = "raw")

showGiottoExpression(Lunaphore_giotto)

11.4.3 Rescale polygons

rescalePolygons() will provide a quick way to manipulate the polygon size and potentially affect the expression for each cell. redo the calculateOverlap() and overlapToMatrix() will potentially change the downstream analysis

Lunaphore_giotto <- rescalePolygons(gobject = Lunaphore_giotto,
                                    poly_info = "cell",
                                    name = "smallcell",
                                    fx = 0.7, fy = 0.7, 
                                    calculate_centroids = TRUE)

smallpoly <- getPolygonInfo(Lunaphore_giotto,
                            polygon_name = "smallcell")

plot(gimg_DAPI, ext = zoom)

plot(gpoly_mesmer, add = TRUE, border = "white", ext = zoom)

plot(smallpoly, add = TRUE, border = "red", ext = zoom)

11.4.4 Perform clustering and differential expression

The Giotto Object can then go through standard analysis pipeline normalization, dimensional reduction and clustering

Lunaphore_giotto <- normalizeGiotto(Lunaphore_giotto,
                                    spat_unit = "cell", 
                                    feat_type = "protein")

Lunaphore_giotto <- addStatistics(Lunaphore_giotto, 
                                  spat_unit = "cell", 
                                  feat_type = "protein")

Lunaphore_giotto <- runPCA(Lunaphore_giotto, 
                           spat_unit = "cell", 
                           feat_type = "protein",
                           scale_unit = FALSE, 
                           center = FALSE, 
                           ncp = 20,
                           feats_to_use = NULL,
                           set_seed = TRUE)

screePlot(Lunaphore_giotto,
          spat_unit = "cell", 
          feat_type = "protein",
          show_plot = TRUE)

Due to the limited number of total features we have, Leiden clustering generally does not work very well compared to Kmeans or hierarchical clustering. Here we can use hierarchical clustering to do a quick check.

Lunaphore_giotto <- runUMAP(gobject = Lunaphore_giotto, 
                            spat_unit = "cell",
                            feat_type = "protein",
                            dimensions_to_use = 1:5,
                            set_seed = TRUE)

Lunaphore_giotto <- createNearestNetwork(Lunaphore_giotto, 
                                         spat_unit = "cell",
                                         feat_type = "protein", 
                                         dimensions_to_use = 1:5)

Lunaphore_giotto <- doHclust(Lunaphore_giotto, 
                             k = 8,
                             dim_reduction_to_use = "cells",
                             spat_unit = "cell",
                             feat_type = "protein")

spatInSituPlotPoints(gobject = Lunaphore_giotto, 
                     spat_unit = "cell",
                     polygon_feat_type = "cell",
                     show_polygon = TRUE,
                     feat_type = "protein",
                     feats = NULL,
                     polygon_fill = "hclust",
                     polygon_fill_as_factor = TRUE,
                     polygon_line_size = 0,
                     image_name = "CD68",
                     show_image = TRUE,
                     return_plot = TRUE,
                     polygon_color = "black",
                     background_color = "white")

Then we can check the heatmap of protein expression and determine the first round of cluster annotation.

cluster_column <- "hclust"

plotMetaDataHeatmap(Lunaphore_giotto,
                    spat_unit = "cell",
                    feat_type = "protein",
                    expression_values = "raw",
                    metadata_cols = cluster_column, 
                    selected_feats = names(gimg_list),
                    y_text_size = 8,
                    show_values = "zscores_rescaled")

11.4.5 Give the cluster an annotation based on expression values

annotation <- c("B_cell", "Macrophage", "T_cell", "stromal", 
                "epithelial", "DC" , "Fibroblast", "endothelial")

names(annotation) <- 1:8

Lunaphore_giotto <- annotateGiotto(Lunaphore_giotto,
                                   cluster_column = "hclust",
                                   annotation_vector = annotation,
                                   name = "cell_types")

11.4.6 Spatial network

This is to create a cellular neighborhood based on nearest neighbor of physical distance.

Lunaphore_giotto <- createSpatialNetwork(Lunaphore_giotto)

spatPlot2D(Lunaphore_giotto, 
           show_network = TRUE,
           network_color = "blue",
           point_size = 1.5, 
           cell_color = "hclust")

11.4.7 Cell Neighborhood: Cell-Type/Cell-Type Interactions

This is using cellProximityEnrichment() to statistically identify cell type interactions.

cell_proximities <- cellProximityEnrichment(gobject = Lunaphore_giotto,
                                            cluster_column = "cell_types",
                                            spatial_network_name = "Delaunay_network",
                                            adjust_method = "fdr",
                                            number_of_simulations = 2000)
## barplot
cellProximityBarplot(gobject = Lunaphore_giotto,
                     CPscore = cell_proximities, 
                     min_orig_ints = 5, 
                     min_sim_ints = 5)

## network
cellProximityNetwork(gobject = Lunaphore_giotto, 
                     CPscore = cell_proximities, 
                     remove_self_edges = TRUE,
                     only_show_enrichment_edges = FALSE)

11.5 Session info

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Giotto_4.1.0      GiottoClass_0.3.4

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3          rstudioapi_0.16.0           jsonlite_1.8.8             
  [4] magrittr_2.0.3              magick_2.8.4                farver_2.1.2               
  [7] rmarkdown_2.27              zlibbioc_1.50.0             ragg_1.3.2                 
 [10] vctrs_0.6.5                 memoise_2.0.1               GiottoUtils_0.1.10         
 [13] terra_1.7-78                htmltools_0.5.8.1           S4Arrays_1.4.1             
 [16] raster_3.6-26               SparseArray_1.4.8           sass_0.4.9                 
 [19] bslib_0.8.0                 KernSmooth_2.23-24          htmlwidgets_1.6.4          
 [22] plyr_1.8.9                  plotly_4.10.4               cachem_1.1.0               
 [25] igraph_2.0.3                lifecycle_1.0.4             pkgconfig_2.0.3            
 [28] rsvd_1.0.5                  Matrix_1.7-0                R6_2.5.1                   
 [31] fastmap_1.2.0               GenomeInfoDbData_1.2.12     MatrixGenerics_1.16.0      
 [34] digest_0.6.36               colorspace_2.1-1            S4Vectors_0.42.1           
 [37] irlba_2.3.5.1               textshaping_0.4.0           GenomicRanges_1.56.1       
 [40] beachmat_2.20.0             labeling_0.4.3              fansi_1.0.6                
 [43] httr_1.4.7                  polyclip_1.10-7             abind_1.4-5                
 [46] compiler_4.4.1              proxy_0.4-27                withr_3.0.1                
 [49] backports_1.5.0             BiocParallel_1.38.0         viridis_0.6.5              
 [52] DBI_1.2.3                   highr_0.11                  ggforce_0.4.2              
 [55] MASS_7.3-61                 DelayedArray_0.30.1         rjson_0.2.21               
 [58] classInt_0.4-10             gtools_3.9.5                GiottoVisuals_0.2.4        
 [61] tools_4.4.1                 units_0.8-5                 glue_1.7.0                 
 [64] dbscan_1.2-0                grid_4.4.1                  sf_1.0-16                  
 [67] checkmate_2.3.2             reshape2_1.4.4              generics_0.1.3             
 [70] gtable_0.3.5                class_7.3-22                tidyr_1.3.1                
 [73] data.table_1.15.4           BiocSingular_1.20.0         tidygraph_1.3.1            
 [76] ScaledMatrix_1.12.0         sp_2.1-4                    xml2_1.3.6                 
 [79] utf8_1.2.4                  XVector_0.44.0              BiocGenerics_0.50.0        
 [82] RcppAnnoy_0.0.22            ggrepel_0.9.5               pillar_1.9.0               
 [85] stringr_1.5.1               dplyr_1.1.4                 tweenr_2.0.3               
 [88] lattice_0.22-6              deldir_2.0-4                tidyselect_1.2.1           
 [91] SingleCellExperiment_1.26.0 knitr_1.48                  gridExtra_2.3              
 [94] bookdown_0.40               IRanges_2.38.1              SummarizedExperiment_1.34.0
 [97] scattermore_1.2             stats4_4.4.1                xfun_0.46                  
[100] graphlayouts_1.1.1          Biobase_2.64.0              matrixStats_1.3.0          
[103] stringi_1.8.4               UCSC.utils_1.0.0            lazyeval_0.2.2             
[106] yaml_2.3.10                 evaluate_0.24.0             codetools_0.2-20           
[109] ggraph_2.2.1                tibble_3.2.1                BiocManager_1.30.23        
[112] colorRamp2_0.1.0            cli_3.6.3                   uwot_0.2.2                 
[115] reticulate_1.38.0           systemfonts_1.1.0           jquerylib_0.1.4            
[118] munsell_0.5.1               Rcpp_1.0.13                 GenomeInfoDb_1.40.1        
[121] png_0.1-8                   parallel_4.4.1              ggplot2_3.5.1              
[124] exactextractr_0.10.0        SpatialExperiment_1.14.0    viridisLite_0.4.2          
[127] scales_1.3.0                e1071_1.7-14                purrr_1.0.2                
[130] crayon_1.5.3                rlang_1.1.4                 cowplot_1.1.3