Operations

Introduction

This vignette introduces the generic functions and operations supported by dbMatrix.

Loading library

library(dbMatrix)
library(Matrix)

dbMatrix generics

dbMatrix objects currently support the matrix operations listed below. Support for more operations is planned.

✅ - implemented 🟧 - not yet implemented

Operation	dbSparseMatrix	dbDenseMatrix
colSums	✅	✅
rowSums	✅	✅
colMeans	✅	✅
rowMeans	✅	✅
colSds	🟧	✅
rowSds	🟧	✅
t	✅	✅
mean	✅	✅
nrow	✅	✅
ncol	✅	✅
dim	✅	✅
head	✅	✅
tail	✅	✅
…

dbSparseMatrix Operations

Get test data

The test file is a dgCMatrix, or compressed sparse column matrix, representing a single-cell gene expression matrix. The file is in the data directory of the package.

Let’s load the .rds file and preview the object.

dgc <- readRDS("../data/dgc.rds")

dplyr::glimpse(dgc)

## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   ..@ i       : int [1:170625] 0 6 10 17 21 22 25 31 33 35 ...
##   ..@ p       : int [1:625] 0 227 510 758 980 1293 1631 1976 2223 2434 ...
##   ..@ Dim     : int [1:2] 634 624
##   ..@ Dimnames:List of 2
##   .. ..$ : chr [1:634] "Gna12" "Ccnd2" "Btbd17" "Sox9" ...
##   .. ..$ : chr [1:624] "AAAGGGATGTAGCAAG-1" "AAATGGCATGTCTTGT-1" "AAATGGTCAATGTGCC-1" "AAATTAACGGGTAGCT-1" ...
##   ..@ x       : num [1:170625] 1 1 1 1 1 1 6 2 1 1 ...
##   ..@ factors : list()
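
The slots shown by glimpse() follow the compressed sparse column layout: `i` holds the zero-based row index of each stored value, `p` holds the column pointers (one more entry than there are columns, which is why `p` has length 625 for 624 columns), and `x` holds the values themselves. A minimal sketch on a toy 3 x 3 matrix may make this concrete. The matrix `m` and its values are made up for illustration and are not part of the package data.

```r
library(Matrix)

# Toy matrix (hypothetical data, for illustration only)
m <- sparseMatrix(
  i = c(1, 3, 2, 1),   # 1-based row indices of the non-zero entries
  j = c(1, 1, 2, 3),   # 1-based column indices
  x = c(5, 7, 2, 9),   # values
  dims = c(3, 3)
)

m@i   # zero-based row index of each stored value, column by column
m@p   # column pointers: column k occupies positions p[k] + 1 .. p[k + 1]
m@x   # stored values in column-major order

# The number of stored entries per column is the difference
# between consecutive column pointers
nnz_per_col <- diff(m@p)
```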

# create dbSparseMatrix from the same dgc
con <- DBI::dbConnect(duckdb::duckdb(), ":memory:")

sparse <- dbMatrix(value = dgc, 
                   con = con, 
                   name = 'visium', 
                   class = "dbSparseMatrix",
                   overwrite = TRUE)

# preview the first rows
# the show method for dbMatrix aims to emulate the show method for dgCMatrix
head(sparse)

## 6 x 624  matrix of class "dbSparseMatrix"

## [[ Colnames 'AAAGGGATGTAGCAAG-1', 'AAATGGCATGTCTTGT-1', 'AAATGGTCAATGTGCC-1' ... suppressing 618 ...'TTGTCGTTCAGTTACC-1', 'TTGTGGCCCTGACAGT-1', 'TTGTTCAGTGTGCTAC-1' ]]

##                                  
## Gna12         1 2 1 1 9 1 3 5 3 .
##  [ reached getOption("max.print") -- omitted 633 rows ]

transpose

dbMatrix::t(sparse)

## 624 x 634  matrix of class "dbSparseMatrix"

## [[ Colnames 'Gna12', 'Ccnd2', 'Btbd17' ... suppressing 628 ...'Gm19935', '9630013A20Rik', '2900040C04Rik' ]]

##                                       
## AAAGGGATGTAGCAAG-1 1 . . . . . 1 . . .
##  [ reached getOption("max.print") -- omitted 5 rows ]
## NA
## 
## ......suppressing 624 columns and 618 rows
## 
## NA
## NA
## NA
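
One way to think about transposing a sparse matrix is in triplet (i, j, x) form, where a transpose amounts to swapping the row and column indices. The sketch below uses an in-memory data frame as a stand-in for a table. The triplet layout and the names `m`, `trip`, and `tm` are assumptions for illustration, not a description of how dbMatrix stores or transposes data.

```r
library(Matrix)

# Toy matrix (hypothetical data, for illustration only)
m <- sparseMatrix(
  i = c(1, 3, 2, 1), j = c(1, 1, 2, 3), x = c(5, 7, 2, 9),
  dims = c(3, 3),
  dimnames = list(paste0("g", 1:3), paste0("c", 1:3))
)

# Triplet form: one row per stored value, with 1-based indices i and j
trip <- as.data.frame(summary(m))

# Transpose by swapping the roles of i and j (and of the dimnames)
tm <- sparseMatrix(
  i = trip$j, j = trip$i, x = trip$x,
  dims = rev(dim(m)),
  dimnames = rev(dimnames(m))
)
```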

colMeans

dbMatrix::colMeans(sparse)

## 1 x 624  matrix of class "dbDenseMatrix"

## [[ Colnames 'AAAGGGATGTAGCAAG-1', 'AAATGGCATGTCTTGT-1', 'AAATGGTCAATGTGCC-1' ... suppressing 618 ...'TTGTCGTTCAGTTACC-1', 'TTGTGGCCCTGACAGT-1', 'TTGTTCAGTGTGCTAC-1' ]]

##                                                                            
## row1 0.74132 1.32965 1.14353 0.83438 1.61356 2.21293 1.7224 1.90221 0.73975
##             
## row1 0.92587

colSums

dbMatrix::colSums(sparse)

## 1 x 624  matrix of class "dbDenseMatrix"

## [[ Colnames 'AAAGGGATGTAGCAAG-1', 'AAATGGCATGTCTTGT-1', 'AAATGGTCAATGTGCC-1' ... suppressing 618 ...'TTGTCGTTCAGTTACC-1', 'TTGTGGCCCTGACAGT-1', 'TTGTTCAGTGTGCTAC-1' ]]

##                                                 
## row1 470 843 725 529 1023 1403 1092 1206 469 587
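
Column sums and column means of a sparse matrix can be expressed as grouped aggregations over the stored (non-zero) entries, which is the kind of query a database backend can run. The sketch below uses an in-memory triplet data frame as a stand-in for a database table. The layout and the names `m`, `trip`, `col_sums`, and `col_means` are assumptions for illustration, not dbMatrix internals. Note that the mean divides by the number of rows rather than by the number of stored entries, because the implicit zeros still count.

```r
library(Matrix)

# Toy matrix (hypothetical data, for illustration only)
m <- sparseMatrix(
  i = c(1, 3, 2, 1), j = c(1, 1, 2, 3), x = c(5, 7, 2, 9),
  dims = c(3, 3),
  dimnames = list(paste0("g", 1:3), paste0("c", 1:3))
)

# Triplet form: one row per stored value
trip <- as.data.frame(summary(m))

# Equivalent of "SELECT j, SUM(x) ... GROUP BY j"
# Using all column indices as factor levels keeps columns without entries
grouped <- tapply(trip$x, factor(trip$j, levels = seq_len(ncol(m))), sum)
grouped[is.na(grouped)] <- 0   # columns with no stored values sum to zero

col_sums  <- setNames(as.numeric(grouped), colnames(m))
col_means <- col_sums / nrow(m)   # divide by nrow, not by the entry count
```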

rowMeans

dbMatrix::rowMeans(sparse)

## 634 x 1  matrix of class "dbDenseMatrix"

## [[ Colnames: 'col1' ]]

##                      
## Gna12         2.71795
## Ccnd2         1.73237
## Btbd17        0.55288
## 
## ......suppressing 628 rows
## 
## Gm19935       0.20833
## 9630013A20Rik 0.17147
## 2900040C04Rik 0.15545

rowSums

dbMatrix::rowSums(sparse)

## 634 x 1  matrix of class "dbDenseMatrix"

## [[ Colnames: 'col1' ]]

##                   
## Gna12         1696
## Ccnd2         1081
## Btbd17         345
## 
## ......suppressing 628 rows
## 
## Gm19935        130
## 9630013A20Rik  107
## 2900040C04Rik   97

dim

dim(sparse)

## [1] 634 624

dim(dgc)

## [1] 634 624

Check results are equivalent

  all.equal(dbMatrix::colMeans(sparse), Matrix::colMeans(dgc))

  ## [1] "Modes: S4, numeric"                                               
  ## [2] "Lengths: 1, 624"                                                  
  ## [3] "names for current but not for target"                             
  ## [4] "Attributes: < names for target but not for current >"             
  ## [5] "Attributes: < Length mismatch: comparison on first 0 components >"

  all.equal(dbMatrix::colSums(sparse), Matrix::colSums(dgc))

  ## [1] "Modes: S4, numeric"                                               
  ## [2] "Lengths: 1, 624"                                                  
  ## [3] "names for current but not for target"                             
  ## [4] "Attributes: < names for target but not for current >"             
  ## [5] "Attributes: < Length mismatch: comparison on first 0 components >"

  all.equal(dbMatrix::rowMeans(sparse), Matrix::rowMeans(dgc))

  ## [1] "Modes: S4, numeric"                                               
  ## [2] "Lengths: 1, 634"                                                  
  ## [3] "names for current but not for target"                             
  ## [4] "Attributes: < names for target but not for current >"             
  ## [5] "Attributes: < Length mismatch: comparison on first 0 components >"

  all.equal(dbMatrix::rowSums(sparse), Matrix::rowSums(dgc))

  ## [1] "Modes: S4, numeric"                                               
  ## [2] "Lengths: 1, 634"                                                  
  ## [3] "names for current but not for target"                             
  ## [4] "Attributes: < names for target but not for current >"             
  ## [5] "Attributes: < Length mismatch: comparison on first 0 components >"
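
The messages above come from comparing objects of different types (an S4 dbDenseMatrix against a plain numeric vector), so they describe structural differences rather than differences in the computed values. The same effect can be reproduced with base R objects. In this sketch, `res` is a hypothetical 1 x n matrix standing in for a dbMatrix result; how to bring an actual dbMatrix result into memory is not covered here.

```r
# Toy data (hypothetical, for illustration only)
x <- matrix(
  c(1, 0, 3, 0, 5, 6), nrow = 2,
  dimnames = list(c("g1", "g2"), c("c1", "c2", "c3"))
)

expected <- colMeans(x)   # named numeric vector

# Stand-in for a result returned as a 1 x n matrix with a "row1" label
res <- matrix(expected, nrow = 1, dimnames = list("row1", colnames(x)))

# Comparing the matrix with the vector reports attribute mismatches,
# even though the numbers are identical
shape_check <- all.equal(res, expected)

# Dropping the length-one dimension first compares like with like
value_check <- all.equal(drop(res), expected)
```

Here `shape_check` is a character vector of mismatch messages, while `value_check` is `TRUE`, so a meaningful equivalence check needs both sides in the same shape and type.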

dbDenseMatrix Operations

# sim_dbDenseMatrix() is a convenience function that simulates a dbDenseMatrix
dense <- dbMatrix::sim_dbDenseMatrix()

# preview
dense

## 50 x 50  matrix of class "dbDenseMatrix"

##                                                                              
## row1   1.37096  0.32193  1.20097  -0.0407 -2.00093 -1.09616 -0.00462  0.72417
##                        
## row1   1.33491 -1.30382
## 
## ......suppressing 40 columns and 44 rows
## 
##  [ reached getOption("max.print") -- omitted 5 rows ]
## NA
## NA

transpose

dbMatrix::t(dense)

## 50 x 50  matrix of class "dbDenseMatrix"

## [[ Colnames 'row1', 'row2', 'row3' ... suppressing 44 ...'row48', 'row49', 'row50' ]]

##                                                                              
## col1   1.37096  -0.5647  0.36313  0.63286  0.40427 -0.10612  1.51152 -0.09466
##                        
## col1   2.01842 -0.06271
## 
## ......suppressing 40 columns and 44 rows
## 
##  [ reached getOption("max.print") -- omitted 5 rows ]
## NA
## NA

colMeans

dbMatrix::colMeans(dense)

## 1 x 50  matrix of class "dbDenseMatrix"

## [[ Colnames 'col1', 'col2', 'col3' ... suppressing 44 ...'col48', 'col49', 'col50' ]]

##                                                                          
## row1 -0.03562 0.10074 -0.15122 -0.02372 0.00794 -0.02862 -0.06156 0.12742
##                       
## row1 -0.11944 -0.11622

colSums

dbMatrix::colSums(dense)

## 1 x 50  matrix of class "dbDenseMatrix"

## [[ Colnames 'col1', 'col2', 'col3' ... suppressing 44 ...'col48', 'col49', 'col50' ]]

## Warning: ORDER BY is ignored in subqueries without LIMIT
## ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?

##                                                                        
## row1 -1.781 5.037 -7.561 -1.186 0.397 -1.431 -3.078 6.371 -5.972 -5.811

rowMeans

dbMatrix::rowMeans(dense)

## 50 x 1  matrix of class "dbDenseMatrix"

## [[ Colnames: 'col1' ]]

##               
## row1   0.05362
## row2   0.00246
## row3   0.03132
## 
## ......suppressing 44 rows
## 
## row48 -0.10104
## row49  0.05928
## row50  0.02156

rowSums

dbMatrix::rowSums(dense)

## 50 x 1  matrix of class "dbDenseMatrix"

## [[ Colnames: 'col1' ]]

## Warning: ORDER BY is ignored in subqueries without LIMIT
## ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?

##             
## row1   2.681
## row2   0.123
## row3   1.566
## 
## ......suppressing 44 rows
## 
## row48 -5.052
## row49  2.964
## row50  1.078

mean

dbMatrix::mean(dense)

## [1] -0.0096724
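
The dense outputs above are consistent with each other: for a 50 x 50 matrix, each row sum is the row mean times 50 (for row1, 0.05362 × 50 = 2.681), and the overall mean is the mean of the column means. A quick sketch of these identities with a base R matrix follows. The matrix `m` is a stand-in with its own random values, not the data produced by sim_dbDenseMatrix().

```r
# Stand-in matrix (hypothetical data, for illustration only)
set.seed(1)
m <- matrix(
  rnorm(50 * 50), nrow = 50,
  dimnames = list(paste0("row", 1:50), paste0("col", 1:50))
)

# Row sums recovered from row means
sums_from_means <- rowMeans(m) * ncol(m)

# Overall mean recovered from column means
# (valid because every column has the same number of rows)
overall <- mean(colMeans(m))
```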

dim

dim(dense)

## [1] 50 50

Session Info

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8    LC_NUMERIC=C        LC_TIME=C.UTF-8    
##  [4] LC_COLLATE=C.UTF-8  LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
##  [7] LC_PAPER=C.UTF-8    LC_NAME=C           LC_ADDRESS=C       
## [10] LC_TELEPHONE=C     
##  [ reached getOption("max.print") -- omitted 2 entries ]
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Matrix_1.7-0        dbMatrix_0.0.0.9023
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_1.8.8    crayon_1.5.3      dplyr_1.1.4       compiler_4.4.1   
##  [5] tidyselect_1.2.1  blob_1.2.4        jquerylib_0.1.4   systemfonts_1.1.0
##  [9] textshaping_0.4.0 yaml_2.3.10      
##  [ reached getOption("max.print") -- omitted 38 entries ]

2024-09-18
