TF-IDF (Term Frequency-Inverse Document Frequency) normalization is borrowed from natural language processing to identify features that are highly expressed in specific samples but not widely expressed across the entire dataset.
$$\LARGE TF_{i,j} = \frac{x_{i,j}}{\sum_{i} x_{i,j}} $$
$$\LARGE IDF_{i} = \log(1 + \frac{n_{samples}}{1 + n_{samples \: where \: feature \: i > 0}}) $$
$$\LARGE TFIDF_{i,j} = TF_{i,j} \times IDF_{i} $$
Where:
\(x_{i,j}\) is the raw count for feature \(i\) in sample \(j\)
\(n_{samples}\) is the total number of samples
\(TF_{i,j}\) is the term frequency of feature \(i\) in sample \(j\)
\(IDF_{i}\) is the inverse document frequency of feature \(i\)
\(TFIDF_{i,j}\) is the final TF-IDF normalized value
L2 normalization is commonly applied after TF-IDF normalization.
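The formulas above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the package's own implementation: the function names `tfidf` and `l2_normalize_samples`, the features-by-samples list layout, the zero-total guard, and the choice to apply L2 scaling per sample are all assumptions made for this example.

```python
import math


def tfidf(counts):
    """Apply the TF-IDF formulas to counts[i][j] (feature i, sample j)."""
    n_features = len(counts)
    n_samples = len(counts[0])
    # Per-sample totals: the denominator of TF, summed over features i.
    totals = [sum(counts[i][j] for i in range(n_features))
              for j in range(n_samples)]
    result = []
    for i in range(n_features):
        # Number of samples in which feature i has a count above zero.
        n_detected = sum(1 for value in counts[i] if value > 0)
        idf = math.log(1 + n_samples / (1 + n_detected))
        row = []
        for j in range(n_samples):
            # Assumption: a sample with no counts at all gets TF = 0.
            tf = counts[i][j] / totals[j] if totals[j] > 0 else 0.0
            row.append(tf * idf)
        result.append(row)
    return result


def l2_normalize_samples(matrix):
    """Scale each sample (column) to unit Euclidean length (assumed axis)."""
    n_features = len(matrix)
    n_samples = len(matrix[0])
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_features)))
             for j in range(n_samples)]
    return [[matrix[i][j] / norms[j] if norms[j] > 0 else 0.0
             for j in range(n_samples)]
            for i in range(n_features)]


# Toy input: 3 features (rows) by 3 samples (columns).
counts = [
    [4, 0, 0],
    [1, 1, 2],
    [0, 3, 0],
]
weighted = tfidf(counts)
scaled = l2_normalize_samples(weighted)
```

In the toy input, the second feature is detected in every sample, so its IDF is log(1 + 3/4), lower than the log(1 + 3/2) assigned to the two features that each appear in a single sample.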
None
Other normalization parameters: norm_arcsinh, norm_default, norm_l2, norm_library, norm_log, norm_osmfish, norm_pearson, norm_quantile