Title: | Single-Cell Correlation Based Cell Type Annotation |
---|---|
Description: | Performs cell type annotation based on cell markers from a unified database. The method combines a correlation-based approach with association analysis using Fisher's exact and phyper (hypergeometric) statistical tests (Upton, Graham JG. (1992) <DOI:10.2307/2982890>). |
Authors: | Mohamed Soudy [aut, cre], Sophie LE BARS [aut], Enrico Glaab [aut] |
Maintainer: | Mohamed Soudy <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.1 |
Built: | 2025-02-07 05:30:13 UTC |
Source: | https://github.com/cran/sccca |
This function aggregates cells by averaging the expression values of the scRNA-seq matrix and then computes the gene correlation matrix.
calculate_cor_mat(expression_mat, condition = NULL, clusters, assay = "RNA")
expression_mat |
Seurat object that contains the expression matrix. |
condition |
column name of the condition in the meta data of the Seurat object. |
clusters |
column name of the cluster numbers in the meta data of the Seurat object. |
assay |
the assay to be used; the default is "RNA". |
correlation matrix of genes.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
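The aggregation-then-correlation idea can be illustrated with a minimal base R sketch. It is not the package's implementation: it works on a plain toy matrix rather than a Seurat object, and the names `expr`, `clusters`, and `avg` as well as all values are made up for illustration.

```r
# Toy expression matrix: 4 genes (rows) x 6 cells (columns); values are invented.
expr <- matrix(
  c(5, 6, 1, 0, 9, 8,
    4, 5, 2, 1, 8, 9,
    0, 1, 7, 8, 3, 2,
    1, 0, 6, 9, 2, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"), paste0("cell", 1:6))
)

# Hypothetical cluster label for each cell (same order as the columns of expr).
clusters <- c("c1", "c1", "c2", "c2", "c3", "c3")

# Average each gene within each cluster, giving a genes x clusters matrix.
avg <- sapply(
  split(seq_len(ncol(expr)), clusters),
  function(idx) rowMeans(expr[, idx, drop = FALSE])
)

# Gene-by-gene Pearson correlation computed across the cluster averages.
cor_mat <- cor(t(avg))
print(round(cor_mat, 2))
```

Averaging first reduces the per-cell sparsity that makes raw single-cell correlations noisy; whether the package averages per cluster, per condition, or both is not stated here, so treat the grouping as an assumption.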
This function calculates cell scores based on the number of genes.
calculate_normalized_ratio(vec)
vec |
list of genes of cell types. |
vector of cell scores based on the number of overlapped genes with the input matrix.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
This function returns the cell marker names processed for the sctype approach.
correct_gene_symbols(markers)
markers |
list of unique cell markers. |
vector of gene names that overlap with the correlation matrix.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
This function applies a function to the corresponding elements of two lists.
enrich_genes(ref_list, overlap_list, func)
ref_list |
reference list. |
overlap_list |
overlap list. |
func |
function to be applied. |
list where each element is the result of applying the function 'func' to the corresponding elements of 'ref_list' and 'overlap_list'.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
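The element-wise pairing described in the return value can be sketched with base R's `Map()`. This is only an illustration of the pattern, not the package's code (which may use a different mechanism, such as a parallel backend); the marker lists and the helper `count_shared` are hypothetical.

```r
# Hypothetical reference markers per cell type.
ref_list <- list(
  T_cell = c("CD3D", "CD3E", "CD2"),
  B_cell = c("CD79A", "MS4A1")
)

# Hypothetical genes retained for each cell type after some filtering step.
overlap_list <- list(
  T_cell = c("CD3D", "CD2", "GZMK"),
  B_cell = c("MS4A1")
)

# Illustrative function applied to each (reference, overlap) pair.
count_shared <- function(ref, overlap) length(intersect(ref, overlap))

# Map() walks both lists in step and keeps the names of the first list.
res <- Map(count_shared, ref_list, overlap_list)
str(res)
```

Any two-argument function can take the place of `count_shared`, for example a statistical test that receives a reference set and an overlap set.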
This function filters the gene correlation matrix based on a user-defined threshold.
filter_correlation(cor_mat, gene_list, threshold = 0.7)
cor_mat |
correlation matrix generated from calculate_cor_mat function. |
gene_list |
cell markers that passed threshold. |
threshold |
absolute correlation threshold. |
vector of gene names that pass user-defined correlation threshold.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
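One plausible reading of absolute-threshold filtering is sketched below in base R. The exact rule used by the package is not documented here, so the helper `keep_genes` and its logic (keep every gene whose absolute correlation with at least one marker reaches the threshold) are assumptions, and the correlation values are invented.

```r
genes <- c("GeneA", "GeneB", "GeneC", "GeneD")

# Toy symmetric gene-gene correlation matrix.
cor_mat <- matrix(
  c( 1.00,  0.85, -0.75, 0.10,
     0.85,  1.00, -0.60, 0.20,
    -0.75, -0.60,  1.00, 0.05,
     0.10,  0.20,  0.05, 1.00),
  nrow = 4, byrow = TRUE, dimnames = list(genes, genes)
)

# Hypothetical helper: genes whose |correlation| with any marker is >= threshold.
keep_genes <- function(cor_mat, gene_list, threshold = 0.7) {
  # Ignore markers that are absent from the matrix.
  markers <- intersect(gene_list, rownames(cor_mat))
  sub <- abs(cor_mat[markers, , drop = FALSE])
  # A marker always passes through its self-correlation of 1.
  colnames(sub)[apply(sub >= threshold, 2, any)]
}

passed <- keep_genes(cor_mat, c("GeneA", "GeneZ"), threshold = 0.7)
print(passed)
```

Using the absolute value means strongly anti-correlated genes (GeneC here) are retained alongside positively correlated ones.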
This function returns the markers shared between the cell markers and the scRNA matrix.
filter_list(gene_list, passed_cells)
gene_list |
list of unique genes of cell types. |
passed_cells |
cell types that pass the specified threshold. |
list of cell types whose genes are found in the input matrix.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
This function performs Fisher's exact test to identify cell types.
fisher_test(ref, gene_overlap)
ref |
reference gene set. |
gene_overlap |
genes that pass the correlation threshold. |
vector of p-value and overlap.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
fisher_test(c("PAX8","PAX6","TP53","AOC3","LIPF"), c("LIPF","PAX8","PAX6","TP53","TSHB","AOC3"))
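For readers unfamiliar with the underlying statistic, a one-sided Fisher's exact test on a gene-set overlap can be sketched with base R's `fisher.test()`. This is not the body of `fisher_test()`: the background size `universe_size` is a hypothetical value chosen for illustration, and the package may define the background differently.

```r
ref <- c("PAX8", "PAX6", "TP53", "AOC3", "LIPF")
gene_overlap <- c("LIPF", "PAX8", "PAX6", "TP53", "TSHB", "AOC3")

# Assumed number of genes in the background; not taken from the package.
universe_size <- 20000

# Genes present in both sets.
k <- length(intersect(ref, gene_overlap))

# 2 x 2 contingency table, filled column by column:
#                     in ref            not in ref
# in gene_overlap     k                 length(gene_overlap) - k
# not in gene_overlap length(ref) - k   remaining background genes
tab <- matrix(
  c(k,
    length(ref) - k,
    length(gene_overlap) - k,
    universe_size - length(union(ref, gene_overlap))),
  nrow = 2
)

# One-sided test: is the overlap larger than expected by chance?
res <- fisher.test(tab, alternative = "greater")
print(c(p_value = res$p.value, overlap = k))
```

The p-value depends directly on the assumed background size, so the number printed here is only meaningful relative to that assumption.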
This function returns the cell markers that pass a specific threshold in the gene correlation matrix.
match_characters(genes, gene_mat)
genes |
list of unique genes of cell types. |
gene_mat |
correlation matrix of genes. |
vector of gene names that overlap with the correlation matrix.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
This function performs the phyper (hypergeometric) test to identify cell types.
phyper_test(ref, overlap)
ref |
reference gene set. |
overlap |
genes that pass the correlation threshold. |
vector of p-value and overlap.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
phyper_test(c("PAX8","PAX6","TP53","AOC3","LIPF"), c("LIPF","PAX8","PAX6","TP53","TSHB","AOC3"))
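The hypergeometric tail probability behind a phyper-style overlap test can be sketched with base R's `phyper()`. This is an illustration under an assumed background size (`universe_size` is hypothetical), not the implementation of `phyper_test()`.

```r
ref <- c("PAX8", "PAX6", "TP53", "AOC3", "LIPF")
overlap <- c("LIPF", "PAX8", "PAX6", "TP53", "TSHB", "AOC3")

# Assumed number of genes in the background; not taken from the package.
universe_size <- 20000

k <- length(intersect(ref, overlap))  # observed shared genes
m <- length(ref)                      # "successes" in the background
n <- universe_size - m                # "failures" in the background
draws <- length(overlap)              # number of genes drawn

# P(X >= k): phyper() with lower.tail = FALSE gives P(X > q), so pass k - 1.
p_value <- phyper(k - 1, m, n, draws, lower.tail = FALSE)
print(c(p_value = p_value, overlap = k))
```

Passing `k - 1` rather than `k` is the usual off-by-one to watch for, since the upper tail returned by `phyper()` excludes the value itself.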
This function gets the associated cell clusters using the correlation-based approach.
process_clus(cluster,sobj,assay="RNA",clus,markers,cor_m,m_t=0.9,c_t=0.7,test="p")
cluster |
associated cluster name. |
sobj |
Seurat object. |
assay |
assay to be used; the default is "RNA". |
clus |
cell clusters. |
markers |
cell markers database. |
cor_m |
gene correlation matrix. |
m_t |
overlap threshold between cell markers and expression matrix. |
c_t |
correlation threshold between genes. |
test |
statistical test used to check whether the overlap is significant: "p" for phyper or "f" for Fisher. |
data frame of proposed cell types.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
This function processes the database that will be used for the sctype approach.
process_database(database_name = "sctype", org = 'a', tissue, tissue_type = 'n')
database_name |
name of the database to be used that can be 'sctype' or 'UMD'. |
org |
name of organism to be used that can be 'h' for human, 'm' for mouse, and 'a' for all markers. |
tissue |
specified tissue from which the data comes. |
tissue_type |
tissue type: 'a' for all types, 'n' for normal tissues only, or 'c' for cancer tissues. |
vector of gene names that overlap with the correlation matrix.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
This function processes the cell markers database and returns the processed list.
process_markers(markers_df)
markers_df |
data frame with markers named as gene_original and cell names as cell type. |
list of lists of the processed markers.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
This function runs the main pipeline that performs the cell type assignment.
sccca(sobj,assay="RNA",cluster,marker,tissue,tt="a",cond,m_t=0.9,c_t=0.7,test="p",org="a")
sobj |
Seurat object. |
assay |
assay to be used; the default is "RNA". |
cluster |
column name in the meta.data that holds the cell cluster numbers. |
marker |
cell markers database path. |
tissue |
specified tissue from which the data comes. |
tt |
tissue type: 'a' for all types, 'n' for normal tissues only, or 'c' for cancer tissues. |
cond |
column name in the meta.data that holds the condition names. |
m_t |
overlap threshold between cell markers and expression matrix. |
c_t |
correlation threshold between genes. |
test |
statistical test used to check whether the overlap is significant: "p" for phyper or "f" for Fisher. |
org |
organism to be used that can be 'h' for human, 'm' for mouse, and 'a' for all markers. |
list containing the Seurat object with the assigned clusters and the top 3 proposed cell types.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]
This function runs the sctype approach with a faster implementation.
sctype(sobj,assay="RNA",tissue,tt="a",clus,org="a",scaled=T,database="sctype")
sobj |
Seurat object. |
assay |
assay to be used; the default is "RNA". |
tissue |
specified tissue from which the data comes. |
tt |
tissue type: 'a' for all types, 'n' for normal tissues only, or 'c' for cancer tissues. |
clus |
column name in the meta.data that holds the cell cluster numbers. |
org |
organism to be used that can be 'h' for human, 'm' for mouse, and 'a' for all markers. |
scaled |
indicates whether the matrix is scaled (TRUE by default) |
database |
name of the database to be used that can be 'sctype' or 'UMD' |
vector of gene names that overlap with the correlation matrix.
Mohamed Soudy [email protected] and Sophie LE BARS [email protected] and Enrico Glaab [email protected]