一、介绍
前置知识:原创 Seurat 包图文详解 | 单细胞转录组(scRNA-seq)分析02
使用Seurat包来运行,主要实现两个功能:
- 通过marker基因计算细胞周期评分
- 基于评分在预处理过程中,减轻单细胞转录组数据中细胞周期异质性影响
二、预处理
数据来自:http://www.bloodjournal.org/content/early/2016/06/30/blood-2016-05-716480?sso-checked=true
数据下载:https://www.dropbox.com/s/3dby3bjsaf5arrw/cell_cycle_vignette_files.zip?dl=1
rm(list = ls()) library(Seurat)
exp.mat <- read.table(file = "./data/cell_cycle_vignette_files/nestorawa_forcellcycle_expressionMatrix.txt", header = TRUE, as.is = TRUE, row.names = 1)
s.genes <- cc.genes$s.genes g2m.genes <- cc.genes$g2m.genes
marrow <- CreateSeuratObject(counts = exp.mat) marrow <- NormalizeData(marrow) marrow <- FindVariableFeatures(marrow, selection.method = "vst") marrow <- ScaleData(marrow, features = rownames(marrow)rm(list = ls()) library(Seurat)
exp.mat <- read.table(file = "./data/cell_cycle_vignette_files/nestorawa_forcellcycle_expressionMatrix.txt", header = TRUE, as.is = TRUE, row.names = 1)
s.genes <- cc.genes$s.genes g2m.genes <- cc.genes$g2m.genes
marrow <- CreateSeuratObject(counts = exp.mat) marrow <- NormalizeData(marrow) marrow <- FindVariableFeatures(marrow, selection.method = "vst") marrow <- ScaleData(marrow, features = rownames(marrow))
|
如果直接使用 FindCariableFeatures
找高异质性基因。
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), ndims.print = 6:10, nfeatures.print = 10) DimHeatmap(marrow, dims = c(8, 10)marrow <- RunPCA(marrow, features = VariableFeatures(marrow), ndims.print = 6:10, nfeatures.print = 10) DimHeatmap(marrow, dims = c(8, 10))
|
在PCA后,会发现PC8与PC10中包括TOP2A和MKI67的细胞周期mark。可以尝试数据中回归该信号,避免影响下游分析。
三、获取细胞周期分数
首先,我们根据每个细胞的G2 / M和S期标志物的表达为其分配分数。这些标记物组在表达水平上应该是反相关的,并且不表达它们的细胞可能不会循环并处于G1期。
我们在CellCycleScoring
函数中分配分数,该函数将S和G2 / M分数存储在对象元数据中,以及G2M,S或G1阶段中每个单元格的预测分类。CellCycleScoring
也可以通过传递将Seurat对象的标识设置为细胞周期阶段set.ident = TRUE
(原始标识存储为old.ident
)。请注意,Seurat在下游细胞周期回归中不使用离散分类(G2M / G1 / S)。相反,它使用G2M和S期的定量评分。但是,如果有兴趣,我们会提供预测的分类。
marrow <- CellCycleScoring(marrow, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)
head(marrow[[]])
RidgePlot(marrow, features = c("PCNA", "TOP2A", "MCM6", "MKI67"), ncol = 2marrow <- CellCycleScoring(marrow, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)
head(marrow[[]])
RidgePlot(marrow, features = c("PCNA", "TOP2A", "MCM6", "MKI67"), ncol = 2)
|
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes)) DimPlot(marrow marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes)) DimPlot(marrow)
|
四、在数据缩放期间回归出细胞周期得分
marrow <- ScaleData(marrow, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(marrow))
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes)) DimPlot(marrowmarrow <- ScaleData(marrow, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(marrow))
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes)) DimPlot(marrow)
|
由于最佳的细胞周期标记在组织和物种之间极为保守,因此我们发现此程序可在各种数据集上稳定可靠地工作。
五、备用工作流程
上面的过程将删除与细胞周期相关的所有信号。在某些情况下,会对下游分析产生负面影响,尤其是在分化过程中(如鼠类造血),在此过程中干细胞处于静止状态,分化的细胞正在增殖(反之亦然)。
在这种情况下,清除所有细胞周期效应也会使干细胞和祖细胞之间的区别变得模糊。
作为替代方案,建议逐步淘汰G2M和S期评分之间的差异。
这意味着将维持分离非循环细胞和循环细胞的信号,但是增殖细胞之间的细胞周期相位差异(通常没用)将被从数据中去除。
marrow$CC.Difference <- marrow$S.Score - marrow$G2M.Score marrow <- ScaleData(marrow, vars.to.regress = "CC.Difference", features = rownames(marrow))
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes)) DimPlot(marrowmarrow$CC.Difference <- marrow$S.Score - marrow$G2M.Score marrow <- ScaleData(marrow, vars.to.regress = "CC.Difference", features = rownames(marrow))
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes)) DimPlot(marrow)
|