细胞周期预测 | 单细胞转录组(scRNA-seq)分析_03

一、介绍

前置知识：原创 Seurat 包图文详解 | 单细胞转录组(scRNA-seq)分析02

使用Seurat包来运行，主要实现两个功能：

通过marker基因计算细胞周期评分
基于评分在预处理过程中，减轻单细胞转录组数据中细胞周期异质性影响

二、预处理

数据来自：http://www.bloodjournal.org/content/early/2016/06/30/blood-2016-05-716480?sso-checked=true

数据下载：https://www.dropbox.com/s/3dby3bjsaf5arrw/cell_cycle_vignette_files.zip?dl=1

rm(list = ls())
library(Seurat)

# 读取表达矩阵的第一行作为表头，第一列作为行名
exp.mat <- read.table(file = "./data/cell_cycle_vignette_files/nestorawa_forcellcycle_expressionMatrix.txt", header = TRUE, 
                      as.is = TRUE, row.names = 1)


# 细胞周期的Marker基因
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes

# 创建并初始化 Seurat 对象
marrow <- CreateSeuratObject(counts = exp.mat)
marrow <- NormalizeData(marrow)
marrow <- FindVariableFeatures(marrow, selection.method = "vst")
marrow <- ScaleData(marrow, features = rownames(marrow)rm(list = ls())
library(Seurat)

# 读取表达矩阵的第一行作为表头，第一列作为行名
exp.mat <- read.table(file = "./data/cell_cycle_vignette_files/nestorawa_forcellcycle_expressionMatrix.txt", header = TRUE, 
                      as.is = TRUE, row.names = 1)


# 细胞周期的Marker基因
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes

# 创建并初始化 Seurat 对象
marrow <- CreateSeuratObject(counts = exp.mat)
marrow <- NormalizeData(marrow)
marrow <- FindVariableFeatures(marrow, selection.method = "vst")
marrow <- ScaleData(marrow, features = rownames(marrow))

如果直接使用 FindCariableFeatures 找高异质性基因。

marrow <- RunPCA(marrow, features = VariableFeatures(marrow), ndims.print = 6:10, nfeatures.print = 10)
DimHeatmap(marrow, dims = c(8, 10)marrow <- RunPCA(marrow, features = VariableFeatures(marrow), ndims.print = 6:10, nfeatures.print = 10)
DimHeatmap(marrow, dims = c(8, 10))

mark

在PCA后，会发现PC8与PC10中包括TOP2A和MKI67的细胞周期mark。可以尝试数据中回归该信号，避免影响下游分析。

三、获取细胞周期分数

首先，我们根据每个细胞的G2 / M和S期标志物的表达为其分配分数。这些标记物组在表达水平上应该是反相关的，并且不表达它们的细胞可能不会循环并处于G1期。

我们在CellCycleScoring函数中分配分数，该函数将S和G2 / M分数存储在对象元数据中，以及G2M，S或G1阶段中每个单元格的预测分类。CellCycleScoring也可以通过传递将Seurat对象的标识设置为细胞周期阶段set.ident = TRUE（原始标识存储为old.ident）。请注意，Seurat在下游细胞周期回归中不使用离散分类（G2M / G1 / S）。相反，它使用G2M和S期的定量评分。但是，如果有兴趣，我们会提供预测的分类。

marrow <- CellCycleScoring(marrow, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)

# 查看细胞周期分数和阶段分配
head(marrow[[]])

# 可视化整个细胞周期标记的分布
RidgePlot(marrow, features = c("PCNA", "TOP2A", "MCM6", "MKI67"), ncol = 2marrow <- CellCycleScoring(marrow, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)

# 查看细胞周期分数和阶段分配
head(marrow[[]])

# 可视化整个细胞周期标记的分布
RidgePlot(marrow, features = c("PCNA", "TOP2A", "MCM6", "MKI67"), ncol = 2)

mark

# 对细胞周期基因做PCA，揭示了细胞完全按阶段分离
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow# 对细胞周期基因做PCA，揭示了细胞完全按阶段分离
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow)

mark

四、在数据缩放期间回归出细胞周期得分

marrow <- ScaleData(marrow, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(marrow))
# 高可变基因上的PCA不再返回与细胞周期相关的成分
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
# 仅对细胞周期基因运行PCA时，细胞不再按细胞周期阶段分离
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrowmarrow <- ScaleData(marrow, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(marrow))
# 高可变基因上的PCA不再返回与细胞周期相关的成分
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
# 仅对细胞周期基因运行PCA时，细胞不再按细胞周期阶段分离
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow)

mark

由于最佳的细胞周期标记在组织和物种之间极为保守，因此我们发现此程序可在各种数据集上稳定可靠地工作。

五、备用工作流程

上面的过程将删除与细胞周期相关的所有信号。在某些情况下，会对下游分析产生负面影响，尤其是在分化过程中（如鼠类造血），在此过程中干细胞处于静止状态，分化的细胞正在增殖（反之亦然）。

在这种情况下，清除所有细胞周期效应也会使干细胞和祖细胞之间的区别变得模糊。

作为替代方案，建议逐步淘汰G2M和S期评分之间的差异。

这意味着将维持分离非循环细胞和循环细胞的信号，但是增殖细胞之间的细胞周期相位差异（通常没用）将被从数据中去除。

marrow$CC.Difference <- marrow$S.Score - marrow$G2M.Score
marrow <- ScaleData(marrow, vars.to.regress = "CC.Difference", features = rownames(marrow))

# PCA减少细胞周期效应
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)

# 在细胞周期基因上运行PCA时，活跃增殖的细胞与G1细胞仍然不同，但是在活跃增殖的细胞中，G2M和S期细胞会聚在一起
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrowmarrow$CC.Difference <- marrow$S.Score - marrow$G2M.Score
marrow <- ScaleData(marrow, vars.to.regress = "CC.Difference", features = rownames(marrow))

# PCA减少细胞周期效应
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)

# 在细胞周期基因上运行PCA时，活跃增殖的细胞与G1细胞仍然不同，但是在活跃增殖的细胞中，G2M和S期细胞会聚在一起
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow)