Learning Single-Cell Data Analysis by Following an Expert

健明 2021-07-15

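Preface

This is the Seurat workflow for single-cell analysis shared by Tang Ming. Today we walk through the overall analysis strategy; many of the details still call for careful study on your own.

The original tutorial is at: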
https://crazyhottommy./scRNA-seq-workshop-Fall-2019/scRNAseq_workshop_1.html

Downloading the data

We will download the public 5k PBMC (peripheral blood mononuclear cell) dataset from 10x Genomics and then analyze it in R.

wget http://cf./samples/cell-exp/3.0.2/5k_pbmc_v3/5k_pbmc_v3_filtered_feature_bc_matrix.tar.gz

tar xvzf 5k_pbmc_v3_filtered_feature_bc_matrix.tar.gz

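Installing the required R packages

install.packages("tidyverse")
install.packages("rmarkdown")
install.packages("Seurat")

If you have already installed these R packages, you can skip this step. If you have not, or if installation fails, the following tutorials may help:

RStudio needs no network connection, but BiocManager cannot install R packages
Tips for batch-installing R packages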

Reading in the data

# Load the packages used throughout this walkthrough
library(Seurat)
library(tidyverse)

# Read the PBMC dataset
pbmc.data <- Read10X(data.dir = "filtered_feature_bc_matrix/")
# Initialize the Seurat object with the raw (non-normalized) counts
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc5k", min.cells = 3, min.features = 200)
pbmc

An object of class Seurat
18791 features across 4962 samples within 1 assay
Active assay: RNA (18791 features)
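
The min.cells and min.features arguments filter the count matrix while the object is built. The following is a rough base-R sketch of that filtering on a made-up toy matrix. The variable names and the tiny thresholds are illustrative assumptions, and the order of the two steps reflects my understanding of Seurat V3 rather than anything stated in the tutorial.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(1, 2, 0, 3,
    0, 1, 0, 1,
    4, 0, 0, 2,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("cell", 1:4))
)

# Illustrative thresholds (the tutorial uses min.cells = 3, min.features = 200).
min_cells <- 2
min_features <- 2

# Step 1: keep cells in which at least min_features genes are detected.
keep_cells <- colSums(counts > 0) >= min_features
filtered <- counts[, keep_cells, drop = FALSE]

# Step 2: keep genes detected in at least min_cells of the remaining cells.
keep_genes <- rowSums(filtered > 0) >= min_cells
filtered <- filtered[keep_genes, , drop = FALSE]

dim(filtered)
```

In this toy case one cell and one gene are dropped, which mirrors how the real object ends up with fewer cells and features than the raw matrix.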

For more details about the Seurat object, see: https://github.com/satijalab/seurat/wiki

Note: the data-loading step above appears to assume Seurat V3, because the object I created with Seurat V2 does not match the result shown in the original tutorial:

## Create the object with Seurat V2
## (Seurat V2 names the gene-count filter min.genes, so min.features is probably ignored here)
pbmc <- CreateSeuratObject(raw.data = pbmc.data, project = "pbmc5k", min.cells = 3, min.features = 200)

pbmc

An object of class seurat in project pbmc5k
 18791 genes across 5025 samples.

Quality control

## Look at the metadata
head(pbmc@meta.data)
# The [[ operator can add columns to object metadata. This is a great place to stash QC stats
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc@meta.data %>% head()

## Visualize the QC metrics as violin plots
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

# We choose the cutoffs based on the plots above. These cutoffs are fairly subjective.
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt < 25)
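
To make the QC metrics less of a black box, here is a minimal base-R sketch of what a mitochondrial percentage and a detected-gene count look like when computed by hand. The toy matrix and all variable names are made up for illustration; this is not Seurat's implementation.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(10,  0,  5,
     2,  1,  0,
     8,  9,  5,
     0, 10, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "CD3E", "LYZ"),
                  c("cell1", "cell2", "cell3"))
)

# Genes whose names start with "MT-" are treated as mitochondrial.
is_mt <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes.
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)

# Number of detected genes per cell (analogous to nFeature_RNA).
n_feature <- colSums(counts > 0)

# Keep cells below a (subjective) mitochondrial cutoff, as in the subset() call.
keep <- percent_mt < 25
```

Note that the comparison is strict, so a cell sitting exactly on the cutoff is removed.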

Normalization

We usually use the global-scaling normalization method "LogNormalize":

pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)

Seurat now also offers a newer normalization method called SCTransform. For details, see: https:///seurat/v3.0/sctransform_vignette.html
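
As documented for Seurat, "LogNormalize" divides each cell's counts by that cell's total counts, multiplies by scale.factor, and applies log(1 + x). A minimal base-R sketch on a made-up toy matrix (variable names are illustrative) shows the arithmetic:

```r
# Toy raw counts: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(1, 3, 0, 6,     # cellA, total 10
    10, 0, 10, 0),  # cellB, total 20
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("cellA", "cellB"))
)

scale_factor <- 10000

# Divide each cell (column) by its total counts, multiply by the scale
# factor, then take log(1 + x).
lib_size <- colSums(counts)
normalized <- log1p(sweep(counts, 2, lib_size, "/") * scale_factor)

round(normalized, 3)
```

Zero counts stay at zero after the transformation, and undoing the log with expm1() gives every cell the same total of 10,000.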

Feature selection

pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)

# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(pbmc), 10)

# plot variable features with and without labels
plot1 <- VariableFeaturePlot(pbmc)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)

CombinePlots(plots = list(plot1, plot2), ncol = 1)

Scaling the data

The ScaleData function:

Shifts the expression of each gene, so that the mean expression across cells is 0
Scales the expression of each gene, so that the variance across cells is 1

Data with mean 0 and variance 1 are generally regarded as standardized.

all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)

If the dataset is large, this step may take a long time.
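
The two operations listed for ScaleData amount to a per-gene z-score. Here is a small base-R sketch on made-up data (variable names are illustrative). ScaleData also clips extreme values through its scale.max argument, which this sketch leaves out.

```r
set.seed(1)
# Toy normalized expression: 5 genes (rows) x 8 cells (columns), made-up values.
expr <- matrix(rnorm(5 * 8, mean = 3, sd = 2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8)))

# scale() standardizes columns, so transpose to standardize each gene (row).
scaled <- t(scale(t(expr)))

# Each gene now has mean 0 and variance 1 across cells.
gene_means <- rowMeans(scaled)
gene_vars <- apply(scaled, 1, var)
```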

Inspecting the data before and after scaling

## Compare the data before and after each transformation
#### raw counts, same as pbmc@assays$RNA@counts[1:6, 1:6]
pbmc[["RNA"]]@counts[1:6, 1:6]
### library size normalized and log transformed data
pbmc[["RNA"]]@data[1:6, 1:6]
### scaled data
pbmc[["RNA"]]@scale.data[1:6, 1:6]

Scaling is an essential step in the Seurat workflow, but it is only required for the genes that will be used as input to PCA.

By default, therefore, ScaleData scales only the previously identified variable features (2,000 by default). To speed up this step, omit the features argument from the earlier ScaleData call.

pbmc <- ScaleData(pbmc, vars.to.regress = "percent.mt")

PCA

Principal component analysis (PCA) is a linear dimensionality reduction technique.

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc), verbose = FALSE)

p1 <- DimPlot(pbmc, reduction = "pca")
p1

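To learn more about PCA, you can watch the StatQuest video on YouTube: https://www./watch?v=HMOI_lkzW08

Or see this tutorial:
Cluster analysis and principal component analysis

Or the original author's blog posts:

https://divingintogeneticsandgenomics./post/pca-in-action/
https://divingintogeneticsandgenomics./post/permute-test-for-pca-components/

You can also draw all kinds of attractive PCA plots with ggplot2. Plotting code is easy to find online, so it is not covered here.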

Determining the number of PCs

To overcome the extensive technical noise in any single feature of scRNA-seq data, Seurat clusters cells based on their PCA scores, with each PC essentially representing a "metafeature" that combines information across a correlated set of features. The top principal components therefore represent a robust compression of the dataset. But how many PCs should we include? 10? 20? 100?

The following methods give a rough answer:

pbmc <- JackStraw(pbmc, num.replicate = 100, dims = 50)
pbmc <- ScoreJackStraw(pbmc, dims = 1:50)

JackStrawPlot(pbmc, dims = 1:30)

ElbowPlot(pbmc, ndims = 50)

Variance explained by each PC

mat <- pbmc[["RNA"]]@scale.data
pca <- pbmc[["pca"]]

# Get the total variance:
total_variance <- sum(matrixStats::rowVars(mat))

eigValues = (pca@stdev)^2  ## EigenValues
varExplained = eigValues / total_variance

varExplained %>% enframe(name = "PC", value = "varExplained" ) %>%
  ggplot(aes(x = PC, y = varExplained)) +
  geom_bar(stat = "identity") +
  theme_classic() +
  ggtitle("scree plot")

### this is what Seurat is plotting: standard deviation
pca@stdev %>% enframe(name = "PC", value = "Standard Deviation" ) %>%
  ggplot(aes(x = PC, y = `Standard Deviation`)) +
  geom_point() +
  theme_classic()
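
The same eigenvalue arithmetic can be checked on a small example without Seurat. The sketch below uses base R's prcomp on a made-up matrix (all names and values are assumptions for illustration): squared PC standard deviations divided by the total per-gene variance give the fraction of variance explained.

```r
set.seed(42)
# Toy "scaled" matrix: 20 genes (rows) x 50 cells (columns), made-up values
# with a different spread per gene.
mat <- matrix(rnorm(20 * 50, sd = rep(1:4, each = 5)), nrow = 20)

# PCA with cells as observations (prcomp centers each gene by default).
pca <- prcomp(t(mat))

# Total variance is the sum of the per-gene variances across cells.
total_variance <- sum(apply(mat, 1, var))

# Eigenvalues are the squared standard deviations of the PCs.
eig_values <- pca$sdev^2
var_explained <- eig_values / total_variance

round(var_explained[1:5], 3)
```

Because this toy example keeps every PC, the fractions sum to 1. In the Seurat code above only the leading PCs are computed, so the plotted fractions add up to less than the total variance.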

Clustering the cells

pbmc <- FindNeighbors(pbmc, dims = 1:20)
pbmc <- FindClusters(pbmc, resolution = 0.5)
# Look at cluster IDs of the first 5 cells
head(Idents(pbmc), 5)

Running non-linear dimensionality reduction (UMAP/tSNE)

pbmc <- RunUMAP(pbmc, dims = 1:20)
pbmc <- RunTSNE(pbmc, dims = 1:20)

## after we run UMAP and TSNE, there are more entries in the reduction slot
str(pbmc@reductions)

DimPlot(pbmc, reduction = "umap", label = TRUE)

## now let's visualize in the TSNE space
DimPlot(pbmc, reduction = "tsne")

A video about tSNE: https://www./watch?v=NEaUSP4YerM

## now let's label the clusters in the PCA space
DimPlot(pbmc, reduction = "pca")

Finding differentially expressed features (cluster biomarkers)

# find all markers of cluster 1
cluster1.markers <- FindMarkers(pbmc, ident.1 = 1, min.pct = 0.25)
head(cluster1.markers, n = 5)
# find all markers distinguishing cluster 5 from clusters 0 and 3
cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)
head(cluster5.markers, n = 5)
# find markers for every cluster compared to all remaining cells, report only the positive ones
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)

This step is time-consuming. If it is too slow, Seurat V3.0.2 provides parallelization support for several steps, including FindAllMarkers.
Learn more: https:///seurat/v3.0/future_vignette.html

library(future)
# we only have 2 CPUs reserved for each one.
# (recent versions of the future package deprecate "multiprocess" in favor of "multisession")
plan("multiprocess", workers = 2)
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
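
To make concrete what FindMarkers evaluates for each gene, here is a simplified single-gene sketch in base R. It assumes the default Wilcoxon rank-sum test and the min.pct rule (the gene must be detected in at least that fraction of cells in either group). The expression values are made up, and the mean difference is only a stand-in for Seurat's own fold-change calculation.

```r
set.seed(7)
# Toy log-normalized expression of one gene in two clusters (made-up values).
cluster_a <- c(rep(0, 5), runif(25, min = 1, max = 3))     # widely expressed
cluster_b <- c(rep(0, 24), runif(6, min = 0.1, max = 1))   # rarely expressed

# Fraction of cells in each cluster in which the gene is detected.
pct_a <- mean(cluster_a > 0)
pct_b <- mean(cluster_b > 0)

# min.pct-style pre-filter: test the gene only if either group reaches 25%.
passes_min_pct <- max(pct_a, pct_b) >= 0.25

# Wilcoxon rank-sum test; exact = FALSE because the zeros create ties.
test <- wilcox.test(cluster_a, cluster_b, exact = FALSE)
p_value <- test$p.value

# Simple effect-size stand-in (not Seurat's avg_logFC formula).
mean_diff <- mean(cluster_a) - mean(cluster_b)
```

FindAllMarkers repeats this kind of comparison for every gene and every cluster, which is why it is slow and benefits from parallelization.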

Visualizing marker genes

VlnPlot

VlnPlot(pbmc, features = c("MS4A1", "CD79A"))

## understanding the matrix of data slots
pbmc[["RNA"]]@data[c("MS4A1", "CD79A"), 1:30]
pbmc[["RNA"]]@scale.data[c("MS4A1", "CD79A"), 1:30]
pbmc[["RNA"]]@counts[c("MS4A1", "CD79A"), 1:30]
# you can plot raw counts as well
VlnPlot(pbmc, features = c("MS4A1", "CD79A"), slot = "counts", log = TRUE)

VlnPlot(pbmc, features = c("MS4A1", "CD79A"), slot = "scale.data")

FeaturePlot
Plot the expression intensity overlaid on the tSNE/UMAP plot.

FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A"))

p <- FeaturePlot(pbmc, features = "CD14")

## before reordering
p

p_after <- p
### after reordering
p_after$data <- p_after$data[order(p_after$data$CD14),]

CombinePlots(plots = list(p, p_after))
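
The reordering trick above relies on ggplot2 drawing points in the row order of the plot data, so sorting by expression draws the high-expressing cells last and therefore on top. A tiny base-R sketch with a made-up data frame (names and values are illustrative) shows the effect of order():

```r
# Toy plot data: one row per cell with its CD14 expression (made-up values).
plot_data <- data.frame(
  cell = paste0("cell", 1:6),
  CD14 = c(0, 2.5, 0, 1.2, 0, 3.1),
  stringsAsFactors = FALSE
)

# Sort rows by increasing expression; non-expressing cells come first,
# so they would be drawn underneath the expressing ones.
reordered <- plot_data[order(plot_data$CD14), ]

reordered$cell
```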

DoHeatmap

top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(pbmc, features = top10$gene) + NoLegend()
