前言 Hello小伙伴們大家好,我是生信技能樹(shù)的小學(xué)徒”我才不吃蛋黃“。今天是胃癌單細(xì)胞數(shù)據(jù)集GSE163558復(fù)現(xiàn)系列第十期。第九期我們用TCGA-STAD數(shù)據(jù)進(jìn)行生存分析。本期,我們將回到高級(jí)分析(CopyKAT分析),通過(guò)計(jì)算CNV評(píng)估上皮細(xì)胞良惡性。
1.背景介紹 在第六期推文中,我們聯(lián)合TCGA-STAD數(shù)據(jù)給上皮添加了惡性評(píng)分,從而協(xié)助區(qū)分腫瘤細(xì)胞與非腫瘤細(xì)胞。在高級(jí)分析中,我們可以利用inferCNV和CopyKAT對(duì)單細(xì)胞進(jìn)行CNV(Copy number variation, 拷貝數(shù)變異)分析,區(qū)分腫瘤細(xì)胞與非腫瘤細(xì)胞。
CNV是基因組結(jié)構(gòu)變異(Structural variation,SV)的重要組成部分,主要原因是基因組發(fā)生重排,一般指長(zhǎng)度為1kb以上的基因組大片段的拷貝數(shù)增加或者減少,主要表現(xiàn)為亞顯微水平的缺失和重復(fù)。CNV在腫瘤的發(fā)生和發(fā)展研究中扮演重要的角色。
本期我們使用CopyKAT (Copynumber Karyotyping of Tumors)進(jìn)行CNV分析,它是一種集成貝葉斯分割方法。利用scRNA-seq數(shù)據(jù)推斷人類(lèi)腫瘤的基因組拷貝數(shù),然后通過(guò)對(duì)拷貝數(shù)數(shù)據(jù)進(jìn)行分層聚類(lèi),以識(shí)別非整倍體腫瘤細(xì)胞和二倍體基質(zhì)細(xì)胞之間的最大距離。其原理同樣是通過(guò)單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)來(lái)推斷細(xì)胞的染色體倍數(shù),進(jìn)而推斷是正常細(xì)胞(diploid)還是腫瘤細(xì)胞(aneuploid)。
2.數(shù)據(jù)分析 2.1 導(dǎo)入數(shù)據(jù) 首先清除系統(tǒng)環(huán)境變量,加載R包,設(shè)置新文件夾和工作目錄,導(dǎo)入惡性上皮細(xì)胞數(shù)據(jù):
rm(list=ls()) options(stringsAsFactors = F) library(Seurat) library(ggplot2) library(clustree) library(cowplot) library(dplyr) library(infercnv) library(copykat) library(tidyverse)#devtools::install_github("broadinstitute/infercnv") dir.create("8-CNV" ) getwd() setwd("../8-CNV" ) sce=readRDS( "../6-TCGA_STAD/malignant.rds" ) table(sce$celltype ) Idents(sce) = sce$celltype
避免后面流程運(yùn)行的太長(zhǎng)了對(duì)細(xì)胞進(jìn)行抽樣:
seurat_object = sce seurat_object <- subset(seurat_object, downsample=200) table(Idents(seurat_object))
預(yù)測(cè)腫瘤/正常細(xì)胞狀態(tài)的基本原理是非整倍體在人類(lèi)癌癥中很常見(jiàn) (90%)。具有廣泛全基因組拷貝數(shù)畸變(非整倍體)的細(xì)胞被認(rèn)為是腫瘤細(xì)胞,而基質(zhì)正常細(xì)胞和免疫細(xì)胞通常具有2N二倍體或接近二倍體的拷貝數(shù)分布。copykat通過(guò)單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)來(lái)推斷細(xì)胞的染色體倍數(shù),進(jìn)而推斷是正常細(xì)胞(diploid)還是腫瘤細(xì)胞(aneuploid)。它還可以進(jìn)一步對(duì)腫瘤細(xì)胞進(jìn)行聚類(lèi),找出不同的亞群。
運(yùn)行CopyKAT ngene.chr
參數(shù)是過(guò)濾細(xì)胞的一個(gè)標(biāo)準(zhǔn),它要求被用來(lái)做CNV預(yù)測(cè)的細(xì)胞,一個(gè)染色體上至少有5個(gè)基因。
sam.name
定義樣本名稱 (sample name),會(huì)給出來(lái)的文件加前綴
scRNA <- seurat_object counts <- as.matrix(scRNA@assays$RNA $counts ) table(scRNA$celltype )if (T){cnv <- copykat(rawmat=counts, ngene.chr=5, sam.name="test" , n.cores=8) saveRDS(cnv, "cnv.rds" )}
添加CopyKAT結(jié)果到seurat對(duì)象meta.data
信息中:
cnv=readRDS( "cnv.rds" ) table(rownames(cnv$CNAmat )) a = cnv$prediction $copykat .pred table(a) scRNA$CopyKAT = a
CopyKAT結(jié)果可視化:正常細(xì)胞(diploid,藍(lán)色)還是腫瘤細(xì)胞(aneuploid,紅色)
p1 <- DimPlot(scRNA, group.by = "celltype" , label = T,reduction = 'tsne' ) p2 <- DimPlot(scRNA, group.by = "CopyKAT" ,reduction = 'tsne' ) + scale_color_manual(values = c("#F8766D" ,'#02BFC4' , "gray" )) pc <- p1 + p2 pc
可視化maker基因'TPPP3','KRT17'的表達(dá)情況,檢查CopyKAT結(jié)果準(zhǔn)確性:
cols = c("gray" , "coral2" ) plot <- FeaturePlot(scRNA, features = c('TPPP3' ,'KRT17' ),cols = cols, pt.size = 1,reduction = 'tsne' )+ theme(panel.border = element_rect(fill=NA,color="black" , size=1, linetype="solid" ))#加邊框 plot_grid(p2,plot)
CopyKAT結(jié)果主要有兩個(gè):
1)預(yù)測(cè)的結(jié)果正常細(xì)胞(diploid)還是腫瘤細(xì)胞(aneuploid);
2) 每個(gè)CNV segment在每個(gè)細(xì)胞的表達(dá)量。這里看出和inferCNV的不同來(lái)了,是基于genomic coordinate而不是gene level的表達(dá)量。
繪制熱圖 copykat.test = cnv pred.test <- data.frame(copykat.test$prediction ) CNA.test <- data.frame(copykat.test$CNAmat ) head(pred.test) head(CNA.test[,1:5]) my_palette <- colorRampPalette(rev(RColorBrewer::brewer.pal(n = 3, name = "RdBu" )))(n = 999) chr <- as.numeric(CNA.test$chrom ) %% 2+1 rbPal1 <- colorRampPalette(c('black' ,'grey' )) CHR <- rbPal1(2)[as.numeric(chr)] chr1 <- cbind(CHR,CHR) rbPal5 <- colorRampPalette(RColorBrewer::brewer.pal(n = 8, name = "Dark2" )[2:1]) com.preN <- pred.test$copykat .pred pred <- rbPal5(2)[as.numeric(factor(com.preN))] cells <- rbind(pred,pred) col_breaks = c(seq(-1,-0.4,length=50),seq(-0.4,-0.2,length=150),seq(-0.2,0.2,length=600),seq(0.2,0.4,length=150),seq(0.4, 1,length=50)) heatmap.3(t(CNA.test[,4:ncol(CNA.test)]),dendrogram="r" , distfun = function (x) parallelDist::parDist(x,threads =4, method = "euclidean" ), hclustfun = function (x) hclust(x, method="ward.D2" ), ColSideColors=chr1,Colv=NA, Rowv=TRUE, notecol="black" ,col=my_palette,breaks=col_breaks, key=TRUE, keysize=1, density.info="none" , trace="none" , cexRow=0.1,cexCol=0.1,cex.main=1,cex.lab=0.1, symm=F,symkey=F,symbreaks=T,cex=1, cex.main=4, margins=c(10,10)) legend("topright" , paste("pred." ,names(table(com.preN)),sep="" ), pch=15,col=RColorBrewer::brewer.pal(n = 8, name = "Dark2" )[2:1], cex=0.6, bty="n" )
再對(duì)腫瘤細(xì)胞再聚類(lèi)并畫(huà)熱圖,又能分成兩群。
table(pred.test$copykat .pred) tumor.cells <- pred.test$cell .names[which (pred.test$copykat .pred=="aneuploid" )] colnames(CNA.test) <- gsub(".1$" , "-1" , colnames(CNA.test)) tumor.mat <- CNA.test[, colnames(CNA.test) %in % tumor.cells] hcc <- hclust(parallelDist::parDist(t(tumor.mat),threads =4, method = "euclidean" ), method = "ward.D2" ) hc.umap <- cutree(hcc,2) rbPal6 <- colorRampPalette(RColorBrewer::brewer.pal(n = 8, name = "Dark2" )[3:4]) subpop <- rbPal6(2)[as.numeric(factor(hc.umap))] cells <- rbind(subpop,subpop) heatmap.3(t(tumor.mat),dendrogram="r" , distfun = function (x) parallelDist::parDist(x,threads =4, method = "euclidean" ), hclustfun = function (x) hclust(x, method="ward.D2" ), ColSideColors=chr1,RowSideColors=cells,Colv=NA, Rowv=TRUE, notecol="black" ,col=my_palette,breaks=col_breaks, key=TRUE, keysize=1, density.info="none" , trace="none" , cexRow=0.1,cexCol=0.1,cex.main=1,cex.lab=0.1, symm=F,symkey=F,symbreaks=T,cex=1, cex.main=4, margins=c(10,10)) legend("topright" , c("c1" ,"c2" ), pch=15,col=RColorBrewer::brewer.pal(n = 8, name = "Dark2" )[3:4], cex=0.9, bty='n' )
最后把CNV的結(jié)果投射到單細(xì)胞聚類(lèi)結(jié)果上看一看是否合理,Seurat標(biāo)準(zhǔn)流程走一遍,聚類(lèi)結(jié)果和copyKAT分群結(jié)果投射到TSNE上。
standard10X = function (dat,nPCs=50,res=1.0,verbose=FALSE){ srat = CreateSeuratObject(dat) srat = NormalizeData(srat,verbose=verbose) srat = ScaleData(srat,verbose=verbose) srat = FindVariableFeatures(srat,verbose=verbose) srat = RunPCA(srat,verbose=verbose) srat = RunTSNE(srat,dims=seq(nPCs),verbose=verbose) srat = FindNeighbors(srat,dims=seq(nPCs),verbose=verbose) srat = FindClusters(srat,res=res,verbose=verbose) return (srat) } GC1 <- standard10X(counts, nPCs=30, res=0.8) GC1$copykat .pred <- pred.test$copykat .pred GC1$copykat .tumor.pred <- rep("normal" , nrow(GC1@meta.data)) table(hc.umap) GC1$copykat .tumor.pred[rownames(GC1@meta.data) %in % names(hc.umap[hc.umap==1])] <- "tumor cluster 1" GC1$copykat .tumor.pred[rownames(GC1@meta.data) %in % names(hc.umap[hc.umap==2])] <- "tumor cluster 2" p1 <- DimPlot(GC1, label = T) p2 <- DimPlot(GC1, group.by = "copykat.pred" ) p3 <- DimPlot(GC1, group.by = "copykat.tumor.pred" ) p1 + p2 + p3
可以看到:2,4,8群是腫瘤細(xì)胞亞克隆
從免疫細(xì)胞和腫瘤細(xì)胞的標(biāo)記基因表達(dá)來(lái)看,copyKAT可以正確找出正常細(xì)胞和腫瘤細(xì)胞。
FeaturePlot(GC1,features=c("PTPRC" , "EPCAM" ), order = T)
結(jié)語(yǔ) 本期,我們使用CopyKAT分析評(píng)估上皮細(xì)胞良惡性。下一期,我們將對(duì)T細(xì)胞亞群進(jìn)行細(xì)分。順便提前預(yù)告一下,胃癌系列推文完成后,將開(kāi)啟肺腺癌單細(xì)胞數(shù)據(jù)集GSE189357復(fù)現(xiàn)系列,相關(guān)視頻已經(jīng)在B站上線:
文獻(xiàn)推文詳見(jiàn)(單細(xì)胞測(cè)序+空間轉(zhuǎn)錄組描繪從癌前病變到浸潤(rùn)性肺腺癌的動(dòng)態(tài)演變 )。此外,關(guān)于推文內(nèi)容的提升和優(yōu)化,歡迎大家提寶貴意見(jiàn)。謝謝!