?詳情請聯(lián)系作者: ? 單細胞ATAC的數(shù)據(jù)分析內容我們已經開始了,上游分析參見:ArchR包單細胞ATAC分析(1): 上游分析。scATAC分析完整版內容以及其中個性作圖和數(shù)據(jù)操作內容已全部在微信VIP群發(fā)布,請自行下載!從這節(jié)開始,我們正式開始下游的數(shù)據(jù)分析,用的是ArchR包。為啥用ArchR包進行分析?其實用于單細胞ATAC分析的包和方法有很多,但是
沒必要每一個都學習掌握,掌握一兩個就足夠了。為什么選擇ArchR,我們在上游分析中講過了,也附上了文章的鏈接
(https://www./articles/s41588-021-00790-6),它的確有一些優(yōu)勢,感興趣的可以看看原文。接下來我們就正式開始單細胞ATAC分析之旅。這里我們使用的示例數(shù)據(jù)是人的單細胞ATAC數(shù)據(jù)。第一步:首先安裝需要的R包,第一個當然是ArchR了:
devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories()) library(ArchR) ArchR::installExtraPackages()#安裝一些默認情況下沒有安裝的依賴包 #加載相關R包 library(pheatmap) library(Rsamtools) library(scran) library(scater) library(dplyr) library(Seurat) library(patchwork) library(SingleCellExperiment) library(ComplexHeatmap) library(ggplot2) library(stringr) BiocManager::install('EnsDb.Hsapiens.v86') library(EnsDb.Hsapiens.v86) library(viridis)
addArchRGenome("hg19")#設置基因組(人hg19),根據(jù)自己實際情況 # addArchRGenome("hg38")#設置基因組(人hg38) # addArchRGenome("mm10")#小鼠 addArchRThreads(threads = 10)#設置線程、根據(jù)自己電腦實際情況 第二步:創(chuàng)建Arrow files:ArchR的input文件只需要fragments.tsv.gz!setwd('/home/data_analysis/scATAC/scATAC/ATAC_analysis/')#設置文件路徑 #input文件路徑,只需要樣本的atac_fragments.tsv.gz文件 input.file.list = c('/home/data_analysis/scATAC/scATAC/Y6/Y6_fragments.tsv.gz', '/home/data_analysis/scATAC/scATAC/Y16/Y16_fragments.tsv.gz', '/home/data_analysis/scATAC/scATAC/Y25/Y25_fragments.tsv.gz') #設置樣本名,相當于seurat中orig.ident。 sampleNames = c("Y6","Y16","Y25")
#創(chuàng)建Arrow文件 ArrowFiles <- createArrowFiles(inputFiles = input.file.list, sampleNames = sampleNames, minTSS = 4, #默認值這里有過濾,自行設置,不宜太高 minFrags = 1000, addTileMat = TRUE, addGeneScoreMat = TRUE, excludeChr = c("chrM", "chrY", "chrX")) 第三步:和seurat一樣我們需要去除雙細胞,計算雙細胞評分:doubScores <- addDoubletScores(input = ArrowFiles, k = 20, #Refers to how many cells near a "pseudo-doublet" to count. knnMethod = "UMAP", useMatrix = "TileMatrix", nTrials=5, LSIMethod = 1, scaleDims = F, corCutOff = 0.75, UMAPParams = list(n_neighbors =30, min_dist = 0.3, metric = "cosine", verbose =T)) 第四步:構建ArchRProject,這個就類似于seurat對象,是后續(xù)分析的基礎。#構建ArchRProject projncov <- ArchRProject(ArrowFiles = ArrowFiles, outputDirectory = "ATAC_out", #結果存儲文件夾 copyArrows = TRUE) #建議使用T,保存arrows副本
table(projncov@cellColData$Sample)#查看每個樣本細胞數(shù) # Y16 Y25 Y6 # 7683 12382 3109
#過濾雙細胞------------------------------------------------------------ proj.filter <- filterDoublets(projncov) # Filtering 2219 cells from ArchRProject! # Y16 : 590 of 7683 (7.7%) # Y25 : 1533 of 12382 (12.4%) # Y6 : 96 of 3109 (3.1%) table(proj.filter@cellColData$Sample)#查看每個樣本細胞數(shù) # Y16 Y25 Y6 # 7093 10849 3013 這樣我們的數(shù)據(jù)構建和質控就完成了,當然了,質控是需要自己調整參數(shù)的,所以這個過程可能需要反復的修改,請自行參考幫助函數(shù)!完成后,也會自動在plot文件夾中出現(xiàn)質控圖:完成質控后,我們也可以進行作圖展示質控結果:
p5 <- plotFragmentSizes(ArchRProj = proj.filter)+ ggtitle("Fragment Size Histogram")
p6 <- plotTSSEnrichment(ArchRProj = proj.filter)+ ggtitle("TSS Enrichment")
plotPDF(p5,p6, name = "QC-Sample-FragSizes-TSSProfile_filter.pdf", ArchRProj = proj.filter, addDOC = FALSE, width = 5, height = 5)
###密度圖 filterSample_cellNum <- table(proj.filter$Sample)#過濾后樣本數(shù) sampleplot_list <- list() for (i in 1:length(sampleNames)) { proj.i <- proj.filter[proj.filter$Sample == sampleNames[i]]#提取每個樣本ArchRproject做密度圖 p <- ggPoint( x = log10(proj.i$nFrags), y = proj.i$TSSEnrichment, colorDensity = TRUE, continuousSet = "sambaNight", xlabel = "Log10(Unique Fragments)", ylabel = "TSS Enrichment")+ geom_hline(yintercept = 4, lty = "dashed") + geom_vline(xintercept = 3, lty = "dashed")+ ggtitle(paste0(sampleNames[i],"\n","Cells Pass Filter =",filterSample_cellNum[[i]])) sampleplot_list[[i]] <- p }
plotPDF(sampleplot_list[[1]], sampleplot_list[[2]], sampleplot_list[[3]], name = "QC-Sample-TSSenrich-Frag_filter.pdf", ArchRProj = proj.filter, addDOC = FALSE, width = 5, height = 5) 好了,這就是今天的內容了,覺得分享有用的點個贊再走唄!
|