Hi-C Data Analysis in Practice (Part 1)

微笑如酒 2018-07-26

First, make sure you understand the overall analysis workflow. See Lecture 1, "3D Genomics Study Notes"; the workflow distilled from it is:

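Standard Hi-C analysis workflow (alignment and filtering, raw contact map construction)

Download the reference genome and build the bowtie2 index
Align the fq sequencing data to the reference genome
Filter and select the alignments that meet the requirements
Build the raw contact map
Iteratively correct the contact map

Compartment analysis
TAD analysis
Significant interaction (loop) analysis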

Hands-on data

The data come from Tung B. K. Le et al., Science 2013: https://www.ncbi.nlm./sra/?term=srr824846

Study: High-resolution mapping of the spatial organization of Caulobacter crescentus chromosome by chromosome conformation capture in conjunction with next-generation sequencing (Hi-C)

After downloading, the data were converted to fq files:

858M Jul  3 16:21 SRR824846_Q20L10_1.fastq.gz
857M Jul  3 16:22 SRR824846_Q20L10_2.fastq.gz
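Before aligning, it can be worth confirming that the two mate files hold the same number of reads, since a truncated download or conversion usually shows up as a mismatch. The sketch below is only an illustration: the function names `count_reads` and `check_pair` are made up here, and it assumes plain four-line FASTQ records.

```shell
# Count reads in a gzip-compressed FASTQ file (assumes 4 lines per record).
count_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Return 0 if both mate files hold the same number of reads, 1 otherwise.
check_pair() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" = "$n2" ]
}
```

For the files above this would be called as `check_pair SRR824846_Q20L10_1.fastq.gz SRR824846_Q20L10_2.fastq.gz`.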

For the other data in this study, see: PRJNA196826 · SRP020913

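Download the reference genome and build the bowtie2 index

The species is Caulobacter crescentus, a bacterium frequently used in laboratory experiments. It typically contains flattened vesicles (green) that enclose storage granules (orange).

The genome of this species was published by W. C. Nierman et al. in 2001 (cited about 500 times): "The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes." Note that the file downloaded here is the NA1000 assembly (ASM2200v1), whose FASTA header gives a length of 4,042,929 bp, so the two figures do not match exactly.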

mkdir -p ~/project/hic/ref
cd ~/project/hic/ref
wget ftp://ftp.ensemblgenomes.org/pub/bacteria/release-40/fasta/bacteria_20_collection/caulobacter_crescentus_na1000/dna/Caulobacter_crescentus_na1000.ASM2200v1.dna.toplevel.fa.gz
gunzip Caulobacter_crescentus_na1000.ASM2200v1.dna.toplevel.fa.gz
bowtie2-build  Caulobacter_crescentus_na1000.ASM2200v1.dna.toplevel.fa   bacteria

This produces:

5.3M Jul 25 19:28 bacteria.1.bt2
988K Jul 25 19:28 bacteria.2.bt2
  17 Jul 25 19:28 bacteria.3.bt2
988K Jul 25 19:28 bacteria.4.bt2
5.3M Jul 25 19:28 bacteria.rev.1.bt2
988K Jul 25 19:28 bacteria.rev.2.bt2
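A quick way to confirm the index build finished is to check that all six files exist and are non-empty. This is a minimal sketch; the helper name `check_bt2_index` is hypothetical, and it assumes a small-genome index with the `.bt2` suffix (large genomes get `.bt2l` files instead).

```shell
# Check that all six bowtie2 index files for a given prefix exist and are non-empty.
check_bt2_index() {
    prefix=$1
    status=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        f="$prefix.$suffix.bt2"
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage would be something like `check_bt2_index ~/project/hic/ref/bacteria && echo "index looks complete"`. Running `bowtie2-inspect -s bacteria` should also print a summary of the index, which is a stronger check than file presence alone.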

An excerpt of the reference genome fa file:

>Chromosome dna:chromosome chromosome:ASM2200v1:Chromosome:1:4042929:1 REF
GAATTCTTAACGTCCTGAGACACGACAGCGACCTCTGACCGGACTCGTTCCGCGTCTTTG
GACAATCGGGATTCAGACTTCGGGGGATGCGGCGCAGGCTTGGGGATGATAGGCGAGCAA
TGCGACCGTTGATCACAGCGGCGCCGTGTCACGACGCTGTTGGGGCCGTTCGGCGCCCGG
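The header claims a chromosome of 4042929 bp. One way to check that the unpacked file really contains that many bases is to sum the sequence line lengths per record. The function name `fasta_lengths` below is made up for illustration, and the sketch assumes Unix line endings.

```shell
# Print each sequence name and its length in bases, tab-separated.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # first word of the header, without ">"
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

Calling `fasta_lengths Caulobacter_crescentus_na1000.ASM2200v1.dna.toplevel.fa` should report a length for `Chromosome` that agrees with the header if the download and gunzip went through cleanly.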

Download the required software

A comprehensive list of software is available at: https:///3c-4c-5c-hi-c-chia-pet-category

If you do not have conda yet, install it first:

wget https://mirrors.tuna./anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
conda config --add channels https://mirrors.tuna./anaconda/pkgs/free
conda config --add channels https://mirrors.tuna./anaconda/cloud/conda-forge
conda config --add channels https://mirrors.tuna./anaconda/cloud/bioconda
conda config --set show_channel_urls yes

Then install a set of tools:

conda create -n hic  python=2 bowtie2
conda info --envs
source activate hic
conda search hiclib
conda install -y sra-tools samtools

Some software is not available through conda, so you have to read its documentation and install it yourself. The main ones are:

https:///mirnylab/hiclib
https://github.com/nservant/HiC-Pro

Of these, HiC-Pro is especially recommended. It can process many kinds of Hi-C data, including:

Hi-C
in situ Hi-C
DNase Hi-C
Micro-C
capture-C
capture Hi-C
HiChip

Install hiclib as follows:

source activate hic
conda install numpy scipy matplotlib h5py cython numexpr statsmodels  scikit-learn pandas 
pip install https:///mirnylab/mirnylib/get/tip.tar.gz
pip install https:///mirnylab/hiclib/get/tip.tar.gz ##  17.7MB 44kB/s

Install HiC-Pro as follows:

# conda install numpy scipy matplotlib h5py cython numexpr statsmodels  scikit-learn pandas
## quite a lot of dependencies
source activate hic
conda install -y pysam bx-python numpy scipy 
conda install  -y R  

R -e 'install.packages(c("ggplot2","RColorBrewer"), repos="https://mirrors.tuna./CRAN/")'
R -e 'library(ggplot2)'
R -e 'library(RColorBrewer)'

mkdir -p ~/biosoft/hicpro
cd ~/biosoft/hicpro
git clone https://github.com/nservant/HiC-Pro.git
cd HiC-Pro/
which bowtie2
which R
which samtools
which python
cat config-install.txt
mkdir /home/zengjianming/biosoft/hicpro/bin
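The four `which` calls above can be folded into one check that only reports what is missing. This is just a convenience sketch; `missing_tools` is a hypothetical helper name, and the tool list is the one used in this post.

```shell
# Print the name of every command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

With the hic environment activated, `missing_tools bowtie2 R samtools python` should print nothing if every dependency is visible.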

At this point, be sure to edit the config-install.txt file in this directory to match your own system environment:

PREFIX =/home/zengjianming/biosoft/hicpro/bin
BOWTIE2_PATH =/home/zengjianming/miniconda3/envs/hic/bin/bowtie2
SAMTOOLS_PATH =/home/zengjianming/miniconda3/envs/hic/bin/samtools
R_PATH =/home/zengjianming/miniconda3/envs/hic/bin/R
PYTHON_PATH =/home/zengjianming/miniconda3/envs/hic/bin/python
CLUSTER_SYS =SGE
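Since these paths are typed by hand, a typo is easy to make. The sketch below lists every `*_PATH` entry whose value does not exist on disk; it assumes the `KEY =VALUE` layout shown above, and the function name `check_config_paths` is made up for illustration.

```shell
# List every *_PATH entry in the install config whose value does not exist on disk.
check_config_paths() {
    grep -E '^[A-Z0-9_]+_PATH[[:space:]]*=' "$1" |
    while IFS='=' read -r key val; do
        # Trim surrounding whitespace from the key and the value.
        key=$(echo "$key" | tr -d '[:space:]')
        val=$(echo "$val" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        [ -e "$val" ] || echo "$key: not found: $val"
    done
}
```

Running `check_config_paths config-install.txt` before `make configure` should print nothing when all listed paths are valid. It does not check PREFIX or CLUSTER_SYS.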

Then you can build the software:

make configure
make install

There are a lot of dependencies, but if you install them carefully it goes fine.

/home/zengjianming/biosoft/hicpro/bin/HiC-Pro_2.10.0/bin/HiC-Pro -h

If this prints the help text, the installation succeeded.

hiclib tutorial

Start with the official README:

0. Download software and data
1. Map reads to the genome
2. Filter the dataset at the restriction fragment level
3. Filter and iteratively correct heatmaps.

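On opening it, I found that it is all Python code rather than a packaged tool that runs the workflow from the command line with parameters. That feels somewhat hard to use, so I am setting it aside for now and will update these usage notes later.

HiC-Pro tutorial

Its documentation is every bit as good as hiclib's; see: http://nservant./HiC-Pro

Broadly there are six steps: alignment, filtering of the Hi-C alignments, detection of valid Hi-C pairs, merging of results, construction of the Hi-C contact maps, and normalization of the contact maps. Selecting which of these to run only requires changing one parameter: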

 [-s|--step ANALYSIS_STEP] : run only a subset of the HiC-Pro workflow; if not specified the complete workflow is run
      mapping: perform reads alignment - require fast files
      proc_hic: perform Hi-C filtering - require BAM files
      quality_checks: run Hi-C quality control plots
      merge_persample: merge multiple inputs and remove duplicates if specified - require .validPairs files
      build_contact_maps: Build raw inter/intrachromosomal contact maps - require _allValidPairs files
      ice_norm : run ICE normalization on contact maps - require .matrix files

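The workflow runs step by step only when the -s option is given. Of the six steps, mapping takes the most time, so when another step needs its parameters adjusted (BIN_SIZE, for example), rerunning just that step is much faster.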
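As a rough sketch of a step-wise run, the helper below assembles a HiC-Pro command line limited to the chosen steps and prints it instead of running it, so it can be inspected first. The helper name `hicpro_cmd` and all paths in the usage line are hypothetical. The `-i`, `-o` and `-c` options (input directory, output directory, configuration file) and the repeated `-s` are not part of the excerpt above; they are how I understand HiC-Pro's usage message, so check them against `HiC-Pro -h`.

```shell
# Print (do not run) a HiC-Pro command line restricted to the given steps.
# Usage: hicpro_cmd HICPRO_BIN INPUT_DIR OUTPUT_DIR CONFIG STEP [STEP...]
hicpro_cmd() {
    bin=$1; input=$2; output=$3; config=$4
    shift 4
    cmd="$bin -i $input -o $output -c $config"
    # Append one -s option per requested step.
    for step in "$@"; do
        cmd="$cmd -s $step"
    done
    echo "$cmd"
}
```

For example, `hicpro_cmd HiC-Pro /path/to/validpairs /path/to/out config-hicpro.txt build_contact_maps ice_norm` prints a command that would rebuild and normalize the contact maps without repeating the mapping. Note that each step expects the matching kind of input, as listed in the help text.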

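Also worth mentioning is its signature feature: allele-specific Hi-C analysis.

It is getting late today, so the hands-on work continues tomorrow.

Other practice datasets

The dataset above is a bacterial genome with much smaller sequencing files, which makes it good for practice. Once you are comfortable, you can try other datasets, such as the raw Hi-C data for the Rose genome, available at: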

http://sra-download.ncbi.nlm./srapub/SRR6189546
http://sra-download.ncbi.nlm./srapub/SRR6189547

Each dataset is about 12 GB.

Another option is the data from the paper "An Osteoporosis Risk SNP at 1p36.12 Acts as an Allele-Specific Enhancer to Modulate LINC00339 Expression via Long-Range Loop Formation", among others.
