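Centrifuge for species classification

Download the prebuilt Centrifuge index and verify it:

```bash
cd ~/DataHub/mNGS
# -O fixes the local file name so it matches the commands below
wget -O h+p+v+c.tar.gz https:///record/3732127/files/h%2Bp%2Bv%2Bc.tar.gz
md5sum h+p+v+c.tar.gz
# expected: 4d5124b0178118ab925d19f822d1b342
tar -xf h+p+v+c.tar.gz
# extracts: hpvc.1.cf hpvc.2.cf hpvc.3.cf hpvc.4.cf
```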
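With `md5sum` alone, the checksum still has to be compared by eye. A small helper can turn that comparison into an exit status, so extraction only runs on an intact download. This is a minimal sketch built on GNU coreutils `md5sum -c`. `verify_md5` is a hypothetical helper name.

```bash
# Compare a file's MD5 checksum with an expected value.
# Returns 0 on a match and non-zero on a mismatch or a missing file.
verify_md5() {
    local file="$1" expected="$2"
    # md5sum -c reads "<hash>  <filename>" lines (two spaces) from stdin;
    # --status suppresses output so only the exit code reports the result
    echo "${expected}  ${file}" | md5sum -c --status -
}
```

Usage: `verify_md5 h+p+v+c.tar.gz 4d5124b0178118ab925d19f822d1b342 && tar -xf h+p+v+c.tar.gz`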
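To build your own index from RefSeq archaeal, bacterial and viral genomes:

```bash
cd ~/DataHub/mNGS
mamba activate mNGS
# download the NCBI taxonomy
centrifuge-download -o taxonomy taxonomy
# -> taxonomy/nodes.dmp
# -> taxonomy/names.dmp
# download RefSeq archaea, bacteria and viral genomes (-m masks low-complexity regions)
centrifuge-download -o library -m -d "archaea,bacteria,viral" refseq > seqid2taxid.map
# concatenate all genomes into one FASTA file
cat library/*/*.fna > input-sequences.fna
```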
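`centrifuge-build` links each sequence to a taxon through the conversion table. Before starting the long build, it can be worth checking that every sequence in `input-sequences.fna` has an entry in `seqid2taxid.map`. This is a minimal sketch. It assumes the sequence ID is the first whitespace-delimited token of each FASTA header and also the first column of the map. `missing_seqids` is a hypothetical helper name.

```bash
# Print FASTA sequence IDs that have no entry in a seqid-to-taxid table.
# Assumption: the ID is the first whitespace-delimited token after ">",
# and the map's first column holds the same ID.
missing_seqids() {
    local map="$1" fasta="$2"
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }   # first file: load mapped IDs
         /^>/ {
             id = substr($1, 2)                        # strip the leading ">"
             if (!(id in seen)) print id
         }' "$map" "$fasta"
}
```

Usage: `missing_seqids seqid2taxid.map input-sequences.fna | head`. No output means every header ID is mapped.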
```bash
# build the Centrifuge index with 20 threads
centrifuge-build -p 20 \
    --conversion-table seqid2taxid.map \
    --taxonomy-tree taxonomy/nodes.dmp \
    --name-table taxonomy/names.dmp \
    input-sequences.fna abv
# index files are written with the prefix abv (abv.*.cf)
```
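Kraken2/Bracken for species classification, using the Standard-16 database: https://benlangmead./aws-indexes/k2

VFDB for virulence genes. setA is VFDB's core dataset and setB is the full dataset. setA contains only experimentally verified virulence genes, while setB adds predicted virulence genes on top of setA. The full dataset (setB) is used here:

```bash
cd ~/DataHub/mNGS
wget http://www./VFs/Down/VFDB_setB_nt.fas.gz
# build the DIAMOND database
diamond makedb --in VFDB_setB_nt.fas.gz --db VFDB_setB_nt.fas --ignore-warnings
# output file: VFDB_setB_nt.fas.dmnd
rm VFDB_setB_nt.fas.gz
```

Note that DIAMOND builds protein databases. `VFDB_setB_nt.fas.gz` contains nucleotide sequences, so `diamond makedb` rejects it by default. `--ignore-warnings` only suppresses that check, and the A/C/G/T letters are then indexed as amino acids. For protein-level searches, the protein file of the same set (`VFDB_setB_pro.fas.gz`) is the appropriate input to `makedb`.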
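`diamond blastx` translates reads and aligns them against a protein database. After that step, the tabular hits can be summarized per virulence gene. This is a minimal sketch. It assumes DIAMOND's default `--outfmt 6` columns and keeps only the best-scoring hit for each read. The function name `count_vf_hits` is hypothetical. Its 80% identity and 1e-5 e-value defaults are illustration values, not thresholds taken from this workflow.

```bash
# Count reads per subject (virulence gene) in DIAMOND tabular output.
# Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore.
# Thresholds are illustrative assumptions, not recommended cut-offs.
count_vf_hits() {
    local hits="$1" min_pident="${2:-80}" max_evalue="${3:-1e-5}"
    awk -F'\t' -v pid="$min_pident" -v ev="$max_evalue" '
        $3 + 0 >= pid + 0 && $11 + 0 <= ev + 0 {
            # keep the highest-bitscore subject for each query read
            if (!($1 in best) || $12 + 0 > score[$1]) {
                best[$1] = $2
                score[$1] = $12 + 0
            }
        }
        END {
            for (q in best) n[best[q]]++
            for (s in n) print n[s] "\t" s
        }
    ' "$hits" | sort -k1,1nr -k2,2
}
```

Usage, with placeholder file names: `diamond blastx -d VFDB.dmnd -q sample_reads.fasta -o sample.vfdb.tsv`, then `count_vf_hits sample.vfdb.tsv 90 1e-10 | head`.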
Pavian

Pavian is a Shiny web app for visualizing Kraken2 or Centrifuge reports. Conda environment file:

```yaml
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - r-base=4.0.5
  - r-pavian=1.2.0
  - bioconductor-rsamtools=2.6.0
  - r-shiny=1.7.2
```
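Pavian is the interactive route. For a quick look from the terminal, the species rows of a Kraken2 report can be pulled out with awk. This is a minimal sketch. It assumes the default six-column report written by `kraken2 --report` (`--report-minimizer-data` adds two columns and would shift the rank code). `top_species` is a hypothetical helper name.

```bash
# Print the top-N species rows (percent, clade reads, name) of a Kraken2 report.
# Default report columns (tab-separated): percent, clade reads,
# taxon reads, rank code, NCBI taxid, indented name.
top_species() {
    local report="$1" n="${2:-10}"
    awk -F'\t' '$4 == "S" {
        pct = $1;  sub(/^ +/, "", pct)     # trim padding around the percentage
        name = $6; sub(/^ +/, "", name)    # drop the indentation spaces
        printf "%s\t%s\t%s\n", pct, $2, name
    }' "$report" | sort -t$'\t' -k1,1gr | head -n "$n"
}
```

Usage: `top_species sample.kreport 20`, where `sample.kreport` is a placeholder report name.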
Final directory layout:

```
/home/victor/DataHub/mNGS
├── database100mers.kmer_distrib
├── database150mers.kmer_distrib
├── database200mers.kmer_distrib
├── database250mers.kmer_distrib
├── database300mers.kmer_distrib
├── database50mers.kmer_distrib
├── database75mers.kmer_distrib
├── hash.k2d
├── hpvc.1.cf
├── hpvc.2.cf
├── hpvc.3.cf
├── hpvc.4.cf
├── inspect.txt
├── ktaxonomy.tsv
├── opts.k2d
├── seqid2taxid.map
├── taxo.k2d
└── VFDB_setB_nt.fas.dmnd
```
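A quick sanity check can confirm that the expected files are present before running any samples. This is a minimal sketch. `check_db_files` is a hypothetical helper name, and the file list you pass should mirror your own layout.

```bash
# Report expected database files that are missing or empty in a directory.
# Returns 0 if every listed file exists and is non-empty, 1 otherwise.
check_db_files() {
    local dir="$1"; shift
    local missing=0 f
    for f in "$@"; do
        if [ ! -s "$dir/$f" ]; then          # -s: exists and has size > 0
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_db_files ~/DataHub/mNGS hash.k2d opts.k2d taxo.k2d hpvc.1.cf hpvc.2.cf hpvc.3.cf hpvc.4.cf VFDB_setB_nt.fas.dmnd`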