[Original] mNGS 01: Preparing the Required Databases

生信探索, 2023-06-16, published in Yunnan


centrifuge

Used for taxonomic classification of reads.

Download a prebuilt index

cd ~/DataHub/mNGS
wget https:///record/3732127/files/h%2Bp%2Bv%2Bc.tar.gz
md5sum h+p+v+c.tar.gz # 4d5124b0178118ab925d19f822d1b342
tar -xf h+p+v+c.tar.gz
# hpvc.1.cf  hpvc.2.cf  hpvc.3.cf  hpvc.4.cf
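
The checksum in the snippet above is only printed next to a comment, so a corrupted download would still be unpacked. A minimal sketch of an automated comparison follows; `verify_md5` is a hypothetical helper name, not part of centrifuge.

```shell
# Compare a file's MD5 digest with an expected value.
# verify_md5 is a hypothetical helper introduced for illustration.
verify_md5() {
  local file=$1 expected=$2 actual
  actual=$(md5sum "$file" | awk '{print $1}')   # keep only the digest column
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual, expected $expected)" >&2
    return 1
  fi
}

# Usage (run in ~/DataHub/mNGS after the download finishes):
# verify_md5 h+p+v+c.tar.gz 4d5124b0178118ab925d19f822d1b342 && tar -xf h+p+v+c.tar.gz
```

Chaining with `&&` means the archive is extracted only when the digest matches.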

Or build the index yourself

cd ~/DataHub/mNGS
mamba activate mNGS
centrifuge-download -o taxonomy taxonomy
# taxonomy/nodes.dmp
# taxonomy/names.dmp
centrifuge-download -o library -m -d "archaea,bacteria,viral" refseq > seqid2taxid.map
cat library/*/*.fna > input-sequences.fna

## build centrifuge index with 20 threads
centrifuge-build -p 20 \
  --conversion-table seqid2taxid.map \
  --taxonomy-tree taxonomy/nodes.dmp \
  --name-table taxonomy/names.dmp \
  input-sequences.fna abv
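
Before starting a long build, it can help to confirm that every sequence in input-sequences.fna has an entry in seqid2taxid.map. The sketch below assumes the map is tab-separated (sequence ID, then taxonomy ID) and that the sequence ID is the first word of each FASTA header; `unmapped_ids` is a hypothetical helper name.

```shell
# List FASTA sequence IDs that have no entry in the conversion table.
# unmapped_ids is a hypothetical helper introduced for illustration.
unmapped_ids() {
  local fasta=$1 map=$2
  awk -F'\t' '
    FILENAME == ARGV[1] { seen[$1] = 1; next }  # map file: remember mapped IDs
    /^>/ {
      id = substr($1, 2)                        # drop the leading ">"
      sub(/[ \t].*/, "", id)                    # keep only the first word
      if (!(id in seen)) print id
    }
  ' "$map" "$fasta"
}

# Usage:
# unmapped_ids input-sequences.fna seqid2taxid.map | head
```

Empty output means every header has a taxonomy ID.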

kraken2/bracken

Used for taxonomic classification, with the Standard-16 version of the database.

https://benlangmead./aws-indexes/k2
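
Bracken needs a k-mer distribution file that matches the read length passed with its `-r` option. The sketch below lists the read lengths available in a database directory, relying only on the database<N>mers.kmer_distrib naming pattern; `bracken_read_lengths` is a hypothetical helper name.

```shell
# Print the read lengths for which a Bracken k-mer distribution file exists.
# bracken_read_lengths is a hypothetical helper introduced for illustration.
bracken_read_lengths() {
  local db_dir=$1 f name
  for f in "$db_dir"/database*mers.kmer_distrib; do
    [ -e "$f" ] || continue          # skip the literal pattern when nothing matches
    name=${f##*/}                    # strip the directory part
    name=${name#database}            # drop the "database" prefix
    echo "${name%mers.kmer_distrib}" # drop the suffix, leaving the number
  done | sort -n
}

# Usage:
# bracken_read_lengths ~/DataHub/mNGS
```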

VFDB

Set A is the VFDB core dataset and set B is the full dataset. Set A contains only experimentally verified virulence genes, while set B adds predicted virulence genes on top of set A.

cd ~/DataHub/mNGS
wget http://www./VFs/Down/VFDB_setB_nt.fas.gz
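# Build the database. Note: diamond makedb expects protein sequences;
# --ignore-warnings makes it accept this nucleotide FASTA anyway.
diamond makedb --in VFDB_setB_nt.fas.gz --db VFDB_setB_nt.fas --ignore-warnings
# The resulting file is VFDB_setB_nt.fas.dmnd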
rm VFDB_setB_nt.fas.gz
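
Since set B extends set A, a quick record count is a cheap check that the expected file was downloaded. A minimal sketch, to be run before the archive is deleted; `count_fasta_records` is a hypothetical helper name, and the set A file name in the usage comment is an assumption.

```shell
# Count FASTA records (header lines starting with ">") in a gzip-compressed file.
# count_fasta_records is a hypothetical helper introduced for illustration.
count_fasta_records() {
  gzip -dc "$1" | grep -c '^>'
}

# Usage (VFDB_setA_nt.fas.gz is an assumed file name for the core set):
# count_fasta_records VFDB_setB_nt.fas.gz
# count_fasta_records VFDB_setA_nt.fas.gz
```

Set B should report at least as many records as set A.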

pavian

pavian is a Shiny web application that visualizes reports produced by kraken2 or centrifuge.

channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - r-base=4.0.5
  - r-pavian=1.2.0
  - bioconductor-rsamtools=2.6.0
  - r-shiny=1.7.2

Final directory layout

/home/victor/DataHub/mNGS
├── database100mers.kmer_distrib
├── database150mers.kmer_distrib
├── database200mers.kmer_distrib
├── database250mers.kmer_distrib
├── database300mers.kmer_distrib
├── database50mers.kmer_distrib
├── database75mers.kmer_distrib
├── hash.k2d
├── hpvc.1.cf
├── hpvc.2.cf
├── hpvc.3.cf
├── hpvc.4.cf
├── inspect.txt
├── ktaxonomy.tsv
├── opts.k2d
├── seqid2taxid.map
├── taxo.k2d
└── VFDB_setB_nt.fas.dmnd
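
The listing above can double as a checklist. The sketch below reports any expected file that is absent from a directory; `missing_db_files` is a hypothetical helper name, and the files passed in the usage comment are taken from the listing.

```shell
# Print each expected file that is absent from a directory.
# Returns 1 if anything is missing, 0 otherwise.
# missing_db_files is a hypothetical helper introduced for illustration.
missing_db_files() {
  local dir=$1 f missing=0
  shift
  for f in "$@"; do
    if [ ! -e "$dir/$f" ]; then
      echo "$f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# missing_db_files ~/DataHub/mNGS hash.k2d opts.k2d taxo.k2d \
#   hpvc.1.cf hpvc.2.cf hpvc.3.cf hpvc.4.cf VFDB_setB_nt.fas.dmnd
```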

Reference

# centrifuge
https://github.com/DaehwanKimLab/centrifuge
https:///bioconda/centrifuge
https://ccb./software/centrifuge/manual.shtml#obtaining-centrifuge
https://mp.weixin.qq.com/s/LVK2xZqGMRBP0fYGlkjCxg


# mNGS pathogen detection: the present and future of mNGS technology (Wang Jun)
https://www.bilibili.com/video/BV1Ea411s7uc

# kraken2
https://mp.weixin.qq.com/s/gu7FsYGSqSRKm8VH-403BA
# Notes on a single-sample metagenomic data processing workflow
https://www./cs107077993/

# VFDB
https://blog.csdn.net/yangl7/article/details/114956228