作為三大高級分析的SCENIC,由于在R中運行的時間太長了,因此我們這里直接用python運行,時間大概能夠縮減20倍,這里我們用自己的數(shù)據(jù)進(jìn)行跑流程:
官網(wǎng)地址:GitHub - aertslab/pySCENIC: pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
一部分在Linux命令行中運行,另一部分在jupyter notebook中運行:
首先是Linux中運行:
#!/bin/bash
loomfile='/home/data/sbc057/B.loom'
tfs='/home/data/refdir/lesson_3-4/SCENIC_RESOURCES/allTFs_hg38.txt'
tbl='/home/data/refdir/lesson_3-4/SCENIC_RESOURCES/hgnc_v2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl'
motifs='/home/data/refdir/lesson_3-4/SCENIC_RESOURCES/hgnc_v2/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
outdir='pySCENIC_B_res'
num_workers=1
if [ -e $outdir ] && [ -d $outdir ]
then
echo $outdir 'already exists.'
else
mkdir $outdir
fi
cd $outdir
##################################################
# NOTE !!!
# Using v1 feather would raise
# ValueError: 'hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather'
# is a cisTarget Feather database in Feather v1 format, which is not supported anymore.
# Convert them with 'convert_cistarget_databases_v1_to_v2.py' (https://github.com/aertslab/create_cisTarget_databases/) to Feather v2 format.
##################################################
# STEP 1: Gene regulatory network inference, and generation of co-expression modules
pyscenic grn $loomfile $tfs -o adj.csv -m grnboost2 --num_workers $num_workers --seed 777
# STEP 2-3: Regulon prediction aka cisTarget from CLI
pyscenic ctx adj.csv $motifs -o reg.csv \
--annotations_fname $tbl \
--rank_threshold 5000 --auc_threshold 0.05 --nes_threshold 3.0 \
--thresholds 0.75 0.90 --top_n_targets 50 --top_n_regulators 5 10 50 --min_genes 20 \
--expression_mtx_fname $loomfile \
--mask_dropouts --num_workers $num_workers
# STEP 4: Cellular enrichment (aka AUCell) from CLI
pyscenic aucell $loomfile reg.csv \
-o pyscenic_final.loom --num_workers 1
這里由于pandas版本問題,會出現(xiàn)一個經(jīng)典報錯,出現(xiàn)的第4步AUCell,出現(xiàn):AttributeError: 'Series' object has no attribute 'iteritems'
查閱了資料,最終的解決方案是降低pandas版本,這里我直接新建了一個新環(huán)境,并安裝特定包版本:
conda create -y -n pyscenic_env python=3.10
conda activate pyscenic_env
pip install pyscenic
pip install numpy==1.23.5
pip install pandas==1.5.3
pip install numba==0.56.4
最終我們得到了所需文件pyscenic_final.loom
接下來在python中運行:
import os
import numpy as np
import pandas as pd
import scanpy as sc
import loompy as lp
import json
import base64
import zlib
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy.stats import zscore
from adjustText import adjust_text
from pyscenic.export import add_scenic_metadata
from pyscenic.cli.utils import load_signatures
from pyscenic.rss import regulon_specificity_scores
from pyscenic.plotting import plot_rss
from pyscenic.binarization import binarize
sc.settings.verbosity = 3
#########################################
wdir = 'pySCENIC_new'
os.mkdir( wdir )
os.chdir( wdir )
f_final_loom = '/home/data/sbc057/pyscenic_final.loom'
regulonsfile = '/home/data/sbc057/reg.csv'
lf = lp.connect( f_final_loom, mode='r', validate=False )
meta = json.loads(zlib.decompress(base64.b64decode( lf.attrs.MetaData )))
exprMat = pd.DataFrame( lf[:,:], index=lf.ra.Gene, columns=lf.ca.CellID).T
auc_mtx = pd.DataFrame( lf.ca.RegulonsAUC, index=lf.ca.CellID)
lf.close()
這段代碼的目的是從 Loom 文件中讀取基因表達(dá)數(shù)據(jù)和 Regulons 的 AUC 值,然后將其轉(zhuǎn)換成 Pandas DataFrame 進(jìn)行進(jìn)一步的分析和處理。
adata = sc.read( f_final_loom, validate=False)
adata.obs.drop( ['RegulonsAUC'], axis=1, inplace=True )
sc._utils.sanitize_anndata( adata )
sig = load_signatures(regulonsfile)
adata = add_scenic_metadata(adata, auc_mtx, sig)
然后新建一個AnnData對象,由于自己數(shù)據(jù)中暫時沒有分組,這里我們隨機(jī)分成2組,'Group1' 和 'Group2',只是為了復(fù)現(xiàn)流程:
adata.obs
adata.obs['celltype'] = np.random.choice(['Group1', 'Group2'], size=len(adata.obs))
print(adata.obs[['celltype']].head())
celltype
CellID
AAACATTGAGCTAC Group1
AAACTTGAAAAACG Group1
AAAGGCCTGTCTAG Group1
AAAGTTTGATCACG Group1
AAAGTTTGGGGTGA Group2
這些代碼的目的是從外部文件中加載細(xì)胞注釋信息,將其整合到 AnnData 對象的觀測中,并為細(xì)胞類型創(chuàng)建一個新的列。接下來,計算分組之間的AUC分?jǐn)?shù)。
# Regulon Specificity Score
rss_cellType = regulon_specificity_scores( auc_mtx, adata.obs['celltype'] )
# binary AUC
binary_mtx, auc_thresholds = binarize( auc_mtx )
# save results
auc_mtx.to_excel('AUC_Score_Matrix.random.xlsx', header=True, index=True)
rss_cellType.to_excel('RSS_Score_by_CellType_Matrix.random.xlsx', header=True, index=True)
binary_mtx.to_excel('Binarized_AUC_Score_Matrix.random.xlsx', header=True, index=True)
可視化:
cats = np.unique(adata.obs.celltype)
#########################################
#### plot
#########################################
# RSS
fig = plt.figure(figsize=(15, 6))
for c,num in zip(cats, range(1,len(cats) 1)):
x=rss_cellType.T[c]
ax = fig.add_subplot(3, 5, num)
plot_rss(rss_cellType, c, top_n=5, max_n=None, ax=ax)
ax.set_ylim( x.min()-(x.max()-x.min())*0.05 , x.max() (x.max()-x.min())*0.05 )
for t in ax.texts:
t.set_fontsize(12)
ax.set_ylabel('')
ax.set_xlabel('')
adjust_text(ax.texts, autoalign='xy', ha='right', va='bottom', arrowprops=dict(arrowstyle='-',color='lightgrey'), precision=0.001)
fig.text(0.5, 0.0, 'Regulon', ha='center', va='center', size='x-large')
fig.text(0.00, 0.5, 'Regulon specificity score (RSS)', ha='center', va='center', rotation='vertical', size='x-large')
plt.tight_layout()
plt.rcParams.update({
'figure.autolayout': True,
'figure.titlesize': 'large' ,
'axes.labelsize': 'medium',
'axes.titlesize':'large',
'xtick.labelsize':'medium',
'ytick.labelsize':'medium'
})
plt.savefig('All_RssPlot.pdf', dpi=300, bbox_inches = 'tight')
上圖是Regulon的特異性排序圖,橫坐標(biāo)表示排名,縱坐標(biāo)表示RSS特異性得分。RSS越高的調(diào)控子可能與該細(xì)胞類型特異性相關(guān)。
# AUC
topreg = []
for i,c in enumerate(cats):
topreg.extend(
list(rss_cellType.T[c].sort_values(ascending=False)[:5].index)
)
topreg = list(set(topreg))
auc_mtx_Z = zscore(auc_mtx, axis=0, ddof=0)
colors = sns.color_palette('bright',n_colors=len(cats) )
colorsd = dict( zip( cats, colors ))
colormap = [ colorsd[x] for x in adata.obs['celltype'] ]
sns.set(font_scale=1.2)
g_auc = sns.clustermap(auc_mtx_Z[topreg], annot=False, square=False, linecolor='gray',
yticklabels=False, xticklabels=True, vmin=-2, vmax=6, row_colors=colormap,
cmap='YlGnBu', figsize=(12,18) )
g_auc.cax.set_visible(True)
g_auc.ax_heatmap.set_ylabel('')
g_auc.ax_heatmap.set_xlabel('')
g_auc.fig.savefig('RegulonActivity_heatmap.pdf', dpi=300, bbox_inches = 'tight', pad_inches=1)
上圖是行為細(xì)胞分組,列為轉(zhuǎn)錄因子的熱圖,不過需要一些調(diào)整才能更方便看出差異。
# bionarized AUC
g_auc_b = sns.clustermap(binary_mtx[topreg].T, annot=False, square=False,
col_colors=colormap, cmap=sns.xkcd_palette(['white', 'black']), figsize=(12,18) )
g_auc_b.cax.set_visible(True)
g_auc_b.ax_heatmap.set_xticklabels([])
g_auc_b.ax_heatmap.set_xticks([])
g_auc_b.ax_heatmap.set_xlabel('Cells')
g_auc_b.ax_heatmap.set_ylabel('Regulons')
g_auc_b.ax_col_colors.set_yticks([0.5])
g_auc_b.ax_col_colors.set_yticklabels(['Cell Type'])
g_auc_b.fig.savefig('RegulonActivity_heatmap_binary.pdf', dpi=300, bbox_inches = 'tight', pad_inches=1)
上面是AUC二分類的熱圖。
df_obs = adata.obs
signature_column_names = list(df_obs.select_dtypes('number').columns)
signature_column_names = list(filter(lambda s: s.startswith('Regulon('), signature_column_names))
df_scores = df_obs[signature_column_names ['celltype']]
df_results = ((df_scores.groupby(by='celltype').mean() - df_obs[signature_column_names].mean())/ df_obs[signature_column_names].std()).stack().reset_index().rename(columns={'level_1': 'regulon', 0:'Z'})
df_results['regulon'] = list(map(lambda s: s[8:-1], df_results.regulon))
df_heatmap = pd.pivot_table(data=df_results[df_results.Z >= 0].sort_values('Z', ascending=False),
index='celltype', columns='regulon', values='Z')
df_heatmap = df_heatmap.fillna(0) # 將 NaN 填充為 0,但根據(jù)實際情況可以使用其他值
# 繪制熱圖
fig, ax1 = plt.subplots(1, 1, figsize=(15, 8))
sns.heatmap(df_heatmap, ax=ax1, annot=True, fmt='.1f', linewidths=.7, cbar=False, square=True, linecolor='gray',
cmap='YlGnBu', annot_kws={'size': 6})
ax1.set_ylabel('')
fig.savefig('Average_Regulon_Activity_byCelltype.pdf')
上圖展示了不同分組的特異性轉(zhuǎn)錄因子,因此能夠了解不同分組的轉(zhuǎn)錄翻譯過程,結(jié)合其他功能分析才能更細(xì)致地研究細(xì)胞亞群功能。