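When you have many tumor sequencing samples, or when no matched tissue from the same patient that is free of somatic mutations (adjacent normal tissue, blood, etc.) is available, WES or WGS will report a very large number of somatic mutations spread across so many genes that the MAF mutation plot becomes unreadable (too many genes, a figure that is far too large). Something like this:

[Figure: mutation landscape of 100+ tumor samples, drawn with the R package maftools]

The tumor driver mutations you care about are somewhere in this huge plot. How to filter and prioritize them is another hard problem in tumor sequencing analysis, arguably its core. The first hard problem is quality control and filtering after GATK variant calling; variant annotation and MAF formatting also have their own pitfalls.

The principle for further filtering is to apply preliminary, stepwise filters that do not discard core information, so that the plot does not become excessively large. It may be large, but not unmanageable; nor should it shrink too much in a single step, or important candidate driver genes will be lost.

This filtering logic differs from transcriptomics, where differentially expressed genes are first selected by metrics such as fold change and P value. Tumor genome analysis follows a different logic altogether, centered on the functional effect of each variant; so-called "tumor drivers" also fall within this scope.

The QC filtering of tumor sequencing data is of course also important. Its task is to pick out true somatic mutations as accurately as possible, regardless of their functional impact. Because of intratumor heterogeneity and the practical limits of sample purity, driver mutations can have a very low variant allele fraction (0.5%–10%), sometimes close to the sequencing error rate. This step (picking out true somatic mutations while cleanly removing germline variants) is therefore statistically very hard, and perhaps impossible to solve perfectly.

However, as noted at the start of this section, even when every variant is somatic, the genes involved are numerous and heterogeneous. A deeper, more focused filter based on the "functional effect" of each mutation is needed (e.g. synonymous vs. missense, splice site, intronic or intergenic location, drug response, drug resistance). If you only care about the various EGFR mutations, the problem is simple: from the current, largest MAF (even if its plot is unreadable), just extract the EGFR gene.

Several optional approaches to functional filtering of tumor mutations

① Disease name or keywords
The idea is to check whether the many mutations already reported for this cancer type (collected as completely as possible) appear in your current tumor project.
For glioma, for example, the disease keywords are Glioma, Glioblastoma, Low grade glioma, High grade glioma and Diffuse Astrocytoma (Glioma appears to cover the others, so it can be tried first). How do I know so many aliases for the disease? Search MalaCards.

② Known gene-testing panels for the disease
This is the easiest approach to implement; it yields the fewest genes, but the most certain ones. http://www./content/index/classid/11/id/22 (真固生物)

Ki-67 is a nuclear antigen associated with cell proliferation and serves as an indicator of the proportion of proliferating cells. Positive Ki-67 staining indicates actively proliferating cancer cells; the higher the labeling index, the higher the malignancy and the worse the prognosis. It can be detected by immunohistochemistry.

Isocitrate dehydrogenase (IDH) mutations are rare in primary glioblastoma, and patients carrying them respond better to treatment and have a better prognosis. IDH status can help distinguish glioma from gliosis, and the presence or absence of IDH1/2 mutations is one of the indicators used to stratify risk in patients with low-grade glioma.

MGMT promoter CpG island methylation is important for assessing the prognosis of glioma patients and for predicting tumor resistance to alkylating agents. Glioma patients with MGMT promoter methylation are more sensitive to radiotherapy and chemotherapy and survive longer.

1p/19q codeletion is the simultaneous loss of the short arm of chromosome 1 and the long arm of chromosome 19. It is currently regarded as the molecular hallmark of oligodendroglioma and serves as its diagnostic molecular marker. For patients with oligodendroglioma or anaplastic oligodendroglioma carrying the 1p/19q codeletion, chemotherapy or combined chemoradiotherapy is recommended. Glioma patients with 1p/19q codeletion have longer overall survival and progression-free survival.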
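Telomerase reverse transcriptase (TERT) is the catalytic core of the telomerase complex. Recent studies show that, among patients with grade III–IV glioma, those carrying only a TERT mutation mostly have primary glioblastoma with a poor prognosis; those carrying only IDH1/2 mutations mostly show astrocytic morphology; and those carrying both TERT and IDH1/2 mutations mostly show oligodendroglial morphology and have a good prognosis.

BRAF, located at 7q34, encodes a serine/threonine-specific kinase. Clinical trials in recent years have shown good efficacy of vemurafenib in pediatric glioblastoma, pilomyxoid astrocytoma and recurrent pleomorphic astrocytoma, suggesting vemurafenib as a potential targeted drug for patients with BRAF mutations.

TP53 is a tumor suppressor gene located at 17p13.1. It encodes the p53 protein, which regulates the cell cycle and prevents malignant transformation; more than 50% of human tumors involve TP53 mutations. TP53 mutations occur in 50%–60% of low-grade astrocytomas, 70% of secondary GBMs and 25%–37% of primary GBMs. The p53 protein can be detected by immunohistochemistry, and TP53 mutations at the gene level by PCR and sequencing. Recommendation: TP53 mutations are common in low-grade astrocytoma and secondary GBM, and low-grade gliomas with TP53 mutations have a poorer prognosis.

Histones come in many variants and are divided into five types: H1, H2A, H2B, H3 and H4. They mainly take part in the fine regulation of gene expression through diverse mechanisms, with different variants playing different roles. The 2016 WHO classification of central nervous system tumors established histone-mutant glioma (diffuse midline glioma, H3 K27M-mutant) as a separate new entity. Histone H3.3 mutations are highly prevalent in gliomas of midline structures (e.g. thalamus, brainstem and spinal cord); these tumors are common in children and young adults, grow diffusely, are highly malignant and have a very poor prognosis. Studies consistently show that the H3K27M mutation is a distinctive mutation pattern of diffuse midline glioma. Potentially beneficial drugs: ONC201, valproic acid.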
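Phosphatase and tensin homolog (PTEN), located at 10q23.3, is a member of the protein tyrosine phosphatase gene family and an important tumor suppressor gene. PTEN is a phosphatase that acts by dephosphorylating its substrates and takes part in signal transduction. When cells grow or divide too fast or without control, it can regulate the cell cycle, stop division and induce apoptosis; these functions prevent abnormal proliferation and thereby limit tumor formation. Testing for PTEN mutations is recommended in WHO grade III and IV glioma samples. Patients with anaplastic astrocytoma carrying PTEN mutations have a poorer prognosis. https://zhuanlan.zhihu.com/p/163208947

EGFR amplification and EGFRvIII rearrangement
Laboratory methods: EGFR amplification: fluorescence in situ hybridization (FISH); EGFRvIII rearrangement: real-time quantitative PCR, immunohistochemistry, multiplex ligation-dependent probe amplification (MLPA). FISH is recommended for detecting EGFR rearrangement.
Recommendation: GBM patients older than 60 with EGFR amplification have a poor prognosis. Its diagnostic value is twofold: diagnosing small cell GBM, and supporting the pathological interpretation of biopsy tissue.

miR-181d
Laboratory method: in situ hybridization.
Recommendation: miR-181d is a reliable prognostic indicator for GBM. Measuring miR-181d expression in the clinic can indicate how sensitive a GBM patient is to temozolomide (TMZ) chemotherapy. Gene symbol: MIR181D. https://zhuanlan.zhihu.com/p/28493988

The miRNA genes that appear in column 1 (Hugo_Symbol) of all.maf; MIR181D is not among them:

    cut -f 1 all.maf | grep -i miR
    MIR1-1HG-AS1
    MIR548Y
    MIR3663HG
    MIR4495
    MIR3663HG
    MIR8088
    MIR4267
    MIR4444-2
    MIR3689D2
    MIR1-1HG

③ Filter by biological pathway, biological process, or even a custom set of target genes

④ Filter by the "functional effect" of mutations
For example: synonymous/missense, effect on splice sites, location in an intron or intergenic region, a clear response or resistance to approved drugs, and so on.

⑤ Rely on database files
Merge the all.maf file annotated by GATK Funcotator (217 annotation columns, very comprehensive; listed below) with COSMIC (Catalogue of Somatic Mutations in Cancer) and the ClinVar variant annotation file (variant_summary_GRCh38.bed.txt), then search with keywords taken from the pathogenesis of the specific cancer type to obtain target genes and their variant sites.

Column names of the GATK Funcotator-annotated all.maf file:
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID Genome_Change Annotation_Transcript Transcript_Strand Transcript_Exon Transcript_Position cDNA_Change Codon_Change Protein_Change Other_Transcripts Refseq_mRNA_Id Refseq_prot_Id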
SwissProt_acc_Id SwissProt_entry_Id Description UniProt_AApos UniProt_Region UniProt_Site UniProt_Natural_Variations UniProt_Experimental_Info GO_Biological_Process GO_Cellular_Component GO_Molecular_Function COSMIC_overlapping_mutations COSMIC_fusion_genes COSMIC_tissue_types_affected COSMIC_total_alterations_in_gene Tumorscape_Amplification_Peaks Tumorscape_Deletion_Peaks TCGAscape_Amplification_Peaks TCGAscape_Deletion_Peaks DrugBank ref_context gc_content CCLE_ONCOMAP_overlapping_mutations CCLE_ONCOMAP_total_mutations_in_gene CGC_Mutation_Type CGC_Translocation_Partner CGC_Tumor_Types_Somatic CGC_Tumor_Types_Germline CGC_Other_Diseases DNARepairGenes_Activity_linked_to_OMIM FamilialCancerDatabase_Syndromes MUTSIG_Published_Results OREGANNO_ID OREGANNO_Values tumor_f t_alt_count t_ref_count n_alt_count n_ref_count Gencode_34_secondaryVariantClassification Achilles_Top_Genes ClinVar_VCF_AF_ESP ClinVar_VCF_AF_EXAC ClinVar_VCF_AF_TGP ClinVar_VCF_ALLELEID ClinVar_VCF_CLNDISDB ClinVar_VCF_CLNDISDBINCL ClinVar_VCF_CLNDN ClinVar_VCF_CLNDNINCL ClinVar_VCF_CLNHGVS ClinVar_VCF_CLNREVSTAT ClinVar_VCF_CLNSIG ClinVar_VCF_CLNSIGCONF ClinVar_VCF_CLNSIGINCL ClinVar_VCF_CLNVC ClinVar_VCF_CLNVCSO ClinVar_VCF_CLNVI ClinVar_VCF_DBVARID ClinVar_VCF_GENEINFO ClinVar_VCF_MC ClinVar_VCF_ORIGIN ClinVar_VCF_RS ClinVar_VCF_SSR ClinVar_VCF_ID ClinVar_VCF_FILTER CosmicFusion_fusion_id Familial_Cancer_Genes_Synonym Familial_Cancer_Genes_Reference Gencode_XHGNC_hgnc_id HGNC_HGNC_ID HGNC_Status HGNC_Locus_Type HGNC_Locus_Group HGNC_Previous_Symbols HGNC_Previous_Name HGNC_Synonyms HGNC_Name_Synonyms HGNC_Chromosome HGNC_Date_Modified HGNC_Date_Symbol_Changed HGNC_Date_Name_Changed HGNC_Accession_Numbers HGNC_Enzyme_IDs HGNC_Ensembl_Gene_ID HGNC_Pubmed_IDs HGNC_RefSeq_IDs HGNC_Gene_Family_ID HGNC_Gene_Family_Name HGNC_CCDS_IDs HGNC_Vega_ID HGNC_OMIM_ID(supplied_by_OMIM) HGNC_RefSeq(supplied_by_NCBI) HGNC_UniProt_ID(supplied_by_UniProt) HGNC_Ensembl_ID(supplied_by_Ensembl) 
HGNC_UCSC_ID(supplied_by_UCSC) Oreganno_Build Simple_Uniprot_alt_uniprot_accessions dbSNP_ASP dbSNP_ASS dbSNP_CAF dbSNP_CDA dbSNP_CFL dbSNP_COMMON dbSNP_DSS dbSNP_G5 dbSNP_G5A dbSNP_GENEINFO dbSNP_GNO dbSNP_HD dbSNP_INT dbSNP_KGPhase1 dbSNP_KGPhase3 dbSNP_LSD dbSNP_MTP dbSNP_MUT dbSNP_NOC dbSNP_NOV dbSNP_NSF dbSNP_NSM dbSNP_NSN dbSNP_OM dbSNP_OTH dbSNP_PM dbSNP_PMC dbSNP_R3 dbSNP_R5 dbSNP_REF dbSNP_RV dbSNP_S3D dbSNP_SAO dbSNP_SLO dbSNP_SSR dbSNP_SYN dbSNP_TOPMED dbSNP_TPA dbSNP_U3 dbSNP_U5 dbSNP_VC dbSNP_VP dbSNP_WGT dbSNP_WTD dbSNP_dbSNPBuildID dbSNP_ID dbSNP_FILTER HGNC_Entrez_Gene_ID(supplied_by_NCBI) dbSNP_RSPOS dbSNP_VLD AS_FilterStatus AS_SB_TABLE AS_UNIQ_ALT_READ_COUNT CONTQ DP ECNT GERMQ MBQ MFRL MMQ MPOS NALOD NCount NLOD OCM PON POPAF ROQ RPA RU SEQQ STR STRANDQ STRQ TLOD

Column names of the ClinVar variant annotation file:
Chromosome Start Stop #AlleleID Type Name GeneID GeneSymbol HGNC_ID ClinicalSignificance ClinSigSimple LastEvaluated RS# (dbSNP) nsv/esv (dbVar) RCVaccession PhenotypeIDS PhenotypeList Origin OriginSimple Assembly ChromosomeAccession Chromosome Start Stop ReferenceAllele AlternateAllele Cytogenetic ReviewStatus NumberSubmitters Guidelines TestedInGTR OtherIDs SubmitterCategories VariationID PositionVCF ReferenceAlleleVCF AlternateAlleleVCF

There may be other filtering approaches. Each of the five listed here can be tried step by step to see which works well, and they can also be combined. The goal is a mutation waterfall plot that is readable and fits the research question.

First, the approach based on disease names or keywords.

Taking glioma as the example, choose "Glioma" as the keyword (for the reason given above) and use MalaCards (see also: Issue 6 | Clinical Genome/Exome Data Analysis in Practice (slides)).

Read the disease search results from beginning to end, for example the disease summary from GARD (on the original page, the keywords selected for bioinformatic filtering are marked in red):

Glioma is a type of brain tumor that develops from glial cells. Glial cells are specialized cells that surround and support neurons (nerve cells) in the brain. Gliomas are usually classified by the type of glial cell involved in the tumor: ① Astrocytoma: tumors that develop from star-shaped glial cells called astrocytes; ② Ependymoma: tumors that arise from ependymal cells, which line the ventricles of the brain and the center of the spinal cord; ③ Oligodendroglioma: tumors that affect oligodendrocytes.
Symptoms of glioma vary by type but may include headache, nausea and vomiting, confusion, personality changes, trouble with balance, vision problems, speech difficulties and/or seizures. The exact underlying cause of glioma is unknown. In most cases, the tumor occurs sporadically in people with no family history of the condition (i.e. it arises from random somatic mutations). Treatment depends on many factors, including the type, size, stage and location of the tumor, and may include surgery, radiation therapy, chemotherapy and/or targeted therapy.

Glioma is related to high grade glioma and glioblastoma. An important gene associated with glioma is MIR21 (MicroRNA 21). Among its related pathways/superpathways are Cell differentiation - expanded index and miRNAs involved in DNA damage response. The drugs Dabrafenib and Lactitol have been mentioned in the context of this disease. Affiliated tissues include brain, spinal cord and T cells.

Inherited polymorphisms of the DNA repair genes
Germ-line (inherited) polymorphisms of the DNA repair genes ERCC1, ERCC2 (XPD) and XRCC1 increase the risk of glioma. This indicates that altered or deficient repair of DNA damage contributes to the formation of gliomas. DNA damage is likely a major primary cause of progression to cancer in general. Excess DNA damage can give rise to mutations through translesion synthesis. Furthermore, incomplete DNA repair can give rise to epigenetic alterations or epimutations. Such mutations and epimutations may provide a cell with a proliferative advantage, which can then, by a process of natural selection, lead to progression to cancer. Epigenetic repression of DNA repair genes is often found in progression to sporadic glioblastoma. For instance, methylation of the DNA repair gene MGMT promoter was observed in 51% to 66% of glioblastoma specimens. In addition, in some glioblastomas, the MGMT protein is deficient due to another type of epigenetic alteration. MGMT protein expression may also be reduced due to increased levels of a microRNA that inhibits the ability of the MGMT mRNA to produce the MGMT protein. Zhang et al.
found, in the glioblastomas without methylated MGMT promoters, that the level of microRNA miR-181d is inversely correlated with protein expression of MGMT and that the direct target of miR-181d is the MGMT mRNA 3'UTR (the three prime untranslated region of MGMT messenger RNA).

Epigenetic reductions in expression of another DNA repair protein, ERCC1, were found in an assortment of 32 gliomas. In 17 of the 32 gliomas tested (53%), ERCC1 protein expression was reduced or absent. In 12 gliomas (37.5%) this reduction was due to methylation of the ERCC1 promoter. For the other 5 gliomas with reduced ERCC1 protein expression, the reduction could have been due to epigenetic alterations in microRNAs that affect ERCC1 expression.

When expression of DNA repair genes is reduced, DNA damage accumulates in cells at a higher than normal level, and this excess damage increases the frequency of mutation. Mutations in gliomas frequently occur in either the isocitrate dehydrogenase 1 or 2 gene (IDH1 or IDH2). One of these mutations (mostly in IDH1) occurs in about 80% of low-grade gliomas and secondary high-grade gliomas. Wang et al. pointed out that IDH1- and IDH2-mutant cells produce an excess of the metabolic intermediate 2-hydroxyglutarate, which binds to catalytic sites in key enzymes that are important in altering histone and DNA promoter methylation. Thus, mutations in IDH1 and IDH2 generate a CpG island methylator phenotype (CIMP) that causes promoter hypermethylation and concomitant silencing of tumor suppressor genes such as the DNA repair genes MGMT and ERCC1. On the other hand, Cohen et al. and Molenaar et al. pointed out that mutations in IDH1 or IDH2 can cause increased oxidative stress. Increased oxidative damage to DNA could be mutagenic. This is supported by an increased number of DNA double-strand breaks in IDH1-mutated glioma cells.
Thus, IDH1 or IDH2 mutations act as driver mutations in glioma carcinogenesis, though it is not clear by which mechanism they primarily act. A study of 51 patients with brain gliomas who had two or more biopsies over time showed that the IDH1 mutation occurred before a p53 mutation or a 1p/19q loss of heterozygosity, indicating that an IDH1 mutation is an early driver mutation.

Pathophysiology
High-grade gliomas are highly vascular tumors and have a tendency to infiltrate diffusely. They have extensive areas of necrosis and hypoxia. Often, tumor growth causes a breakdown of the blood–brain barrier in the vicinity of the tumor. As a rule, high-grade gliomas almost always grow back even after complete surgical excision, and so are commonly called recurrent cancer of the brain.

Conversely, low-grade gliomas grow slowly, often over many years, and can be followed without treatment unless they grow and cause symptoms.

Several acquired (not inherited) genetic mutations have been found in gliomas. Tumor suppressor protein 53 (p53) is mutated early in the disease. p53 is the "guardian of the genome", which, during DNA and cell duplication, makes sure the DNA is copied correctly and destroys the cell (apoptosis) if the DNA is mutated and cannot be fixed. When p53 itself is mutated, other mutations can survive. Phosphatase and tensin homolog (PTEN), another tumor suppressor gene, is itself lost or mutated. Epidermal growth factor receptor (EGFR), a receptor that normally stimulates cells to divide when bound by growth factors, is amplified and stimulates cells to divide too much. Together, these mutations lead to cells dividing uncontrollably, a hallmark of cancer. In 2009, mutations in IDH1 and IDH2 were found to be part of the mechanism and associated with a more favorable prognosis.
IDH1 and IDH2-mutated glioma
Patients with glioma carrying mutations in either IDH1 or IDH2 have relatively favorable survival compared with patients whose glioma has wild-type IDH1/2 genes. In WHO grade III glioma, IDH1/2-mutated gliomas have a median overall survival of about 3.5 years, whereas IDH1/2 wild-type gliomas fare poorly, with a median overall survival of about 1.5 years. In glioblastoma the difference is larger: IDH1/2 wild-type glioblastoma has a median overall survival of 1 year, whereas IDH1/2-mutated glioblastoma has a median overall survival of more than 3 years.

Next, look at the related genes, pathways and variants sections of the page:

Glioma-related genes (Genes for Glioma, from MalaCards)
Genes related to Glioma (11 elite genes; 119 genes shown). Asterisk (*) = Elite gene; CC = Cancer Census gene in COSMIC.
https://www./card/glioma?limit[RelatedDiseases]=1353&limit[RelatedGenes]=119#RelatedGenes-table

1. Copy each table and paste it into Excel.
2. Save it as a tab-delimited .txt file.
The remaining steps:

    # 3. Remove the ^M characters (carriage returns, \r) left by Windows/Excel line endings (GNU sed)
    sed -i 's/\r//g' MalaCards-癌種相關的變異.txt
    sed -i 's/\r//g' MalaCards-癌種相關的通路.txt
    sed -i 's/\r//g' MalaCards-癌種相關的基因.txt
    # 4. Confirm that no ^M remains (cat -A shows it as ^M just before the $ at line end);
    #    other special characters usually do no harm
    cat -A MalaCards-癌種相關的變異.txt
    cat -A MalaCards-癌種相關的通路.txt
    cat -A MalaCards-癌種相關的基因.txt
    # 5. Extract the gene lists: column 2 of the genes table, column 5 of the pathways table
    #    (split on "?", presumably the gene separator as it survived the Excel export),
    #    and column 7 of the variants table
    cut -f 2 MalaCards-癌種相關的基因.txt | sort -ur \
        > MalaCards-癌種相關的基因.List.txt
    cut -f 5 MalaCards-癌種相關的通路.txt | \
        sed 's/?/\n/g' | sort -ur > MalaCards-癌種相關的通路.List.txt
    cut -f 7 MalaCards-癌種相關的變異.txt | \
        sort -ur > MalaCards-癌種相關的變異.List.txt
    # 6. Merge the three gene lists
    cat MalaCards-癌種相關的基因.List.txt MalaCards-癌種相關的通路.List.txt MalaCards-癌種相關的變異.List.txt \
        > MalaCards-癌種相關的基因-通路和變異.List.txt

Link: https://pan.baidu.com/s/1K16qbB3DGZt3xRJns0Dj2Q

Using the glioma-related genes collected throughout this article, subset the MAF file (awk command):
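    # Keep MAF comment lines, the header row, and every row whose Hugo_Symbol
    # (column 1) is in the gene list. NR == FNR is true only while the first
    # file is being read; it replaces the gawk-only ARGIND variable.
    awk 'BEGIN { FS = OFS = "\t" }
         NR == FNR { gen[$1] = 1; next }
         /^#/ || $1 == "Hugo_Symbol" || ($1 in gen)' \
        MalaCards-癌種相關的基因-通路和變異.List-Plus-TERT-BRAF-H3K27M-MIR181D-ERCC1-ERCC2-XRCC1.txt \
        all.Tumor_Sample_BarcodeAsSampleID.maf \
        > all.Tumor_Sample_BarcodeAsSampleID.MalaCards-Glioma-Gene-Term-Variation-Plus-TERT-BRAF-H3K27M-MIR181D-ERCC1-ERCC2-XRCC1.maf

As the file name indicates, the gene list here is the merged list from step 6 with TERT, BRAF, H3K27M, MIR181D, ERCC1, ERCC2 and XRCC1 added by hand. Note that H3K27M names a mutation, not a gene: it is carried by rows whose Hugo_Symbol is an H3 histone gene (e.g. H3F3A, also named H3-3A), so that gene symbol must be in the list for the mutation to be kept.

The .maf file produced by this awk command can be drawn as a mutation waterfall plot with the maftools R package; plenty of R code for this is available online, so it is not repeated here. An overview of the resulting MAF plot:

[Figure: overview of the resulting MAF plot]

The result looks reasonable (TP53 and EGFR both rank near the top), which suggests that the basic framework and methods of the whole tumor WES pipeline up to this point are sound.

One interesting finding: TP53 is not the gene mutated in the most samples; another gene ranks slightly ahead of it. This hints that TP53 is not the most important gene in every tumor type, which may well be significant.

How do you obtain the complete set of mutated genes reported for a given tumor type? See this article; otherwise your MAF plot may be a mess.

The method described here carries no obvious bias or subjectivity; it is a "knowledge-based" filtering scheme. Although somewhat tedious, it can be reproduced for any other tumor type to collect all the mutated genes already reported for the tumor you study; you can think of the result as your own large panel.

Finally, build on this to discover new tumor driver mutations.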
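The observation that another gene slightly outranks TP53 is about the number of samples carrying at least one mutation, roughly the quantity an oncoplot uses to order genes. A minimal awk sketch of that count lets you check the ranking directly from the filtered .maf. It assumes a tab-separated MAF whose header contains Hugo_Symbol and Tumor_Sample_Barcode; the toy rows and the file names toy_count.maf and toy_genes_by_sample_count.tsv are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rank genes by how many distinct samples carry at least one mutation.
set -euo pipefail

maf=toy_count.maf                 # hypothetical; use your filtered .maf here
out=toy_genes_by_sample_count.tsv

# Toy input: TP53 is mutated twice in S1 (counted once) and once in S2.
{
  printf '#version 2.4\n'
  printf 'Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n'
  printf 'TP53\tMissense_Mutation\tS1\n'
  printf 'TP53\tNonsense_Mutation\tS1\n'
  printf 'TP53\tMissense_Mutation\tS2\n'
  printf 'GENE_A\tMissense_Mutation\tS1\n'
  printf 'GENE_A\tSplice_Site\tS2\n'
  printf 'GENE_A\tMissense_Mutation\tS3\n'
  printf 'GENE_B\tSilent\tS3\n'
} > "$maf"

awk -F'\t' '
  /^#/ { next }                               # skip MAF comment lines
  !h {                                        # header: locate columns by name
    for (i = 1; i <= NF; i++) {
      if ($i == "Hugo_Symbol") g = i
      if ($i == "Tumor_Sample_Barcode") s = i
    }
    if (!g || !s) { print "required column not found" > "/dev/stderr"; exit 1 }
    h = 1; next
  }
  {                                           # count each gene/sample pair once
    k = $g SUBSEP $s
    if (!(k in seen)) { seen[k] = 1; n[$g]++ }
  }
  END { for (gene in n) print n[gene] "\t" gene }
' "$maf" | sort -k1,1nr -k2,2 > "$out"

cat "$out"    # columns: number of mutated samples, gene
```

Counting gene/sample pairs rather than rows keeps a hypermutated sample with several hits in one gene from inflating that gene's rank.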
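Approach ④ (filtering by functional effect) can be tried on the MAF directly through its Variant_Classification column. The sketch below keeps comment lines, the header and only non-synonymous classes. The class list is an assumption, copied from the default vc_nonSyn set of maftools' read.maf; the toy rows and the file names toy_vc.maf and toy_vc.nonSyn.maf are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of approach 4: keep only non-synonymous Variant_Classification classes.
set -euo pipefail

in_maf=toy_vc.maf                 # hypothetical input; use your own .maf here
out_maf=toy_vc.nonSyn.maf

# Toy input: one comment line, a header and five variants (tab-separated).
{
  printf '#version 2.4\n'
  printf 'Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n'
  printf 'EGFR\tMissense_Mutation\tS1\n'
  printf 'EGFR\tSilent\tS2\n'
  printf 'TP53\tNonsense_Mutation\tS1\n'
  printf 'TP53\tIntron\tS3\n'
  printf 'IDH1\tSplice_Site\tS2\n'
} > "$in_maf"

# Classes to keep (assumed: same as the maftools default vc_nonSyn).
keep="Frame_Shift_Del,Frame_Shift_Ins,Splice_Site,Translation_Start_Site,Nonsense_Mutation,Nonstop_Mutation,In_Frame_Del,In_Frame_Ins,Missense_Mutation"

awk -F'\t' -v keep="$keep" '
  BEGIN { n = split(keep, k, ","); for (i = 1; i <= n; i++) ok[k[i]] = 1 }
  /^#/ { print; next }                        # keep MAF comment lines
  !h {                                        # header: locate the column by name
    for (i = 1; i <= NF; i++) if ($i == "Variant_Classification") vc = i
    if (!vc) { print "Variant_Classification column not found" > "/dev/stderr"; exit 1 }
    h = 1; print; next
  }
  ($vc in ok)                                 # data rows: print kept classes only
' "$in_maf" > "$out_maf"

cut -f 1,2 "$out_maf"
```

The list is meant to be edited: adding Silent or 5'UTR, or dropping In_Frame_Del/In_Frame_Ins, moves the filter along the "not too large, not too small" scale described at the start.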