Proteogenomic re-annotation and mRNA splicing information can lead to the discovery of various protein forms for eukaryotic model organisms such as the rat. However, detecting novel proteoforms using mass spectrometry (MS) proteomics data remains a formidable challenge. We developed EuGenoSuite, an open-source, multi-algorithm proteomic search tool, and used it in our in-house integrated transcriptomic-proteomic pipeline to automate proteogenomic analysis. Using four proteogenomic pipelines (integrated transcriptomic-proteomic, Peppy, Enosi, and ProteoAnnotator) on publicly available RNA-seq and MS proteomics data, we discovered 363 novel peptides in rat brain microglia, representing novel proteoforms for 249 gene loci in the rat genome. These novel peptides aided in the discovery of novel exons, translation of annotated untranslated regions, pseudogenes, and splice variants for various loci, many of which have known disease associations, including neurological disorders such as schizophrenia and amyotrophic lateral sclerosis. Novel isoforms were also discovered for genes implicated in cardiovascular diseases and breast cancer, for which rats are considered model organisms. Our integrative multi-omics data analysis not only enables the discovery of new proteoforms but also generates an improved reference for human disease studies in the rat model.
From an automated analysis of MS data from three biological replicates with two technical replicates each, a total of 4,431 proteins were identified at a protein FDR of <1%, of which 21 were contaminant proteins. Protein summaries from individual sample searches are provided in supplemental File 2. To minimize the high false-positive rates suspected in eukaryotic proteogenomic analyses, we retained only peptides identified at a PSM-level FDR of ≤1% and detected in both replicates of each sample. In total, 11,503 peptides were identified, of which 10,963 mapped to annotated Ensembl proteins, 235 to contaminant proteins, and 305 to un-annotated regions (supplemental File 3). Two hundred and sixty-five (87%) of the novel peptides mapped exclusively to a unique locus in the genome. To rule out novel peptides originating from known genes through non-synonymous mutations, we mapped the novel peptides to the annotated proteins, mutating one residue at a time along the length of each peptide. This revealed that 45 (≈17%) of the novel peptides might arise from annotated genes, and these were excluded when annotating novel translated genomic regions. Rat brain-derived RNA-seq FPKM values were also considered as support for the remaining 220 high-confidence novel peptides. We calculated average FPKM values from Cuffdiff estimates across three biological replicates and used them as an additional metric for novel loci. Of the 220 novel peptides, 177 were supported by one or more transcripts with a minimum FPKM of 0.5. These novel peptides not only highlight missing annotations in the Ensembl rat genome annotation but also represent protein isoforms expressed in microglia. A genome-wide view of transcriptomic and proteomic results is presented in Fig.
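The peptide filter described above (PSM-level FDR of ≤1% plus detection in both replicates) can be sketched in Python. This is a minimal sketch that assumes a standard target-decoy FDR estimate (decoy hits divided by target hits above a score threshold), computed separately for each replicate run. The source does not say how EuGenoSuite estimates FDR. The `PSM` record, `psm_q_values`, `confident_peptides`, and the replicate labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (hypothetical record layout)."""
    peptide: str
    score: float      # assumption: a higher score means a better match
    is_decoy: bool    # True if the spectrum matched a decoy sequence
    replicate: str    # hypothetical replicate label, e.g. "rep1"


def psm_q_values(psms):
    """Rank PSMs by score and attach a target-decoy q-value to each.

    The FDR at a threshold is estimated as decoys / targets among PSMs
    scoring at or above it. The q-value is the smallest such FDR at which
    the PSM would still be accepted. Tied scores get no special handling.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)
    # Sweep from the worst score upward, keeping the running minimum FDR.
    q_values, running_min = [0.0] * len(ranked), float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_values[i] = running_min
    return list(zip(ranked, q_values))


def confident_peptides(psms, max_fdr=0.01, replicates=("rep1", "rep2")):
    """Return peptides with a target PSM at q <= max_fdr in every replicate."""
    per_replicate = []
    for rep in replicates:
        run = [p for p in psms if p.replicate == rep]
        passing = {p.peptide for p, q in psm_q_values(run)
                   if not p.is_decoy and q <= max_fdr}
        per_replicate.append(passing)
    # Keep only peptides detected in all replicates (here, both).
    return set.intersection(*per_replicate)


psms = [
    PSM("AAAK", 10.0, False, "rep1"), PSM("LLLR", 9.0, False, "rep1"),
    PSM("DECOY1", 8.0, True, "rep1"), PSM("GGGK", 7.0, False, "rep1"),
    PSM("AAAK", 5.0, False, "rep2"), PSM("GGGK", 4.0, False, "rep2"),
    PSM("DECOY2", 3.0, True, "rep2"),
]
print(confident_peptides(psms))  # {'AAAK'}
```

In this toy input, `LLLR` passes only in rep1 and `GGGK` passes only in rep2. The replicate requirement therefore leaves a single peptide. Intersecting per-replicate sets is one simple way to express "detected in both replicates". A real pipeline might instead pool runs before estimating FDR.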
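The check for novel peptides that could come from known genes through a single non-synonymous mutation can be sketched literally. Each position of the peptide is replaced in turn with every other standard amino acid, and each variant is tested for an exact match in the annotated protein sequences. The function name `single_substitution_hits` and the toy protein entry are hypothetical. This sketch treats isobaric leucine and isoleucine as distinct residues, and it uses plain substring tests where a real pipeline would more likely use an indexed search.

```python
# Hypothetical helper: which "novel" peptides are one amino-acid substitution
# away from a stretch of an annotated protein?
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitution_hits(peptides, proteins):
    """Return peptides that match an annotated protein exactly after one
    residue is substituted (each position is tried in turn).

    `proteins` maps protein IDs to sequences. Unsubstituted (exact) matches
    are not tested here. I and L are treated as distinct residues.
    """
    hits = set()
    for pep in peptides:
        found = False
        for i, original in enumerate(pep):
            for aa in AMINO_ACIDS:
                if aa == original:
                    continue
                variant = pep[:i] + aa + pep[i + 1:]
                if any(variant in seq for seq in proteins.values()):
                    found = True
                    break
            if found:
                break
        if found:
            hits.add(pep)
    return hits


annotated = {"PROT_X": "MKTAYIAKQRQISFVKSHFSRQ"}  # toy sequence
novel = {"TAYIVK", "WWWWWW"}
print(single_substitution_hits(novel, annotated))  # {'TAYIVK'}
```

`TAYIVK` is reported because changing V to A yields `TAYIAK`, which occurs in the toy protein. Exact matches are deliberately not reported. A peptide found verbatim in an annotated protein would not have been classified as novel in the first place.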
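The transcript-support step for the high-confidence novel peptides can be sketched as follows. The sketch assumes two things. First, each peptide has already been linked to the transcripts overlapping its genomic locus. Second, per-replicate FPKM estimates (for example, derived from Cuffdiff output) are available for each transcript. Applying the 0.5 FPKM cut-off to the replicate average is an assumption about how the two metrics in the text relate. `supported_peptides` and all identifiers are hypothetical.

```python
from statistics import mean


def supported_peptides(peptide_transcripts, transcript_fpkms, min_fpkm=0.5):
    """Return peptides backed by at least one transcript whose FPKM,
    averaged over biological replicates, is >= min_fpkm.

    peptide_transcripts: peptide -> IDs of transcripts overlapping its locus
    transcript_fpkms:    transcript ID -> per-replicate FPKM values
    (both inputs are assumed to be prepared upstream; names are hypothetical)
    """
    supported = set()
    for pep, transcripts in peptide_transcripts.items():
        if any(mean(transcript_fpkms[t]) >= min_fpkm
               for t in transcripts if t in transcript_fpkms):
            supported.add(pep)
    return supported


peptide_transcripts = {"pepA": ["tx1"], "pepB": ["tx2", "tx3"], "pepC": []}
transcript_fpkms = {
    "tx1": [0.4, 0.6, 0.8],   # mean 0.6 -> supports pepA
    "tx2": [0.1, 0.2, 0.3],   # mean 0.2 -> below the cut-off
    "tx3": [0.5, 0.5, 0.5],   # mean 0.5 -> meets the cut-off, supports pepB
}
print(sorted(supported_peptides(peptide_transcripts, transcript_fpkms)))
# ['pepA', 'pepB']
```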
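The peptide counts reported above can be cross-checked with simple arithmetic. The text does not spell out that the 220 high-confidence peptides are the 265 uniquely mapped novel peptides minus the 45 that might arise from annotated genes. That reading is inferred from the stated numbers, and the percentages agree with it:

```latex
\begin{aligned}
10{,}963 + 235 + 305 &= 11{,}503 && \text{(annotated + contaminant + un-annotated peptides)}\\
265 / 305 &\approx 0.87 && \text{(novel peptides mapping to a unique locus)}\\
45 / 265 &\approx 0.17 && \text{(peptides possibly arising from annotated genes)}\\
265 - 45 &= 220 && \text{(high-confidence novel peptides)}\\
177 / 220 &\approx 0.80 && \text{(peptides with transcript support at FPKM} \ge 0.5\text{)}
\end{aligned}
```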