Integrating Proteomics into the MTB

The potential of clinical proteomics

The vast majority of malignancies have predominantly been studied on the genomic and transcriptomic levels, which can only serve as surrogates for protein-level information. However, multiple studies have shown that the correlation between mRNA and actual protein abundance is limited29–32. Other biologically relevant aspects, such as post-translational modifications (PTMs) and protein isoforms, can only be studied by directly investigating the proteome. Proteins are the effector molecules within cells and tissues and, in the context of malignancies, promote or inhibit tumor development and progression. Proteomics provides direct evidence on a) protein abundances and b) pathway activity, for example by the detection of activity-associated phosphorylation events. In-depth, large-scale and unbiased proteomic analyses are typically performed using liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS).
The National Cancer Institute (NCI) has recognized the huge potential of clinical proteomics and launched an initiative, the Clinical Proteomic Tumor Analysis Consortium (CPTAC), to accelerate the understanding of tumor biology in ways not possible through genomics alone. By applying rigorous standards to proteomic measurements, CPTAC investigators perform large-scale reproducible proteomic studies. CPTAC’s research over the last years has shown the power of proteomics and phosphoproteomics, enabling the reclassification of molecular tumor subtypes and the identification of pathways related to clinical outcomes33,34.
Technical and methodological developments on multiple fronts of MS-based proteomics, from sample preparation to data analysis, have led to increased throughput, sensitivity and accuracy, allowing the identification and quantification of over 5000 proteins from minute amounts of sample material35–38.
Generally, MS-based proteomics can either be explorative, aiming to detect as many proteins and peptides in each measurement as possible, or targeted, aiming for reliable and sensitive identification and quantification of a panel of a priori known proteins. With highly sensitive sample processing and LC-MS instruments, targeted and explorative analysis can be performed using minimal sample amounts. During bottom-up proteomic approaches the proteins are enzymatically digested into peptides, which are ionized and measured in the mass spectrometer.
Peptide and protein quantification in MS-based clinical proteomics can be performed label-free or by using metabolic or chemical labeling such as tandem mass tags (TMT)36,39–42. Due to their cost-effectiveness and simplicity, label-free quantification methods are widely used, especially in large-scale studies. Furthermore, for the investigation of individual patient samples, it might prove beneficial to measure the samples independently, circumventing batch effects and minimizing biases to facilitate relative comparisons between samples43. In data-dependent acquisition (DDA) mode, only a few selected precursors with the highest intensity are fragmented44. This approach introduces stochastic effects and leads to a significant number of missing values, where proteins are not detected consistently across multiple samples. Conversely, the data-independent acquisition (DIA) mode provides a more robust and reproducible approach for the explorative analysis of multiple samples45,46. In DIA, predefined fragmentation windows (m/z windows) are used for the simultaneous fragmentation of multiple precursors and for consistent data acquisition across samples. Spectra are assigned to peptides and proteins by comparing the measured spectra with a reference spectral library. By utilizing DIA, researchers can achieve more reliable and comprehensive quantification results than with DDA, thereby enhancing the accuracy and reproducibility of proteomic analyses46,47.
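The difference between the two acquisition modes can be illustrated with a deliberately simplified Python sketch. It is not an instrument method: the function names, the precursor list, and the window width are hypothetical and chosen only for illustration. DDA is modeled as picking the N most intense precursors of a survey scan, whereas DIA is modeled as assigning every precursor to a predefined m/z window regardless of its intensity.

```python
def dda_top_n(precursors, n):
    """DDA-like selection: keep only the n most intense precursors.

    precursors is a list of (mz, intensity) tuples from one survey scan.
    """
    return sorted(precursors, key=lambda p: p[1], reverse=True)[:n]


def dia_windows(mz_start, mz_end, width):
    """Build consecutive, non-overlapping m/z isolation windows."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows


def dia_assign(precursors, windows):
    """DIA-like acquisition: every precursor inside a window is co-fragmented."""
    return {
        window: [p for p in precursors if window[0] <= p[0] < window[1]]
        for window in windows
    }


# Hypothetical survey scan: (m/z, intensity)
example_precursors = [(400.2, 1e6), (455.7, 5e4), (512.3, 8e5), (630.9, 2e4)]

selected = dda_top_n(example_precursors, n=2)
assigned = dia_assign(example_precursors, dia_windows(400, 700, 100))
print("DDA fragments:", selected)
print("DIA fragments:", assigned)
```

In this toy scan, the two low-intensity precursors are never fragmented in the DDA-like selection, which mirrors the origin of missing values across samples, while the DIA-like assignment covers all four precursors in every run.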
To improve reproducibility and efficiency, different sample preparation workflows including automation using liquid handling platforms have been established over the last years48–51. These developments paved the way for reproducible and robust proteomics while allowing for high-throughput and minimizing human errors and variability. Proteomic workflows can handle a wide range of different sample types including solid tissue samples, serum, and urine. Dedicated workflows have been developed for the processing of formalin-fixed and paraffin-embedded (FFPE) samples which constitute the most common storage modality for resected tissue and the standard modality for routine diagnostics (e.g. immunostainings and genomics). Multiple studies showed that proteins and PTMs are generally more stable than RNA, allowing for in-depth analysis even after decades of storage time52–55. Consequently, proteomic workflows can be easily integrated into routine molecular diagnostic sample processing in pathology institutes. Proteomics further allows the large-scale detection and quantification of PTMs such as phosphorylation, glycosylation, acetylation, and ubiquitination56–60. These modifications are an important mechanism to adapt and alter protein activity, stability, and localization. The modified peptides are most often substoichiometric, caused by regulatory processes, which necessitates enrichment during sample processing. Dedicated (partially automated) workflows have been developed for this over the last few years, allowing stable detection of modified proteins61–64. Another proteomic field called immunopeptidomics focuses on all peptides presented by human leukocyte antigen (HLA) proteins on the cell surface. HLA proteins present a wide range of different antigen peptides originating from the proteasomal degradation of endogenous or exogenous proteins to T-cells. 
In the tumor context, the identification of tumor-specific antigens is essential for the development of epitope-specific cancer immunotherapies65. Presented antigens do not correlate with transcript or protein abundance, and to date mass spectrometry-based immunopeptidomics is the only available method that provides direct evidence of antigen presentation.
The vast potential of clinical proteomics is illustrated by multiple studies identifying single biomarkers, biomarker panels, and potential therapeutic targets in various diseases39,42,66–71. Furthermore, clinical proteomics holds the potential for therapy response prediction and longitudinal treatment characterization, such as in the context of tumor development and progression72–74.

Proteomics in precision oncology

In-depth clinical proteomics is widely used to analyze large cohorts of patients, providing further insights into the molecular pathology of various malignancies67,75–77. However, an important question arises: How can proteomics prove beneficial in the analysis of the proteome of an individual patient with a clinically challenging and complex tumor disease?
In the past five years, there has been a rapid emergence of clinical proteomic data through initiatives like CPTAC and other projects. These data facilitate the implementation of proteomics in precision oncology and serve as a valuable reference resource. However, the number of proteomic datasets with detailed clinical annotation in these databases remains limited. Furthermore, the lack of standardized preprocessing of quantitative proteomic data renders direct quantitative comparisons challenging. Therefore, the implementation of proteomics for individual patients remains a challenge, which is addressed by only a few current studies78,79. The integration of proteomics into the MTB is an emerging and highly relevant topic, as illustrated by the number of PubMed entries (as of 07/23): the search for “molecular tumor board*” genomic* yields about 190 entries, whereas the search for “molecular tumor board*” proteomic* yields a mere 9 entries. This underlines the novelty and the untapped potential of implementing proteomics in MTBs and highlights the urgent need for further investigation and adoption of proteomic approaches in personalized oncology. In this endeavor, several aspects need to be considered.
Given that the patients’ samples often include non-tumorous areas, macrodissection of tumorous tissue is required to focus on the actual tumor proteomes. Techniques such as laser capture microdissection offer the ability to selectively isolate specific tumor regions and individual cell populations, providing a focused analysis and enabling the representation of tumor heterogeneity at a spatial resolution80–82. Further insight into proteome alterations during tumor development and progression can be gained by additionally processing healthy tissue and, in case of metastasis formation, both the metastatic and the primary tumor tissues.
State-of-the-art protocols and instrumentation enable proteomic processing and analysis of FFPE samples in less than a week, meeting the clinical need for timely treatment recommendations83. Furthermore, these protocols routinely yield more than 5000 identified and quantified proteins even from minimal FFPE input material such as biopsies. One of the key strengths of comprehensive MS-based clinical proteomics lies in the detailed interrogation of therapy-relevant proteins within the extensive lists of identified and quantified proteins. LC-MS/MS approaches enable the detection of direct drug targets, such as PD-L1, which is necessary for successful immunotherapeutic treatment84,85. Furthermore, frequently amplified targets and biomarkers such as HER2 and EGFR can be identified in the proteomic data and subsequently corroborated by a histology-preserving approach such as immunohistochemistry86,87. Other druggable targets that can be observed using LC-MS/MS include MAP2K1, MAP2K2, CDK4, and CDK6. The loss of CDKN2A/B is associated with loss of p16 and of cell cycle control, which is a possible indication for CDK4/CDK6 inhibitors such as ribociclib, abemaciclib and palbociclib88,89. In our own work, we were able to detect CDK4 and CDK6 in a patient with a loss-of-function mutation in CDKN2A, supporting the rationale for CDK4/6 inhibition.
The development of antibody-drug conjugate (ADC) therapy, in which monoclonal antibodies are covalently linked to cytotoxic agents, has immensely broadened the selection of potential targets in personalized cancer therapy22,90–92. This has led to novel therapeutic targets such as Trop-2 and Nectin-4, for which the proteomic data can be systematically screened, followed by corroboration using immunostaining93. The comprehensive proteomic data can be further screened for the presence or absence of tumor suppressors such as TP53 and of mismatch repair proteins including MLH1, MSH2, MSH3, MSH6 and PMS2, the presence of which is indicative of microsatellite stability94.
Druggable oncogenic signaling pathways are of particular interest in tumor treatment. Pathway activity can be assessed by thorough investigation of PTMs. Protein phosphorylation is one of the most crucial PTMs and has been associated with tumor development and progression, which highlights the relevance of a detailed analysis of the phosphoproteome. Phosphoproteomics provides particularly deep insights into pathway activity. Several kinase-related pathways are connected to tumor growth and progression, which has promoted the development of multiple therapeutic drugs targeting these kinases. The individual analysis of activated pathways and highly active kinases holds promising potential for the precise identification of therapeutic targets. The ongoing TOPAS study explores the benefit of integrating phosphoproteomics for individual patients78. In our own work, we detected multiple phosphorylated peptides of ERK1/2 connected to MAPK-ERK pathway activity in a patient with a BRAF fusion. This is consistent with findings from other studies, which highlight an activation of BRAF by this fusion event95.
Further molecular insights can be gained by combining proteins into baskets, e.g. therapeutic targets, morphologic/histologic markers, and proteins related to the same pathway. Thus, proteomic analyses contribute significantly to the understanding of the underlying cancer type and can suggest potential treatment options. A retrospective study showed that targeted proteomics partially supports genomics-driven therapeutic strategies but also proposes alternative treatment options and provides possible explanations for treatment failure79.
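A minimal Python sketch of such a basket screen is given here under stated assumptions. The basket composition simply reuses targets named in this section, written as gene symbols (CD274 for PD-L1, ERBB2 for HER2, TACSTD2 for Trop-2, NECTIN4 for Nectin-4). The function name, the basket labels, and the example intensities are hypothetical and do not represent a published workflow. Proteins missing from the quantified list are reported as "not detected" rather than "absent", because non-identification by LC-MS/MS does not prove absence.

```python
# Hypothetical baskets built from targets mentioned in the text (gene symbols)
BASKETS = {
    "direct drug targets": ["CD274", "ERBB2", "EGFR", "MAP2K1", "MAP2K2", "CDK4", "CDK6"],
    "ADC targets": ["TACSTD2", "NECTIN4"],
    "mismatch repair": ["MLH1", "MSH2", "MSH3", "MSH6", "PMS2"],
}


def screen_baskets(quantified, baskets=BASKETS):
    """Split each basket into detected proteins (with intensity) and not-detected ones.

    quantified maps gene symbol -> quantitative value from one patient sample.
    """
    result = {}
    for basket, genes in baskets.items():
        result[basket] = {
            "detected": {g: quantified[g] for g in genes if g in quantified},
            "not_detected": [g for g in genes if g not in quantified],
        }
    return result


# Hypothetical patient sample: gene symbol -> log2 intensity (placeholder values)
example_sample = {
    "EGFR": 24.1, "CDK4": 22.8, "CDK6": 21.5,
    "TACSTD2": 23.0,
    "MLH1": 20.9, "MSH2": 21.7, "MSH6": 21.2, "PMS2": 19.8,
}

for basket, hits in screen_baskets(example_sample).items():
    print(basket, "| detected:", sorted(hits["detected"]),
          "| not detected:", hits["not_detected"])
```

Candidates flagged in this way would still require corroboration, for example by immunostaining, as described above.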
Advanced bioinformatic search strategies emerged to not only account for the canonical human proteome but to also enable the detection of protein isoforms as well as proteins from other species, e.g. viral or bacterial proteins. One example is the protein isoform claudin-18.2, a novel target in esophagus carcinoma and gastric cancer96. Metaproteomic investigations enable the detection of viral antigens in the context of virus-related tumors, such as cervical cancer, and intra-tumoral fungi and bacteria in different cancer types97,98.

Data Integration and Proteogenomics

One of the most crucial developments in clinical proteomics is the potential to detect thousands of proteins from minute sample material. The highly sensitive protocols and LC-MS instruments enable the integration of proteomics into the current molecular diagnostic routine within the MTB without the need for excessive additional samples. Consequently, proteomics provides complementary biological information which enables the integration of multiple layers of molecular information (Figure 2).
By integrating proteomics, several questions regarding the correlation of these data can be studied. How well do copy number variations correlate with proteomics? Do proteomic results correlate with findings in immunohistochemistry? Can a loss or gain of function mutation of a protein be detected in the downstream pathway activity?
One powerful example of data integration is the rising field of proteogenomics which links genomics and transcriptomics to proteomics33,34,66,99–104. The CPTAC aims to systematically identify cancer-relevant proteins that derive from alterations in cancer genomes and the related biological processes in large-scale studies105.
However, proteogenomics can also be applied to individual patients in a personalized approach. Typically, in a bottom-up proteomic search for endogenous proteins, proteins are cleaved into peptides by tryptic digestion, measured, and subsequently identified using a canonical protein database. In the tumor context, oncogenic mutations can lead to single amino acid variants (SAAVs), which reflect the mutated DNA in an altered peptide sequence. Thus, by using the patient-specific genomic/transcriptomic information, the canonical protein database can be expanded to contain tryptic SAAV peptide sequences.
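The database expansion described above can be sketched in a few lines of Python. This is a simplified illustration, not a production proteogenomic pipeline. Trypsin is modeled with the common rule of cleavage after K or R unless followed by P, without missed cleavages. The sequence is assumed to be the N-terminal fragment of canonical human KRAS, and the function names are hypothetical. A real workflow would derive the variants from the patient's sequencing results and write them to a FASTA file for the search engine.

```python
def tryptic_digest(sequence):
    """In silico trypsin digestion: cleave after K/R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def apply_saav(sequence, position, ref, alt):
    """Introduce a single amino acid variant at a 1-based protein position."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[:position - 1] + alt + sequence[position:]


def saav_peptides(sequence, position, ref, alt):
    """Return tryptic peptides that exist only in the variant protein."""
    canonical = set(tryptic_digest(sequence))
    variant = tryptic_digest(apply_saav(sequence, position, ref, alt))
    return [p for p in variant if p not in canonical]


# Assumed N-terminal fragment of canonical human KRAS (residues 1-42)
KRAS_NTERM = "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRK"

# Patient-specific variant, e.g. taken from panel sequencing: G12D
print(saav_peptides(KRAS_NTERM, 12, "G", "D"))
```

Only the variant-specific peptides need to be appended to the canonical database, which keeps the increase in search space small.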
These proteogenomic analyses are especially valuable as they report on the penetrance of genomic alterations and give complementary biological information provided by the actual effector molecules. Mutated proteins can be direct drug targets, such as the KRAS G12C variant, and can thereby suggest a therapeutic direction106,107.
Examples from our own work include the detection of the KRAS G12D variant peptide (LVVVGADGVGK) by applying an in-house patient-matched proteogenomic analysis workflow combining genomic (TruSight Oncology 500 Assay) and LC-MS/MS proteomic data. Thus, proteomic data corroborate genomic findings and provide direct evidence that the mutated gene is actually expressed at the protein level. The KRAS G12D variant has been shown to be a potential therapeutic target for selective inhibition, and the identification of the KRAS G12D mutation provides a rationale for inclusion in ongoing clinical studies108–113.

Limitations and upcoming challenges

The adequate interpretation of proteomic data derived from individual patients remains challenging. One of the major limitations of state-of-the-art clinical proteomics is that the complete proteome cannot be measured, owing to its intrinsic complexity, the dynamic range of protein abundances, and the lack of an amplification step during sample preparation. Consequently, the non-identification of a protein does not necessarily mean that it was absent from the original sample. Thus, combining explorative and targeted measurements might provide a compromise between broad proteome coverage and the reliable and robust detection of proteins of interest such as established biomarkers.
Generally, rigorous quality control (QC) must be included throughout the proteomic processing to ensure consistently high quality of the proteomic results. For LC-MS/MS-based proteomics, the QC could include measures such as the addition of synthetic indexed retention time peptides and/or the measurement of control samples such as commercial peptide standards. For these standards, certain analysis results such as retention times, the number of peptide and protein IDs, and intensities should be within an expected range and could consequently be reported in conjunction with the individual MS-based patient data.
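Such a range check can be sketched as follows. The metric names and all acceptance ranges are hypothetical placeholders and not recommended thresholds. Each laboratory would have to derive its own ranges from repeated measurements of its standard on its own LC-MS setup.

```python
# Hypothetical acceptance ranges for one QC standard run (placeholder numbers)
EXPECTED_RANGES = {
    "irt_rt_shift_min": (-1.0, 1.0),       # retention time shift of iRT peptides, minutes
    "peptide_ids": (30000, 60000),         # identified peptides
    "protein_ids": (4500, 8000),           # identified proteins
    "median_log2_intensity": (20.0, 26.0),
}


def check_qc(metrics, expected=EXPECTED_RANGES):
    """Compare measured QC metrics with expected ranges; returns metric -> status."""
    report = {}
    for name, (low, high) in expected.items():
        value = metrics.get(name)
        if value is None:
            report[name] = "missing"
        elif low <= value <= high:
            report[name] = "pass"
        else:
            report[name] = "fail"
    return report


def qc_passed(report):
    """A run passes only if every expected metric is present and within range."""
    return all(status == "pass" for status in report.values())


# Hypothetical QC run measured alongside a patient sample
example_run = {
    "irt_rt_shift_min": 0.3,
    "peptide_ids": 41250,
    "protein_ids": 4100,   # below the assumed lower bound
    "median_log2_intensity": 23.4,
}

example_report = check_qc(example_run)
print(example_report, "-> passed:", qc_passed(example_report))
```

The resulting report could be stored and communicated together with the patient-specific proteomic results, so that the MTB can judge the technical reliability of each measurement.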
The integration of large-scale proteomic studies into clinical research is an emerging field. Current clinical studies mainly investigate genomic alterations in the context of clinical phenotypes and treatment responses, which contributes to the lack of proteomic data and of their correlation with clinical presentations. Accordingly, decision support systems for molecular tumor boards presently rely exclusively on NGS data114–116. However, further integration of proteomics is crucial for the molecular understanding of tumor biology. Therefore, proteomics should be included in clinical studies on all levels, complementing genomic stratification and deepening the molecular understanding of individual disease progression and therapy response. This requires a fundamental rethinking and a shift on multiple levels, most of all an openness to this new multi-omics approach.