Integrating Proteomics into the MTB
The potential of clinical proteomics
The vast majority of malignancies have predominantly been studied on
genomic and transcriptomic levels, which can only be used as surrogate
approaches for the estimation of protein level information. However, the
limited correlation between mRNA and actual protein abundance has been
shown in multiple studies29–32.
Other biologically relevant aspects, such as post-translational
modifications (PTMs) and protein isoforms, can only be studied by
directly investigating the proteome. Proteins are the effector molecules
within cells and tissues and, in the context of malignancies, promote or
inhibit tumor progression and development. Proteomics provides direct
evidence on a) protein abundances and b) pathway activity, for example
by the detection of activity-associated phosphorylation events. In-depth
large-scale and unbiased proteomic analyses are typically performed
using liquid chromatography coupled with tandem mass spectrometry
(LC-MS/MS).
The National Cancer Institute (NCI) has recognized the huge potential of
clinical proteomics and launched an initiative, the Clinical Proteomic
Tumor Analysis Consortium (CPTAC), to accelerate the understanding of
tumor biology in ways not possible through genomics alone. By applying
rigorous standards to proteomic measurements, CPTAC investigators
perform large-scale reproducible proteomic studies. CPTAC’s research
over the last years has shown the power of proteomics and
phosphoproteomics, enabling the reclassification of molecular tumor
subtypes and the identification of pathways related to clinical outcomes33,34.
Technical and methodological developments on multiple fronts of MS-based
proteomics, from sample preparation to data analysis, have led to
increased throughput, sensitivity and accuracy, allowing the
identification and quantification of over 5000 proteins using minute
input/sample material35–38.
Generally, MS-based proteomics can either be explorative, aiming to
detect as many proteins and peptides in each measurement as possible, or
targeted, aiming for reliable and sensitive identification and
quantification of a panel of a priori known proteins. With highly
sensitive sample processing and LC-MS instruments, targeted and
explorative analysis can be performed using minimal sample amounts.
In bottom-up proteomic approaches, the proteins are enzymatically
digested into peptides, which are ionized and measured in the mass
spectrometer.
Peptide and protein quantification in MS-based clinical proteomics can
be performed label-free or by using metabolic or chemical labeling such
as tandem mass tags (TMT)36,39–42.
Due to their cost-effectiveness and simplicity, label-free
quantification methods are widely used, especially in large-scale
studies. Furthermore, for the investigation of individual patient
samples, it might prove beneficial to measure the samples independently,
circumventing batch effects and minimizing biases to facilitate relative
comparisons between samples43.
In data-dependent acquisition (DDA) mode, only a few selected precursors
with the highest intensity are fragmented44.
This approach introduces stochastic effects and leads to a significant
number of missing values, where proteins are not detected consistently
over multiple samples. Conversely, the data-independent acquisition
(DIA) mode provides a more robust and reproducible approach for the
explorative analysis of multiple samples45,46.
In DIA, predefined fragmentation windows (m/z windows) are used for
simultaneous fragmentation of multiple precursors and consistent data
acquisition across samples. The assignment of spectra to peptides and
proteins is performed by comparing the measured spectra with a reference
spectral library. By utilizing DIA, researchers can achieve more
reliable and comprehensive quantification results compared to DDA,
thereby enhancing the accuracy and reproducibility of proteomic analyses46,47.
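The difference between the two acquisition modes can be illustrated with a minimal sketch. It assumes a toy survey scan with made-up m/z and intensity values, a simple top-N rule for DDA, and contiguous fixed-width isolation windows for DIA; the function names are hypothetical, and real instruments use more elaborate schemes (dynamic exclusion, variable or overlapping windows).

```python
from typing import Dict, List, Tuple

Precursor = Tuple[float, float]  # (m/z, intensity)
Window = Tuple[float, float]     # (lower m/z, upper m/z)


def dda_top_n(precursors: List[Precursor], n: int) -> List[Precursor]:
    """DDA-style selection: keep only the n most intense precursors."""
    return sorted(precursors, key=lambda p: p[1], reverse=True)[:n]


def dia_windows(mz_start: float, mz_end: float, width: float) -> List[Window]:
    """DIA-style scheme: tile the m/z range with fixed, contiguous windows."""
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        lower = upper
    return windows


def assign_to_windows(
    precursors: List[Precursor], windows: List[Window]
) -> Dict[Window, List[Precursor]]:
    """Map each window to all precursors with lower <= m/z < upper."""
    return {w: [p for p in precursors if w[0] <= p[0] < w[1]] for w in windows}


# Toy survey scan; all values are illustrative, not measured data.
scan = [(402.2, 1.0e6), (455.7, 3.0e4), (512.3, 8.0e5), (530.9, 2.0e4), (689.4, 5.0e5)]

selected = dda_top_n(scan, n=2)             # low-intensity precursors are skipped
windows = dia_windows(400.0, 700.0, 100.0)  # the same windows are reused in every sample
covered = assign_to_windows(scan, windows)  # every precursor falls into some window
```

In this sketch the top-N rule drops three of the five precursors, and which ones are dropped depends on the intensities in each individual scan, whereas the window scheme is independent of the sample and fragments all precursors within the covered m/z range.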
To improve reproducibility and efficiency, different sample preparation
workflows including automation using liquid handling platforms have been
established over the last years48–51.
These developments paved the way for reproducible and robust proteomics
while allowing for high throughput and minimizing human error and
variability. Proteomic workflows can handle a wide range of different
sample types including solid tissue samples, serum, and urine. Dedicated
workflows have been developed for the processing of formalin-fixed and
paraffin-embedded (FFPE) samples which constitute the most common
storage modality for resected tissue and the standard modality for
routine diagnostics (e.g. immunostainings and genomics). Multiple
studies showed that proteins and PTMs are generally more stable than
RNA, allowing for in-depth analysis even after decades of storage time52–55.
Consequently, proteomic workflows can be easily integrated into routine
molecular diagnostic sample processing in pathology institutes.
Proteomics further allows the large-scale detection and quantification
of PTMs such as phosphorylation, glycosylation, acetylation, and
ubiquitination56–60.
These modifications are an important mechanism to adapt and alter
protein activity, stability, and localization. The modified peptides are
most often substoichiometric, caused by regulatory processes, which
necessitates enrichment during sample processing. Dedicated (partially
automated) workflows have been developed for this over the last few
years, allowing stable detection of modified proteins61–64.
Another proteomic field called immunopeptidomics focuses on all peptides
presented by human leukocyte antigen (HLA) proteins on the cell surface.
HLA proteins present a wide range of different antigen peptides
originating from the proteasomal degradation of endogenous or exogenous
proteins to T-cells. In the tumor context, the identification of
tumor-specific antigens is essential for the development of
epitope-specific cancer immunotherapies65.
Presented antigens do not correlate with transcript or protein abundance
and until now, mass spectrometry-based immunopeptidomics is the only
available method for providing direct evidence of antigen presentation.
The vast potential of clinical proteomics can be seen in multiple
studies identifying single biomarkers and/or biomarker panels as well as
potential therapeutic targets in various diseases39,42,66–71.
Furthermore, clinical proteomics holds the potential for therapy
response prediction and longitudinal treatment characterization, such as
in the context of tumor development and progression72–74.
Proteomics in precision oncology
In-depth clinical proteomics is widely used to analyze large cohorts of
patients, providing further insights into the molecular pathology of
various malignancies67,75–77.
However, an important question arises: How can proteomics prove
beneficial in the analysis of the proteome of an individual patient with
a clinically challenging and complex tumor disease?
In the past five years, there has been a rapid emergence of clinical
proteomic data through initiatives like CPTAC and other projects. This
data facilitates the implementation of proteomics into precision
oncology and serves as a valuable reference resource. However, the
number of proteomic datasets with detailed clinical annotation in these
databases remains limited. Furthermore, the lack of standardized
preprocessing of quantitative proteomic data renders direct quantitative
comparisons challenging. Therefore, the implementation of proteomics for
individual patients remains a challenge, which is addressed by
only a few current studies78,79.
The integration of proteomics into the MTB is an emerging and highly
relevant topic, as illustrated by the number of PubMed entries (as of
07/23): The search for "molecular tumor board*" genomic* yields about
190 entries, whereas searching for "molecular tumor board*" proteomic*
yields a mere 9 entries. This underlines the novelty and the untapped
potential of implementing proteomics into MTBs and highlights the urgent
need for further investigation and adoption of proteomic approaches in
personalized oncology. In this endeavor, several aspects need to be
considered.
Given that the patients’ samples often include non-tumorous areas,
macrodissection of tumorous tissue is required to focus on the actual
tumor proteomes. Techniques such as laser capture microdissection offer
the ability to selectively isolate specific tumor regions and individual
cell populations, providing a focused analysis and enabling the
representation of tumor heterogeneity at a spatial resolution80–82.
Further insight into proteome alterations during tumor development and
progression can be gained by additionally processing healthy tissue and,
in the case of metastasis formation, both the metastatic and primary
tumor tissues.
State-of-the-art protocols and instrumentation enable proteomic
processing and analysis of FFPE samples within less than a week, meeting
the clinical need for timely treatment recommendations83.
Furthermore, these protocols routinely yield more than 5000 identified
and quantified proteins even from minimal FFPE input material such as
biopsies. One of the key potentials of comprehensive MS-based clinical
proteomics lies within the detailed inquiry of therapy-relevant proteins
within the extensive lists of identified and quantified proteins.
LC-MS/MS approaches enable the detection of direct drug targets, such as
PD-L1, which is necessary for a successful immunotherapeutic treatment84,85.
Furthermore, frequently amplified targets and biomarkers such as HER2
and EGFR can be identified in the proteomic data and could be
subsequently corroborated in a histology-conserving approach such as
immunohistochemistry86,87.
Other druggable targets that can be observed using LC-MS/MS include
MAP2K1, MAP2K2, CDK4, and CDK6. The loss of CDKN2A/B is associated with
loss of p16 and a loss of cell cycle control, which is a possible
indication for CDK4/CDK6 inhibitors such as ribociclib, abemaciclib, and
palbociclib88,89.
In our own work, we were able to detect CDK4 and CDK6 in a patient with
a loss-of-function mutation in CDKN2A, supporting the rationale
for CDK4/6 inhibition.
The development of antibody-drug conjugate (ADC) therapy, where
monoclonal antibodies are covalently linked to cytotoxic agents, has
immensely broadened the selection of potential targets in personalized
cancer therapy22,90–92.
This has led to novel therapeutic targets such as Trop-2 and Nectin-4
for which the proteomic data can be systematically screened, followed by
corroboration using immunostaining93.
The comprehensive proteomic data can be further screened for the absence
or presence of tumor suppressors such as TP53 and of mismatch repair
proteins including MLH1, MSH2, MSH3, MSH6, and PMS2, whose presence
indicates microsatellite stability94.
Druggable oncogenic signaling pathways are of particular interest in
tumor treatment. Pathway activity can be assessed by thorough
investigation of PTMs. Protein phosphorylation is one of the most
crucial PTMs and has been associated with tumor development and
progression, highlighting the relevance of a detailed analysis of the
phosphoproteome. Phosphoproteomics provides particularly deep insights
into pathway activity. Several kinase-related pathways are connected to
tumor growth and progression, which has promoted the development of
multiple therapeutic drugs targeting these kinases. The individual
analyses of activated pathways and highly active kinases hold promising
potential in the precise identification of therapeutic targets. The
ongoing TOPAS study explores the benefit of integrating
phosphoproteomics for individual patients78.
In our own work, we detected multiple phosphorylated peptides of ERK1/2
connected to MAPK-ERK pathway activity in a patient with a BRAF fusion.
This fits findings from other studies, highlighting an activation of
BRAF by this fusion event95.
Further molecular insights can be gained by combining proteins into
baskets, e.g. therapeutic targets, morphologic/histologic markers, and
proteins related to the same pathway. Thus, proteomic analyses
significantly contribute to the understanding of the underlying cancer
type and propose potential treatment options. In a retrospective study,
it could be shown that targeted proteomics partially supports
genomics-driven therapeutic strategies but also proposes alternative
treatment options and provides possible explanations for treatment
failure79.
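The idea of combining proteins into baskets can be sketched as a simple lookup against the list of quantified proteins. The basket composition below is only an illustration assembled from proteins named in this review (gene symbols: TACSTD2 for Trop-2, NECTIN4 for Nectin-4, ERBB2 for HER2, MAPK3/MAPK1 for ERK1/2); the function name and the patient intensities are hypothetical and do not represent measured data.

```python
from typing import Dict, List

# Illustrative baskets; the grouping is an assumption for this sketch.
BASKETS: Dict[str, List[str]] = {
    "CDK4/6 axis": ["CDK4", "CDK6", "CDKN2A"],
    "MAPK pathway": ["MAP2K1", "MAP2K2", "MAPK1", "MAPK3"],
    "Mismatch repair": ["MLH1", "MSH2", "MSH3", "MSH6", "PMS2"],
    "ADC targets": ["TACSTD2", "NECTIN4", "ERBB2"],
}


def screen_baskets(
    quantified: Dict[str, float], baskets: Dict[str, List[str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Split each basket into detected and not-detected members."""
    report = {}
    for name, members in baskets.items():
        report[name] = {
            "detected": [g for g in members if g in quantified],
            "not_detected": [g for g in members if g not in quantified],
        }
    return report


# Hypothetical log2 intensities for one patient sample (made-up values).
patient = {
    "CDK4": 21.3, "CDK6": 19.8,
    "MLH1": 18.2, "MSH2": 18.9, "MSH6": 18.5, "PMS2": 16.1,
    "ERBB2": 17.4,
}

report = screen_baskets(patient, BASKETS)
for basket, result in report.items():
    print(basket, result)
```

A protein listed under "not_detected" should be read as "not identified in this measurement" rather than as proof of absence, since an explorative measurement does not cover the complete proteome.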
Advanced bioinformatic search strategies emerged to not only account for
the canonical human proteome but to also enable the detection of protein
isoforms as well as proteins from other species, e.g. viral or bacterial
proteins. One example is the protein isoform claudin-18.2, a novel
target in esophagus carcinoma and gastric cancer96.
Metaproteomic investigations enable the detection of viral antigens in
the context of virus-related tumors, such as cervical cancer, and
intra-tumoral fungi and bacteria in different cancer types97,98.
Data Integration and Proteogenomics
One of the most crucial developments in clinical proteomics is the
potential to detect thousands of proteins from minute sample material.
The highly sensitive protocols and LC-MS instruments enable the
integration of proteomics into the current molecular diagnostic routine
within the MTB without the need for excessive additional samples.
Consequently, proteomics provides complementary biological information
which enables the integration of multiple layers of molecular
information (Figure 2).
By integrating proteomics, several questions regarding the correlation
of these data can be studied. How well do copy number variations
correlate with proteomics? Do proteomic results correlate with findings
in immunohistochemistry? Can a loss or gain of function mutation of a
protein be detected in the downstream pathway activity?
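Questions of this kind, such as the agreement between copy number and protein abundance, are commonly approached with a rank correlation. The following is a minimal sketch of Spearman's rank correlation in plain Python (average ranks for ties, then Pearson correlation of the ranks); the function names are hypothetical and the paired values are toy numbers, not study data.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Toy example: gene copy number vs. log2 protein intensity for six genes.
copy_number = [1, 2, 2, 3, 4, 6]
protein_log2 = [14.1, 15.0, 15.4, 15.2, 16.8, 16.5]
rho = spearman(copy_number, protein_log2)
```

A rank-based measure is used here because copy number is discrete and protein intensities are typically not normally distributed; for a single patient the correlation would be computed across genes, for a cohort across samples.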
One powerful example of data integration is the rising field of
proteogenomics which links genomics and transcriptomics to proteomics33,34,66,99–104.
The CPTAC aims to systematically identify cancer-relevant proteins that
derive from alterations in cancer genomes and the related biological
processes in large-scale studies105.
However, proteogenomics can also be applied for individual patients in a
personalized approach. Typically, in an endogenous bottom-up proteomic
search, proteins are cleaved into peptides during tryptic digestion and
subsequently measured and analyzed using a canonical protein database.
In the tumor context, oncogenic mutations can lead to single amino acid
variants (SAAVs) that reflect the mutated DNA in an altered peptide
sequence. Thus, by using the patient-specific genomic/transcriptomic
information, the endogenous protein database can be expanded to contain
tryptic SAAV peptide sequences.
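The database expansion described above can be sketched in a few lines: digest the canonical sequence in silico, apply the patient-specific amino acid exchange, digest again, and keep the peptides that only occur in the mutated form. This is a simplified illustration rather than a description of any specific pipeline; it assumes the common trypsin rule (cleavage after K or R unless followed by P), no missed cleavages, and a hypothetical minimum peptide length, and it uses only an N-terminal fragment of KRAS.

```python
from typing import List, Set


def tryptic_digest(sequence: str, min_length: int = 6) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return [p for p in peptides if len(p) >= min_length]


def apply_saav(sequence: str, position: int, ref: str, alt: str) -> str:
    """Substitute one residue; position is 1-based, ref is checked first."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[:position - 1] + alt + sequence[position:]


def variant_peptides(
    sequence: str, position: int, ref: str, alt: str, min_length: int = 6
) -> Set[str]:
    """Tryptic peptides present only after the amino acid exchange."""
    canonical = set(tryptic_digest(sequence, min_length))
    mutated = set(tryptic_digest(apply_saav(sequence, position, ref, alt), min_length))
    return mutated - canonical


# N-terminal fragment of human KRAS (first 22 residues), for illustration only.
KRAS_NTERM = "MTEYKLVVVGAGGVGKSALTIQ"

saav_entries = variant_peptides(KRAS_NTERM, position=12, ref="G", alt="D")
print(saav_entries)
```

Digesting the full mutated sequence again, instead of editing a single peptide, also covers variants that create or remove a K/R cleavage site. The resulting variant peptides would be appended to the canonical search database for that patient.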
These proteogenomic analyses are especially valuable as they report on
the penetrance of genomic alterations and give complementary biological
information provided by actual effector molecules. Mutated proteins can
either be a direct target of drugs, such as the KRAS G12C variant, or
point to a therapeutic direction106,107.
Examples from our own work include the detection of the KRAS G12D
variant peptide (LVVVGADGVGK) by applying an in-house
patient-matched proteogenomic analysis workflow combining genomic
(TruSight Oncology 500 Assay) and LC-MS/MS proteomic data. Thus, proteomic data
corroborates genomic findings and provides direct evidence of the actual
presence of the mutated gene. The KRAS G12D variant has been shown to be
a potential therapeutic target for selective inhibition and the
identification of the KRAS G12D mutation provides a rationale for the
inclusion in ongoing clinical studies108–113.
Limitations and upcoming challenges
The adequate interpretation of proteomic data derived from individual
patients remains challenging. One of the major limitations of
state-of-the-art clinical proteomics is that the complete proteome
cannot be measured, owing to its intrinsic complexity, the wide dynamic
range of protein abundances, and the lack of an amplification step
during sample preparation. Consequently, the reported non-identification of a
protein does not necessarily mean that it was not present in the
original sample. Thus, combinatorial procedures including explorative
and targeted measurements might provide a compromise for vast proteome
coverage as well as reliable and robust detection of proteins of
interest such as established biomarkers.
Generally, rigorous quality control (QC) must be included throughout the
proteomic processing to ensure equal and high quality of the proteomic
results. For LC-MS/MS-based proteomics, the QC could include parameters
such as the addition of synthetic indexed retention time peptides and/or
measurement of control samples such as commercial peptide standards. For
these standards, certain analysis results such as retention times,
number of peptide and protein IDs as well as intensities should be
within an expected range and consequently could be reported in
conjunction with the individual MS-based patient data.
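A range check of this kind could look as follows. The metric names and acceptance ranges are purely hypothetical placeholders; in practice they would be derived from a laboratory's own history of standard measurements on a given instrument and method.

```python
from typing import Dict, List, Tuple

# Hypothetical acceptance ranges for a control sample (assumed values).
QC_RANGES: Dict[str, Tuple[float, float]] = {
    "irt_rt_shift_min": (-1.0, 1.0),        # retention time shift of iRT peptides
    "peptide_ids": (30000, float("inf")),   # number of identified peptides
    "protein_ids": (4500, float("inf")),    # number of identified proteins
    "median_log2_intensity": (18.0, 24.0),  # overall signal level
}


def check_qc(
    metrics: Dict[str, float], ranges: Dict[str, Tuple[float, float]]
) -> List[str]:
    """Return a list of failure messages; an empty list means the run passed."""
    failures = []
    for name, (low, high) in ranges.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif not (low <= value <= high):
            failures.append(f"{name}: {value} outside [{low}, {high}]")
    return failures


# Made-up QC results for two control measurements.
good_run = {
    "irt_rt_shift_min": 0.3,
    "peptide_ids": 41250,
    "protein_ids": 5230,
    "median_log2_intensity": 21.4,
}
bad_run = {"irt_rt_shift_min": 2.4, "peptide_ids": 41250, "protein_ids": 3100}

print(check_qc(good_run, QC_RANGES))
print(check_qc(bad_run, QC_RANGES))
```

Returning the individual failures, instead of a single pass/fail flag, makes it straightforward to include the QC outcome in the report that accompanies the patient data.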
The integration of large-scale proteomic studies into clinical research
is an emerging field. Current clinical studies mainly investigate
genomic alterations in the context of clinical phenotypes and treatment
responses, which contributes to the scarcity of proteomic data and of
their correlation with clinical presentations. Accordingly, decision support
systems for molecular tumor boards presently involve exclusively NGS
data114–116.
However, further integration of proteomics is a crucial aspect of the
molecular understanding of tumor biology. Therefore, proteomics should
be included in clinical studies on all levels, complementing genomic
stratification and deepening the molecular understanding of individual
disease progression and therapy response. This requires a fundamental
rethinking process and shift on multiple layers, most of all an openness
to this new multi-omics approach.