EV proteomic data processing and analysis
Quality control of the sequencing data was performed using FastQC and
fastp, and reads with a quality score below 20 were discarded. All data
analyses were conducted in R version 4.3.1 within
the RStudio environment. The EV ID–protein expression dataset was
generated by counting the total number of distinct moleculeTags for each
proteinTag sharing the same complexTag. Total protein abundance for each
sample, referred to as the raw bulk abundance data, was determined by
summing the associated moleculeTags for each protein. Differences in
library size between samples were corrected using trimmed mean of
M-values (TMM) normalization. For bulk data, differential expression
analysis was performed using the DESeq2 package, and the results were
visualized using the pheatmap package.
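
The counting and summation steps described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the analyses themselves were run in R), and it assumes that each decoded read is available as a (complexTag, proteinTag, moleculeTag) triple and that bulk abundance is obtained by summing the per-EV distinct counts for each protein. The function names and the example records are hypothetical.

```python
from collections import defaultdict


def build_ev_protein_counts(records):
    """Count distinct moleculeTags per proteinTag within each complexTag (EV ID).

    `records` is an iterable of (complex_tag, protein_tag, molecule_tag)
    triples; this input layout is an assumption made for illustration.
    """
    distinct = defaultdict(set)
    for complex_tag, protein_tag, molecule_tag in records:
        # Duplicate reads of the same molecule collapse into one set entry.
        distinct[(complex_tag, protein_tag)].add(molecule_tag)

    counts = defaultdict(dict)
    for (complex_tag, protein_tag), tags in distinct.items():
        counts[complex_tag][protein_tag] = len(tags)
    return dict(counts)


def bulk_abundance(ev_counts):
    """Sum per-EV counts for each protein to get raw bulk abundance."""
    totals = defaultdict(int)
    for per_protein in ev_counts.values():
        for protein_tag, n in per_protein.items():
            totals[protein_tag] += n
    return dict(totals)


# Hypothetical example records for one sample.
example_records = [
    ("ev1", "CD9", "m1"),
    ("ev1", "CD9", "m1"),  # duplicate read, counted once
    ("ev1", "CD9", "m2"),
    ("ev1", "CD63", "m3"),
    ("ev2", "CD9", "m1"),
]
example_counts = build_ev_protein_counts(example_records)
example_bulk = bulk_abundance(example_counts)
```

Applying the same two functions to each sample would give one EV-by-protein table and one bulk abundance vector per sample, which can then be normalized for library size.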
All single-EV proteomic data analyses were conducted using the Seurat R
package. Differential expression analysis between control and
experimental groups or between clusters was performed with
FindMarkers and FindAllMarkers. Functional enrichment analysis was
conducted using the clusterProfiler R package (version 4.2.2), with
pAdjustMethod = "BH" (Benjamini-Hochberg correction). All other
parameters were left at their default settings to maintain consistency.
Dimensionality
reduction and clustering were performed using the FindNeighbors function
(dimensions = 1:20) and FindClusters function (resolution = 0.8). The
Harmony function was employed to remove batch effects during this
process. The identified clusters were visualized using uniform manifold
approximation and projection (UMAP). However, the high number of
clusters observed suggested potential over-segmentation. To address
this, we manually merged certain clusters based on their
enriched and characteristic genes, resulting in fewer but more
interpretable groups. Each group was then named according to its
characteristic genes (e.g. Immunity Cluster for immune-related
proteins). We used the ratio of observed to expected EV numbers (Ro/e)
to quantify the relative proportions of EVs from the experimental and
control groups within individual clusters and sub-clusters, facilitating
the comparison of group-specific differences in cluster composition. To
avoid overestimation or bias caused by highly similar proteomic profiles
between clusters, we also used the BayesPrism package, which accounts
for statistical uncertainty through marginalization and integrates
results from the bulk and single-EV analyses. This approach enables a
more comprehensive understanding of the molecular landscape by
reconciling bulk-level patterns with single-EV heterogeneity.
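
The Ro/e statistic used above for cluster composition can be sketched as follows. This is a minimal Python illustration, not the code used in the study. It assumes the common formulation in which the expected count for each cluster-group pair is taken from the marginal totals, as in a chi-square test (row total × column total / grand total). The function name and the example counts are hypothetical.

```python
def ro_e(observed):
    """Compute Ro/e for a cluster-by-group table of EV counts.

    `observed` maps cluster name -> {group name -> EV count}.
    Expected counts are derived from the marginal totals (an assumption
    about the formulation; the source does not spell it out).
    """
    groups = sorted({g for row in observed.values() for g in row})
    row_totals = {
        c: sum(row.get(g, 0) for g in groups) for c, row in observed.items()
    }
    col_totals = {
        g: sum(row.get(g, 0) for row in observed.values()) for g in groups
    }
    grand_total = sum(row_totals.values())
    if grand_total == 0:
        raise ValueError("the count table is empty")

    result = {}
    for cluster, row in observed.items():
        result[cluster] = {}
        for group in groups:
            expected = row_totals[cluster] * col_totals[group] / grand_total
            # Ro/e is undefined when the expected count is zero.
            result[cluster][group] = (
                row.get(group, 0) / expected if expected > 0 else float("nan")
            )
    return result


# Hypothetical counts of EVs per cluster and group.
example_table = {
    "A": {"control": 30, "experiment": 10},
    "B": {"control": 20, "experiment": 40},
}
example_ro_e = ro_e(example_table)
```

Under this formulation, a value above 1 indicates that EVs from a group are over-represented in a cluster relative to expectation, and a value below 1 indicates under-representation.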