MALDI-TOF MS
For MALDI-TOF MS measurements, the same tissue was used from a subset
of the genetically studied individuals, in each case comprising a single
pereopod with the attached muscles. Tissue was incubated in 5 µl of a
matrix solution containing α-Cyano-4-hydroxycinnamic acid (HCCA) as a
saturated solution in 50% acetonitrile, 47.5% molecular grade water
and 2.5% trifluoroacetic acid. After 5 minutes of incubation, 1.5 µl of
the extract solution was applied to one spot for crystallization on a
target plate. Measurements were carried out on a Microflex LT/SH System
(Bruker Daltonics), employing the flexControl 3.4 (Bruker Daltonics)
software. The measured mass range was set from 2 to 20 kDa. For peak
evaluation, the mass range from 2 to 10 kDa was analyzed using a
centroid peak detection algorithm, a signal-to-noise threshold of 2, and
a minimum intensity threshold of 600, with a peak resolution higher than
400. The Proteins/Oligonucleotide method was employed for fuzzy control
with a maximal resolution ten times above the threshold. For a sum
spectrum, 200 satisfactory shots were summed up. Three mass spectra were
measured for each specimen. All spectra were inspected visually, and
mass spectra of inferior quality were discarded. In the following
analyses, only specimens with a respective COI sequence were used. For
comparison of inter- and intraspecific variance, spectra of the congeneric
species H. foresti (n = 9), H. angustus (n = 4) and H. hamatus (n = 5) were used (see Supplemental Table S1).
Data processing was carried out in R (R Core Team, 2020) using R
packages MALDIquant (Gibb & Strimmer, 2012) and MALDIquantForeign
(Gibb, 2015). Protein mass spectra were trimmed to an identical range
from 2,000 to 20,000 m/z and smoothed with the Savitzky-Golay method
(Savitzky & Golay, 1964). The baseline was removed based on the SNIP
baseline estimation method (Ryan, Clayton, Griffin, Sie, & Cousens,
1988) using 15 iterations. Mass spectra were normalized using the total ion current (TIC)
method implemented in MALDIquant. Noise estimation was carried out with
a signal-to-noise ratio (SNR) of 7. Repeated peak binning was carried
out with a tolerance of 0.002 in a strict approach and resulting bins
were aligned using R package MALDIrppa (Palarea-Albaladejo, McLean,
Wright, & Smith, 2018). For the resulting intensity matrix, missing
values were interpolated from the corresponding spectrum. All signals
below an SNR of 1.75 were assumed to be below the detection limit and set to
zero in the final peak matrix. This matrix was Hellinger transformed
(Legendre & Gallagher, 2001) for further use. Intra- and interspecific
Euclidean distances were calculated using ‘vegdist’ from R package vegan
(Oksanen et al., 2013). To test group differentiation for classification
approaches and to assess mass peak importance for group differentiation,
a Random Forest (RF; Breiman, 2001) analysis was carried out using R
package randomForest (Liaw & Wiener, 2002; ntree = 2000, mtry = 35).
Whether the observed model errors deviated significantly from random was
tested with the function MVSF.test from package RFtools
(https://github.com/pmartinezarbizu/RFtools)
(Rossel & Martínez Arbizu, 2018). The significance of differences was
tested using the distance-based multivariate analysis of variance (W*d)
developed by Hamini et al. (2019).
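
The Hellinger transformation and the split into intra- and interspecific Euclidean distances described above can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the R code used in the study, where `vegdist` from vegan was applied. The function names (`hellinger`, `split_distances`), the toy peak matrix and its intensity values are assumptions invented for this example; only the species names are taken from the text.

```python
import math
from itertools import combinations


def hellinger(matrix):
    """Hellinger transform: square root of each value divided by its row sum."""
    out = []
    for row in matrix:
        total = sum(row)
        # Rows that sum to zero are returned as zeros to avoid division by zero.
        out.append([math.sqrt(v / total) if total > 0 else 0.0 for v in row])
    return out


def euclidean(a, b):
    """Euclidean distance between two equally long intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_distances(matrix, species):
    """Return (intraspecific, interspecific) lists of pairwise distances."""
    intra, inter = [], []
    for i, j in combinations(range(len(matrix)), 2):
        d = euclidean(matrix[i], matrix[j])
        (intra if species[i] == species[j] else inter).append(d)
    return intra, inter


# Hypothetical peak matrix: rows are specimens, columns are mass bins.
# The intensities are invented and carry no biological meaning.
peaks = [
    [4.0, 0.0, 12.0],
    [3.0, 1.0, 12.0],
    [0.0, 9.0, 16.0],
]
labels = ["H. foresti", "H. foresti", "H. angustus"]

transformed = hellinger(peaks)
intra, inter = split_distances(transformed, labels)
```

After the transformation, the squared values of every non-empty row sum to one, so the Euclidean distance between two rows depends on the relative rather than the absolute peak intensities.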