To the Editor,
Pollen allergies are on the rise globally. Therefore, for more and more
people, fast and accurate monitoring of airborne pollen provides an
essential early warning system. Although the allergenic pollen from some
plants can be monitored at the species level (e.g. ragweed, Ambrosia artemisiifolia L.1), many species
cannot be accurately identified to this level based on their pollen. In
many taxa, only a genus- or family-level identification is possible
using current microscopic methods, even when the species involved have
very different allergenic profiles. An additional challenge in airborne
pollen identification is that the grains are collected directly from the
air2. Because they are not
acetolyzed3, and thus still contain all their organic
material, defining features are less apparent in these pollen
grains4.
The identification challenge is exemplified in the case of the nettle
family (Urticaceae). Pollen grains produced by all species from the
genus Urtica L. (stinging nettles) are allergenically irrelevant,
while pollen from several species of Parietaria L. (pellitory)
is a major cause of hay fever, in particular P. judaica L. and P. officinalis L.5 Both these species of
pellitory are native to the Mediterranean, but throughout the second
half of the twentieth century, a range expansion occurred through
north-western Europe, the Americas and Australia as a result of
anthropogenic distribution and climate change6. Their
pollen is virtually indistinguishable from that of native Urtica nettles, and their contribution to the total aerial pollen load is
currently not assessed.
Automatic image recognition can be used to improve identification of
pollen taxa that are difficult to distinguish. Subtle variations in
morphology that are not readily apparent through microscopic
investigation may be consistently detected by neural networks. Here we
use the Deep Convolutional Neural Network (DCNN) model VGG16 to
distinguish morphologically similar, unacetolyzed pollen from the nettle
family (see online supporting information for details on VGG16).
Pollen from all five species of the nettle family present in the
Netherlands was collected from plants in multiple locations (both fresh
and herbarium material) to account for potential intraspecific
variability. This includes pollen from three species of Urtica and two exotic Parietaria species (Supplementary Table 1). To
train the VGG16 model, a minimum of 1000 pollen images per species were
taken at 100x magnification using a Widefield Zeiss observer. We wrote a
custom script in ImageJ7 to automatically extract
individual pollen grains from raw multifocal images. Three different
image projections were produced to visualize the pollen in two rather
than three dimensions, and to emphasize distinguishable features (Figure
S3, script available on
https://github.com/pollingmarcel/Pollen_Projector). The VGG16
model was pre-trained on the ImageNet dataset and contains 13
convolutional and three fully connected layers. To mitigate potential
overfitting and to increase the size and variability of the training
data set, we used 10-fold cross-validation and data augmentation (Figure
S4). We used 90% of the pollen images to train the model, and 10% were
used for subsequent testing (see online Supplementary Information for a
complete description of materials and methods).
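The relation between 10-fold cross-validation and the 90%/10% split can be sketched as follows. This is a minimal illustration, not the authors' training code: it assumes a stratified split (the text does not say whether folds were stratified by species), and the species labels and image counts are hypothetical placeholders. Each fold holds out one tenth of the images for testing and trains on the remaining nine tenths.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for stratified k-fold cross-validation.

    Each image index lands in the test set of exactly one fold, so every
    fold trains on roughly (k-1)/k of the images and tests on the rest.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds
    # so that every fold keeps the class proportions of the full data set.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical toy data set: 5 species with 1000 images each (assumption).
species = ["species_%d" % s for s in range(5)]
labels = [name for name in species for _ in range(1000)]

splits = list(stratified_k_fold(labels, k=10))
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))  # prints: 4500 500
```

With 5000 images, every fold trains on 4500 (90%) and tests on 500 (10%), and each image is used for testing exactly once across the ten folds.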
The trained VGG16 model correctly identified pollen to the genus level
in 95.9% of cases for Urtica and 97.8% for Parietaria (Figure 1). The species Urtica membranacea was confidently
distinguished from all other Urticaceae species (99.2%), but
distinction at the species level was not possible for the other Urtica and Parietaria species. This is because the
distinguishing differences between pollen from these species could not
be resolved in the image projections used.
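Why genus-level accuracy can be high while species-level accuracy stays low can be shown with a small sketch. The predictions below are made up for illustration and are not the study's data; the helper names `genus` and `per_class_accuracy` are hypothetical. Confusions between two species of the same genus count as errors at the species level but vanish once labels are collapsed to the genus.

```python
from collections import Counter


def genus(species_name):
    """Return the genus, i.e. the first word of a binomial name."""
    return species_name.split()[0]


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of items of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {label: correct[label] / totals[label] for label in totals}


# Made-up labels for illustration only (assumption, not measured results).
true_species = [
    "Parietaria judaica", "Parietaria judaica",
    "Parietaria officinalis", "Parietaria officinalis",
    "Urtica membranacea", "Urtica membranacea",
]
pred_species = [
    "Parietaria judaica", "Parietaria officinalis",  # one within-genus mix-up
    "Parietaria judaica", "Parietaria officinalis",  # one within-genus mix-up
    "Urtica membranacea", "Urtica membranacea",
]

species_acc = per_class_accuracy(true_species, pred_species)
genus_acc = per_class_accuracy(
    [genus(s) for s in true_species],
    [genus(s) for s in pred_species],
)
print(species_acc)
print(genus_acc)
```

In this toy input, both Parietaria species score 0.5 at the species level, while both genera score 1.0 once the same predictions are evaluated at the genus level.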
From the raw pollen images, we further identified clear intraspecific
differences in the pollen grains that result from natural variability
within each species. To test whether VGG16 learned the correct
distinguishing features rather than sample-specific details, we produced
feature maps (Figure 2). Despite the highly variable input images of
unacetolyzed pollen, the model learned distinct features such as edges
in the first convolutional layers, while finer features, such as pores
and annuli, were learned in deeper layers.
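What an edge-sensitive feature map in an early convolutional layer looks like can be illustrated with a single filter. This is only a sketch under simplifying assumptions: the 3x3 kernel is a hand-set Sobel-style filter, whereas VGG16 learns its filters during training, and the tiny image is a synthetic stand-in for a pollen grain boundary. Plain Python is used for readability; a deep learning framework would be used in practice.

```python
def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Rectified linear unit: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Synthetic 5x6 grey-scale image: dark background (0) on the left,
# bright region (1) on the right, giving one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-set Sobel-style kernel that responds to dark-to-bright vertical edges.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

feature_map = relu(cross_correlate2d(image, sobel_x))
for row in feature_map:
    print(row)  # each row prints: [0, 4, 4, 0]
```

The response is zero in the uniform regions and positive only where the window straddles the edge, which is the kind of pattern visible in the feature maps of the first layers.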
This is the first time a DCNN model has been used to increase the
accuracy of unacetolyzed pollen identification. The model represents a
significant improvement over earlier attempts at distinguishing Urticaceae using
automatic image recognition. In a previous study using shape and texture
features, pollen from three Urticaceae species could be distinguished
from one another (89% accuracy8), though only a small
image dataset was used. DCNN models have shown similar accuracy rates to
ours on larger and more varied pollen datasets as well, but these either
focussed on the family level9 or on insect-collected
pollen for honey analysis10. Moreover, all of these
studies used acetolyzed pollen, which allows for easier recognition of
distinguishing features, and used pollen collected from only a single
location.
In conclusion, using a combination of an image-processing workflow and a
sufficiently trained DCNN model, we were able to differentiate
unacetolyzed pollen grains from two genera in the nettle family. These
are genera that are indistinguishable with current microscopic methods
but possess different allergenic profiles, and thus the ability to
differentiate them is of medical significance. We expect this method can
be more broadly applied to distinguish pollen from similarly challenging
plant families and will aid in producing more accurate hay fever
forecasts.