Abstract
Labeling fine-grained objects manually is extremely challenging, as it
is not only labor-intensive but also demands professional knowledge.
Accordingly, robust learning methods for fine-grained recognition with
web images collected from the Internet of Things have drawn significant
attention. However, training deep fine-grained models directly using
untrusted web images is confronted by two primary obstacles: 1) label
noise in web images and 2) domain variance between the online sources
and test datasets. This study focuses on these two problems.
Specifically, we introduce an end-to-end network that addresses them
jointly while separating trusted data from untrusted web images. To
validate the efficacy of our
proposed model, untrusted web images are first collected by utilizing
the text category labels found within fine-grained datasets.
Subsequently, we employ the designed deep model to eliminate label noise
and ameliorate domain mismatch. The selected trusted web data are then
used for model training. Comprehensive experiments and ablation
studies demonstrate that our method consistently surpasses
state-of-the-art approaches for the fine-grained recognition task in a
real-world scenario. This work also introduces a novel pipeline for
fine-grained recognition that is effective in practical applications.
The source code and models can be accessed at:
https://github.com/NUST-Machine-Intelligence-Laboratory/DDN.