Yanpeng Qu

Data imbalance is a common challenge in many classification tasks, where the samples in a dataset are disproportionately distributed across classes. Typical techniques for handling class imbalance follow either algorithmic or data-driven strategies. Algorithmic strategies reduce model bias towards the majority classes by modifying the classification algorithms, whereas data-driven strategies revise the dataset through resampling. Current implementations of the data-driven strategy often ignore the correlations between instances and assign the same uniformly sampled dataset to every query, regardless of how the queries differ. This may eliminate valid instances and retain irrelevant ones during sampling, thereby degrading classification performance. To address this limitation, this paper presents a representation-based sampling (RBS) approach, implemented in a bi-stage hybrid framework that consists of a training stage, which compiles a sampling dictionary, and a retrieving stage, which samples the data with respect to the query. In the first stage, RBS learns the correlations among the objects within each class via a reconstruction model and produces a sampling dictionary for each class. In the second stage, for a given query, the sampled data specific to that query are retrieved from each class's offline sampling dictionary by locating the query's most related object in that class. Because the number of sampled instances is the same for every class, the query is always classified on a class-balanced dataset. Systematic experiments on image datasets demonstrate that RBS can effectively address data imbalance in classification and improve the representation of images with correlated features, leading to better recognition performance.
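To make the two stages concrete, here is a minimal sketch of how a bi-stage, representation-based sampler could be organized. It is not the RBS method itself; every specific choice is an assumption made for illustration: ridge regression stands in for the reconstruction model, each dictionary entry is an object plus its (k - 1) strongest reconstruction contributors, cosine similarity stands in for "most related object", and a nearest-mean rule stands in for the unspecified classifier. The function names `build_sampling_dictionary`, `retrieve`, and `classify` and the parameters `k` and `lam` are hypothetical.

```python
import numpy as np


def build_sampling_dictionary(X, k, lam=0.1):
    """Stage 1 (hypothetical): for one class, map each object to k indices.

    X has shape (n, d), one row per object of this class. Each object i is
    reconstructed from the other objects by ridge regression (an assumed
    stand-in for the reconstruction model); its dictionary entry is i itself
    followed by the (k - 1) objects with the largest absolute coefficients.
    """
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the class size")
    entries = []
    for i in range(n):
        others = np.delete(np.arange(n), i)
        A = X[others].T  # (d, n - 1): the other objects as columns
        # Ridge solution: w = (A^T A + lam * I)^{-1} A^T x_i
        w = np.linalg.solve(A.T @ A + lam * np.eye(n - 1), A.T @ X[i])
        top = others[np.argsort(-np.abs(w))[: k - 1]]
        entries.append(np.concatenate(([i], top)))
    return np.array(entries)  # shape (n, k): same k for every class


def retrieve(q, class_data, dictionaries):
    """Stage 2 (hypothetical): per class, find the object most similar to q
    (cosine similarity, an assumption) and return its k stored objects."""
    sampled = {}
    for c, X in class_data.items():
        sims = X @ q / (np.linalg.norm(X, axis=1) * np.linalg.norm(q) + 1e-12)
        anchor = int(np.argmax(sims))
        sampled[c] = X[dictionaries[c][anchor]]  # (k, d) for every class
    return sampled


def classify(q, sampled):
    """Placeholder classifier on the balanced subset: nearest class mean."""
    return min(sampled, key=lambda c: np.linalg.norm(sampled[c].mean(axis=0) - q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Imbalanced toy data: 50 majority objects vs. 6 minority objects.
    class_data = {
        "majority": rng.normal(size=(50, 8)) + 3.0,
        "minority": rng.normal(size=(6, 8)) - 3.0,
    }
    k = 4
    dictionaries = {c: build_sampling_dictionary(X, k) for c, X in class_data.items()}
    q = np.full(8, -3.0)
    sampled = retrieve(q, class_data, dictionaries)
    print({c: s.shape for c, s in sampled.items()})  # each class contributes k rows
    print(classify(q, sampled))
```

The dictionaries are computed once offline, so answering a query only requires one similarity scan per class plus a table lookup. Because every entry holds exactly `k` indices, each class contributes the same number of objects to the query's sample, which is the class-balance property the abstract describes.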