Yi Tan

and 5 more

Data imbalance is a common challenge in many classification tasks, where the dataset suffers from a disproportionate distribution of samples across classes. Typical techniques for handling class imbalance are based on either algorithmic or data-driven strategies. The algorithmic strategies reduce model bias towards the majority classes by modifying the classification algorithms, whereas the data-driven strategies revise the datasets through resampling. Current implementations of the data-driven strategy often ignore the correlations between instances and assign the same sampled dataset to all queries, regardless of their differences. This may eliminate valid instances and retain irrelevant ones during sampling, thereby negatively affecting classification performance. To address this limitation, this paper presents a representation-based sampling (RBS) approach, implemented in a two-stage hybrid framework comprising a training stage that compiles the sampling dictionary and a retrieval stage that samples the data with respect to the query. In the first stage, RBS learns the correlations between the objects within each class via a reconstruction model and produces a sampling dictionary for each class. In the second stage, for a given query, the sampled data specific to this query are retrieved from the offline sampling dictionary of each class by locating the object in that class most closely related to the query. Because the number of sampled instances is the same for every class, the classification of the query is guaranteed to be conducted on a class-balanced dataset. Systematic experiments using image datasets demonstrate that RBS can effectively address the data imbalance issue in classification and improve the representation of images with correlated features, leading to better recognition performance.
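
The two-stage idea described above can be illustrated with a minimal sketch. It is not the RBS method itself: the abstract does not specify the reconstruction model, so this sketch assumes plain Euclidean nearest neighbours as a stand-in for the learned correlations, and the function names, the toy data and the parameter `k` are all hypothetical. It only shows how an offline per-class dictionary plus a per-query lookup yields the same number of samples from every class.

```python
from math import dist

def build_sampling_dictionary(class_points, k):
    """Offline stage (assumed stand-in for the reconstruction model):
    for each object, store the indices of its k closest same-class
    objects, the object itself included."""
    dictionary = []
    for p in class_points:
        order = sorted(range(len(class_points)),
                       key=lambda j: dist(p, class_points[j]))
        dictionary.append(order[:k])
    return dictionary

def sample_for_query(query, data_by_class, dictionaries):
    """Online stage: in each class, locate the object closest to the
    query and return the samples stored in its dictionary entry."""
    sampled = {}
    for label, points in data_by_class.items():
        anchor = min(range(len(points)),
                     key=lambda i: dist(query, points[i]))
        sampled[label] = [points[j] for j in dictionaries[label][anchor]]
    return sampled

# Hypothetical imbalanced toy data: six majority objects, two minority objects.
data_by_class = {
    "majority": [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                 (1.0, 1.0), (1.1, 0.9), (5.0, 5.0)],
    "minority": [(3.0, 3.0), (3.2, 2.9)],
}
k = 2  # must not exceed the size of the smallest class
dictionaries = {label: build_sampling_dictionary(points, k)
                for label, points in data_by_class.items()}

sampled = sample_for_query((1.0, 0.9), data_by_class, dictionaries)
for label, points in sampled.items():
    print(label, points)
```

Every class contributes exactly `k` objects to the query-specific sample, so whatever classifier is applied afterwards sees a balanced dataset; the expensive neighbour search within each class is done once, offline.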

Yanpeng Qu

and 3 more

Fuzzy-rough sets (FRS) encapsulate the related but distinct concepts of vagueness (for fuzzy sets) and indiscernibility (for rough sets), both of which occur as a result of uncertainty in data, information or knowledge. The application of FRS to feature selection (FS) has employed the dependency degree to guide the FS process with much success. Whilst promising, most existing fuzzy-rough feature selection (FRFS) approaches operate only at the level of individual features, considering the inclusion or exclusion of one feature at a time with regard to a candidate feature subset. As a result, meaningful information about inherent feature structure, such as the correlation between features or their collaborative contribution to a common decision, may be ignored. To address this issue, an exclusive lasso assisted two-stage fuzzy-rough FS (EL-TSFRFS) method is presented in this paper. First, regarding discernibility, all features are divided into distinct groups using k-means clustering, and exclusive lasso regularization is utilised to select the representative features in each cluster, with the selected features sorted in descending order within the cluster. Second, a feature grouping-based FRFS algorithm is implemented to determine the final discriminating feature subset. Comparative experimental results show that the reduct gained by the proposed approach generally outperforms those attained by alternative implementations of FRFS, in terms of both the size of the selected feature subset and the subsequent classification accuracy using the feature subset. Moreover, this is the first work to apply exclusive lasso to fuzzy-rough feature selection.
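
To make the role of the exclusive lasso in the first stage more concrete, the following sketch computes the penalty in its commonly used form, the sum over groups of the squared L1 norm of the weights in each group. The abstract does not give the paper's exact objective, so the function name, the toy weights and the grouping are assumptions for illustration only; in the method described above the groups would come from k-means clustering of the features.

```python
def exclusive_lasso_penalty(weights, groups):
    """Sum over groups of (sum of |w_j| for j in the group) squared.

    `groups` is a list of lists of feature indices, one list per group.
    """
    return sum(sum(abs(weights[j]) for j in group) ** 2 for group in groups)

# Hypothetical grouping of four features into two clusters.
groups = [[0, 1], [2, 3]]

# Two weight vectors with the same plain L1 norm (2.0):
same_group = [1.0, 1.0, 0.0, 0.0]    # both active features in one cluster
spread_out = [1.0, 0.0, 1.0, 0.0]    # one active feature per cluster

penalty_same = exclusive_lasso_penalty(same_group, groups)
penalty_spread = exclusive_lasso_penalty(spread_out, groups)
print(penalty_same, penalty_spread)
```

Because the L1 norm is squared within each group, placing two active features in the same cluster costs more than spreading them across clusters. Features in a cluster therefore compete with one another, which is consistent with using the regularizer to pick representatives per cluster.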