Roberto Rocchetta et al.

This work investigates generalization error bounds that apply to general classes of learning problems and focuses on recently observed parallels between PAC-learning bounds, such as compression and complexity-based bounds, and novel error guarantees derived within Scenario theory. Scenario theory provides non-asymptotic and distribution-free generalization error bounds for models trained by solving data-driven decision-making problems. Scenario bounds, like complexity-based bounds, can be derived before solving the learning problem from an indicator of the model's expressiveness and richness or, like compression bounds, can apply to problems with infinite Vapnik–Chervonenkis dimension and to data-dependent hypothesis spaces. Relevant theorems and assumptions are reviewed and discussed, and a comparison of the quality of the bounds is carried out for Support Vector Machine (SVM) classifiers. We analyze the tightness of the bounds numerically for SVMs trained on several randomized data sets from thirteen real-life classification problems. This analysis allows for a fair comparison of the different approaches from both conceptual and experimental standpoints. Based on the numerical results, we argue that scenario bounds are tighter in many cases, are always bounded in [0,1] (hence non-vacuous), and better predict the performance ranking of different hyper-parameter settings. This work shows that scenario bounds can be a powerful tool to support model selection and structural-risk minimization.
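To make the flavor of such guarantees concrete, below is a minimal sketch (not the paper's exact procedure) of a classical a priori scenario bound in the style of Campi and Garatti: for a convex scenario program with d decision variables trained on N i.i.d. samples, the probability that the out-of-sample violation of the scenario solution exceeds ε is at most the binomial tail Σ_{i=0}^{d-1} C(N,i) ε^i (1−ε)^{N−i}. The function below inverts this tail numerically to obtain ε for a given confidence level 1−β; the function name `scenario_epsilon` and the example values N=1000, d=31, β=1e-6 are illustrative assumptions, not taken from the paper.

```python
from scipy.stats import binom
from scipy.optimize import brentq

def scenario_epsilon(N, d, beta=1e-6):
    """A priori scenario bound (illustrative sketch).

    Returns the smallest eps in [0, 1] such that
        sum_{i=0}^{d-1} C(N, i) eps^i (1 - eps)^(N - i) <= beta,
    which equals binom.cdf(d - 1, N, eps) <= beta.
    With confidence at least 1 - beta, the risk of the scenario
    solution is then at most eps; note the bound never exceeds 1.
    """
    if d > N:
        return 1.0  # too few samples for a nontrivial guarantee
    # The binomial tail is decreasing in eps, so bisect on it.
    f = lambda eps: binom.cdf(d - 1, N, eps) - beta
    if f(1.0 - 1e-12) > 0:
        return 1.0  # even eps ~ 1 does not meet the confidence level
    return brentq(f, 0.0, 1.0 - 1e-12)

# Example: a learning problem with d = 31 free parameters,
# N = 1000 training samples, and confidence 1 - 1e-6.
print(scenario_epsilon(N=1000, d=31, beta=1e-6))
```

Evaluating such an ε across candidate hyper-parameter settings (each yielding a different effective d or number of support scenarios) gives a ranking of configurations that, as argued in the paper, can support model selection and structural-risk minimization; the SVM-specific bounds studied in the paper are refinements of this basic construction.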