Luciano Prono et al.

The growing interest in Internet of Things and mobile Artificial Intelligence applications is pushing the investigation of Deep Neural Networks (DNNs) that can operate at the edge on low-resource, low-energy devices. To reach this goal, several pruning techniques have been proposed in the literature. They aim to reduce the number of interconnections, and consequently the size and the corresponding computing and storage requirements, of DNNs that traditionally rely on classic Multiply-and-Accumulate (MAC) neurons. In this work, we propose a novel neuron structure based on a Multiply-And-Max/min (MAM) map-reduce paradigm and show that, by exploiting this new paradigm, it is possible to build naturally and aggressively prunable DNN layers with a negligible loss in performance. This novel structure allows greater interconnection sparsity than classic MAC-based DNN layers. Moreover, most existing state-of-the-art pruning techniques can be used with MAM layers with little to no change. To test the pruning performance of MAM, we employ different models (AlexNet, VGG-16, and the more recent ViT-B/16) and different computer vision datasets (CIFAR-10, CIFAR-100, and ImageNet-1K). Multiple pruning approaches are applied, ranging from single-shot methods to training-dependent and iterative techniques. As a notable example, we test MAM on the ViT-B/16 model fine-tuned on the ImageNet-1K task and apply one-shot gradient-based pruning, removing interconnections until the model experiences a 3% decrease in accuracy. While MAC-based layers need at least 56.16% of their interconnections to remain, MAM-based layers achieve the same accuracy with only 0.04%.
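To make the map-reduce idea concrete, the following is a minimal PyTorch sketch of a fully connected MAM layer (the framework and class name are assumptions, not taken from the paper). It keeps the same "map" step as a MAC neuron, the elementwise products between inputs and weights, but replaces the sum reduction with a max plus min over those products.

```python
# Hypothetical sketch of a MAM fully connected layer (PyTorch assumed).
# The only change with respect to a classic MAC layer is the reduce step:
# each output takes the max plus the min of the products w_ij * x_j
# instead of their sum.
import torch
import torch.nn as nn


class MAMLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Map step: elementwise products, shape (batch, out_features, in_features)
        products = x.unsqueeze(1) * self.weight.unsqueeze(0)
        # Reduce step: Max/min instead of the MAC accumulation
        return products.max(dim=-1).values + products.min(dim=-1).values + self.bias


# Example usage: drop-in replacement for nn.Linear, e.g. in a classifier head
layer = MAMLinear(in_features=512, out_features=10)
out = layer(torch.randn(8, 512))  # shape (8, 10)
```

Because only the reduction changes, such a layer can be trained and pruned with the same interfaces used for standard MAC-based layers, which is consistent with the claim that existing pruning techniques apply with little to no change.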