Mixed-precision neural networks achieve high energy efficiency and throughput in hardware deployment. The most common mixed-precision methods adopt layer-wise granularity. However, layer-wise methods cannot quantize a model to its limit because the optimal bit precision that preserves accuracy can differ across kernels within a layer. To address this issue, this paper presents GroupQ, a group-wise quantization method with multi-objective optimization for CNN accelerators. GroupQ clusters the convolutional kernels within a layer into several groups, and each group shares the same bit precision. A multi-objective optimization algorithm automatically searches the quantization policy according to the selected objectives, such as model accuracy, model size, and computation cost. Experiments show that GroupQ significantly outperforms existing retraining-free layer-wise methods and even surpasses some training-based methods. Specifically, compared with HAWQ-V3 on ResNet-18, GroupQ achieves 0.49% higher accuracy with up to 28.1% fewer bit operations (BOPs), and it quantizes MobileNetV2 to a 1.65 MB model with 71.75% top-1 accuracy. This paper further demonstrates that GroupQ is friendly to hardware deployment through a proposed lookup table (LUT)-based mixed-precision processing element (LMPE) for CNN accelerators. Compared with a conventional implementation, LMPE reduces power by up to 3.6% and area by up to 3.9%.
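As an illustration of the group-wise idea, the sketch below clusters the kernels of one convolutional layer by a simple per-kernel statistic (mean absolute weight) and applies symmetric uniform quantization with a separate bit width per group. This is a minimal, assumed example: the clustering feature, the fixed bit assignment, and the helper names (quantize_group, groupwise_quantize_layer) are hypothetical and not taken from the paper, which selects per-group precisions automatically via multi-objective optimization.

import numpy as np
from sklearn.cluster import KMeans

def quantize_group(weights, bits):
    """Symmetric uniform (fake) quantization of a weight tensor to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights, as used for accuracy evaluation

def groupwise_quantize_layer(kernels, n_groups=4, group_bits=(2, 3, 4, 8)):
    """
    kernels: array of shape (out_channels, in_channels, kH, kW).
    Cluster kernels by a per-kernel statistic (illustrative choice), then
    quantize each cluster with its own bit width.
    """
    n_out = kernels.shape[0]
    # Per-kernel feature used for clustering: mean absolute weight (assumption).
    features = np.abs(kernels.reshape(n_out, -1)).mean(axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_groups, n_init=10).fit(features).labels_

    quantized = np.empty_like(kernels)
    for g in range(n_groups):
        idx = labels == g
        quantized[idx] = quantize_group(kernels[idx], group_bits[g])
    return quantized, labels

# Example: group-wise quantization of a random 64x64x3x3 convolutional layer.
layer = np.random.randn(64, 64, 3, 3).astype(np.float32)
q_layer, groups = groupwise_quantize_layer(layer)

In the actual method, the per-group bit widths would not be fixed as above but chosen by the multi-objective search that trades off accuracy, model size, and computation cost.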