Compression and optimization techniques for Convolutional Neural Network (CNN) inference attract attention due to CNNs' broad range of applications in many fields. Implementing CNNs on embedded devices, however, faces great challenges because of the devices' limited hardware resources. Real-time applications based on complex CNNs increasingly require optimization and compression techniques so that the networks run faster on compact, energy-efficient embedded devices. CNN model optimization and compression include pruning and weight quantization. Conventional pruning techniques often suffer from a poor compression ratio: they retain a large number of floating-point weights, leading to embedded implementations with excessive hardware cost or slow inference speed. Conventional quantization techniques, in turn, often result in an unacceptable loss of inference accuracy. To overcome these problems, the proposed method introduces an adaptive exponential weight discretization process that selects discretization parameters based on the sensitivity and the number of weights in each layer. Unlike many previous compression methods, the proposed method requires neither retraining nor fine-tuning. The proposed weight discretization has been evaluated using the VGG16 CNN model on the ImageNet classification dataset. The evaluation demonstrates that it saves 48% of the computation in the fully connected layers and reduces the overall weight memory size from 528 MB to 44.24 MB, a reduction of 11.9 times. We also demonstrate that the optimized CNN incurs negligible accuracy loss: only .43% in top-5 accuracy and only 1.44% in top-1 accuracy.
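For illustration only, the sketch below shows one plausible form of exponential weight discretization: each weight is mapped to the nearest signed power-of-two level within a per-layer exponent range. The parameters `n_levels` and `max_exp` are hypothetical stand-ins for the paper's per-layer parameters; the actual sensitivity-based selection rule is not reproduced here, and no retraining is involved, in line with the stated approach.

```python
import numpy as np

def exponential_discretize(weights, n_levels=8, max_exp=0):
    """Map each weight to the nearest signed power-of-two level.

    Levels are {0, ±2^(max_exp - n_levels + 1), ..., ±2^max_exp}.
    n_levels and max_exp are illustrative per-layer parameters standing in
    for the paper's sensitivity-based selection, which is not detailed here.
    """
    signs = np.sign(weights)
    mags = np.abs(weights)
    # Exponent of the nearest power of two for each nonzero magnitude
    with np.errstate(divide="ignore"):
        exps = np.round(np.log2(np.where(mags > 0, mags, 1.0)))
    exps = np.clip(exps, max_exp - n_levels + 1, max_exp)
    quantized = signs * np.power(2.0, exps)
    # Magnitudes far below the lowest representable level are set to zero
    lowest = 2.0 ** (max_exp - n_levels + 1)
    quantized[mags < lowest / np.sqrt(2.0)] = 0.0
    return quantized

# Example: discretize one layer's weights without any retraining
layer_w = (np.random.randn(512, 512) * 0.05).astype(np.float32)
layer_w_q = exponential_discretize(layer_w, n_levels=7, max_exp=-2)
```

Restricting weights to powers of two in this way replaces multiplications with shifts and stores only small integer exponents per weight, which is consistent with the reported savings in computation and weight memory.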