State-of-the-art Heterogeneous Multi-Processor Systems-on-Chips (HMPSoCs) can perform on-chip embedded inference on their CPUs and GPUs. Multi-component pipelining is the method of choice for high-throughput Convolutional Neural Network (CNN) inference on embedded platforms. In this work, we detail Pipe-All, the first CPU-GPU pipeline design for CNN inference. Pipe-All uses the ARM-CL library to integrate an ARM big.LITTLE CPU with an ARM Mali GPU. It is the first three-stage CNN inference pipeline design, with ARM’s big CPU cluster, LITTLE CPU cluster, and Mali GPU as its stages. Pipe-All provides, on average, a 75.88% improvement in inference throughput (over peak single-component inference) on the Amlogic A311D HMPSoC in the Khadas VIM3 embedded platform. We also provide an open-source implementation of Pipe-All. This paper is submitted to IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) as a transaction brief paper (5 pages).
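
To make the three-stage idea concrete, the following is a minimal sketch of layer-wise pipelining, not the Pipe-All implementation. It assumes a CNN can be modelled as an ordered list of layer functions split into three consecutive groups, with one worker thread per group standing in for the big cluster, LITTLE cluster, and GPU. It does not use ARM-CL; the names `run_stage`, `run_pipeline`, `STAGES`, and the toy arithmetic "layers" are hypothetical and chosen only for illustration.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the input stream


def run_stage(layers, inbox, outbox):
    """Apply this stage's layers to each item, then pass it downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        for layer in layers:
            item = layer(item)
        outbox.put(item)


def run_pipeline(stage_layers, inputs, queue_depth=2):
    """Stream inputs through consecutive stages, one thread per stage."""
    # Bounded queues between stages; the final output queue is unbounded
    # so the feeder below can never block on undrained results.
    queues = [queue.Queue(maxsize=queue_depth) for _ in stage_layers]
    queues.append(queue.Queue())
    threads = [
        threading.Thread(target=run_stage, args=(layers, queues[i], queues[i + 1]))
        for i, layers in enumerate(stage_layers)
    ]
    for t in threads:
        t.start()
    for x in inputs:
        queues[0].put(x)
    queues[0].put(SENTINEL)
    for t in threads:
        t.join()
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            return results
        results.append(out)


# Hypothetical four-layer "network" split across three stages.
# The stage-to-hardware mapping in the comments is purely notional.
STAGES = [
    [lambda x: x + 1],                    # stage 1 (e.g. big CPU cluster)
    [lambda x: x * 2, lambda x: x - 3],   # stage 2 (e.g. Mali GPU)
    [lambda x: x * x],                    # stage 3 (e.g. LITTLE CPU cluster)
]

print(run_pipeline(STAGES, range(5)))  # prints [1, 1, 9, 25, 49]
```

In such a design, while one stage processes input k, the upstream stage can already work on input k+1, so steady-state throughput is bounded by the slowest stage rather than by the sum of all stages. Python threads here only illustrate the dataflow; they do not reproduce the parallelism of separate hardware components.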