Object tracking has long remained an active research area because of the real-world variations that complicate the tracking process, such as occlusion, appearance changes, illumination changes, and cluttered backgrounds. Given the wide range of applications, embedded implementations are typically pursued for tracking systems. Although object trackers based on Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance, they are challenging to implement on embedded platforms because of their slow speed and large memory requirements. In this paper, we address these limitations on both the algorithm side and the circuit side. On the algorithm side, we adopt interpolation schemes that significantly reduce the processing time and the memory storage requirements. We also evaluate approximations of the hardware-expensive computations, aiming for an efficient hardware implementation. Moreover, we modify the online-training scheme to achieve a constant processing time across all video frames. On the circuit side, we develop a hardware accelerator for the online training stage. We avoid transposed reads from external memory to speed up data movement with no performance degradation. Our proposed hardware accelerator achieves 45.9 frames per second when training the fully connected layers.
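To illustrate why a transposed read can be avoided when training a fully connected layer, the following minimal sketch computes the backpropagated gradient dx = W^T dy in two ways. It is only an assumption about the general idea, not the accelerator's actual dataflow: the function names, the row-major storage of W, and the toy values are all hypothetical. The first function walks W column by column, which corresponds to strided (transposed) reads; the second reads each row of W once, in storage order, and accumulates partial sums instead.

```python
def backprop_transposed(W, dy):
    """Reference: dx = W^T dy, walking W column by column (strided reads)."""
    n_out, n_in = len(W), len(W[0])
    return [sum(W[i][j] * dy[i] for i in range(n_out)) for j in range(n_in)]


def backprop_row_major(W, dy):
    """Same result, but each row of W is read once, in storage order."""
    n_in = len(W[0])
    dx = [0.0] * n_in
    for row, g in zip(W, dy):         # one contiguous row per output neuron
        for j, w in enumerate(row):   # scatter-accumulate into dx
            dx[j] += w * g
    return dx


# Hypothetical toy layer: 2 outputs, 3 inputs (W has shape [out][in]).
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
dy = [0.5, -1.0]

dx_ref = backprop_transposed(W, dy)
dx_seq = backprop_row_major(W, dy)
```

Both functions return the same gradient; the second trades a column-wise gather for a small on-chip accumulator of length equal to the layer input, which is the kind of trade-off that sequential memory access typically favors.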