Artificial neural networks (ANNs) can distill the hierarchical features of raw data. They are central to machine learning tasks including speech and pattern recognition, medical diagnosis, playing board games, computer vision, and many other areas [1-7]. Optical neural networks (ONNs) in particular can significantly increase the computing speed of ANNs, overcoming the intrinsic bandwidth bottleneck of electronics. Convolutional neural networks (CNNs), inspired by biological systems such as the visual cortex, are a powerful approach that greatly reduces the parametric complexity of the network in order to enhance the accuracy of its predictions. In this paper, we demonstrate a universal optical convolutional accelerator that can be used in conjunction with both electronic and optical neural networks. It operates beyond 10 Tera-OPS (TOPS, Tera-operations per second) and produces convolutions of large-scale images of 250,000 pixels with 8-bit resolution. It generates 10 convolutions simultaneously, using 10 different kernels, which is enough for facial image recognition. We then use the same hardware to form a convolutional neural network consisting of a convolutional front-end followed by a fully connected deep optical neural network layer, together forming a CNN with ten neurons at the output. We successfully recognize all 10 handwritten digits from images of 900 pixels each. We achieve an accuracy of 88%, which is very close to the theoretical accuracy of 90%. Our approach exploits simultaneous multiplexing, or interleaving, in the time, space and wavelength dimensions, using an optical frequency comb supplied by an integrated Kerr micro-comb source.
We compare the performance of different optical neural networks, explicitly showing that our approach is intrinsically scalable in both size and speed, up to the Peta-OPS (POPS) regime in speed and to well over 24,000 synapses in size. We perform a theoretical evaluation of the scaled system performance and show that it can be trained as much more complex networks for demanding real-world applications, including real-time video recognition and autonomous unmanned vehicle control.
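The interleaving of time and wavelength mentioned above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes (this is not stated in the abstract) that each kernel weight is carried by one comb wavelength, that adjacent wavelengths are delayed by one symbol period, and that all wavelengths are summed at a single photodetector. The function names `interleaved_convolution` and `parallel_kernels` are hypothetical.

```python
import numpy as np

def interleaved_convolution(signal, kernel):
    """Sliding weighted sum built from scaled, delayed replicas of the input.

    Assumption: replica i stands in for one wavelength channel that carries
    weight kernel[i] and is delayed by i symbol slots before summation.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    n, k = signal.size, kernel.size
    total = np.zeros(n + k - 1)
    for i, w in enumerate(kernel):
        # Scale the replica by its weight and shift it by i slots.
        total[i:i + n] += w * signal
    # Keep only the slots in which all k replicas overlap.
    return total[k - 1:n]

def parallel_kernels(signal, kernels):
    """Apply several kernels to the same input, one output row per kernel."""
    return np.stack([interleaved_convolution(signal, k) for k in kernels])

if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0, 4.0])
    print(interleaved_convolution(x, [1.0, 0.0, -1.0]))
```

Under these assumptions the result matches the "valid" part of a standard discrete convolution, so it can be checked against `np.convolve(signal, kernel, mode="valid")`. A two-dimensional image would first have to be flattened into a one-dimensional symbol stream, a step this sketch leaves out.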