To emulate a quantum computation on a classical computer, i.e. the evolution of a quantum-mechanical wave function under unitary operations, we have to multiply unitary matrices with normalized vectors in a high-level programming language such as Python, C++, or Java. Existing quantum libraries perform these matrix-vector multiplications in the backend, either with Python's NumPy library, as in Qiskit, or through a C++ wrapper that further optimizes the runtime, as in the Qiskit-Aer simulators. Since a fully functioning fault-tolerant quantum computer is decades away, it is in our best interest to design new quantum algorithms and to develop accelerator test beds for quantum emulation. All quantum computer operations can be emulated on a classical computer, with the only downside being that the matrix multiplications scale as $O(N^3)$ in runtime. In contrast, the quantum computer scales as $O(N^2 \log_2 N)$, where $N = 2^n$ is the matrix dimension and $n$ is the number of qubits. In terms of complexity, the runtime of a quantum emulation on a classical computer therefore increases exponentially with the number of qubits and linearly with the circuit depth. Although the exponent cannot be changed, the runtime can be reduced by using GPU and Alveo Accelerator Cards and by software-side code optimization such as a C++ wrapper. In this paper, we benchmark matrix-matrix multiplications on HPC Accelerator Cards while varying the qubit count and the depth of the quantum circuit, and we provide a universal mathematical equation for the runtime on the GPU and Alveo Vector Cards as a function of these two variables. An exact theoretical limit on the qubit count and circuit depth can thus be established for quantum emulations on present classical supercomputers.
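The matrix-vector picture of emulation can be illustrated with a minimal NumPy sketch. It is not the benchmark code of this paper: the function names `hadamard_layer` and `emulate` are hypothetical, and a layer of Hadamard gates on every qubit is assumed purely as an example circuit. The sketch builds one dense $N \times N$ unitary with $N = 2^n$ and applies it `depth` times to the normalized state $|0\ldots0\rangle$, so the matrix size grows exponentially with the qubit count while the number of multiplications grows linearly with the depth.

```python
import numpy as np


def hadamard_layer(n_qubits):
    """Dense N x N unitary (N = 2**n_qubits) applying a Hadamard to every qubit.

    Illustrative choice of layer only; any unitary of this size would do.
    """
    h = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    u = np.array([[1.0 + 0.0j]])
    for _ in range(n_qubits):
        # The Kronecker product doubles the matrix dimension per qubit.
        u = np.kron(u, h)
    return u


def emulate(n_qubits, depth):
    """Apply `depth` identical layers to |0...0> by dense matrix-vector products."""
    dim = 2 ** n_qubits
    state = np.zeros(dim, dtype=complex)
    state[0] = 1.0  # normalized initial state |0...0>
    layer = hadamard_layer(n_qubits)
    for _ in range(depth):
        # One multiplication per circuit layer: cost is linear in depth.
        state = layer @ state
    return state


if __name__ == "__main__":
    final_state = emulate(n_qubits=3, depth=1)
    # Measurement probabilities are the squared amplitudes.
    print(np.round(np.abs(final_state) ** 2, 3))
```

Because every layer is unitary, the norm of the state vector stays at 1 throughout, which gives a simple sanity check for any emulator of this kind.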