Implementation of three-dimensional (3D) numerical electromagnetic (EM) inverse scattering algorithms has been limited for over two decades due to the huge computational cost in practical applications such as medical microwave imaging. The emergence of graphics processing units (GPU) computing has offered new possibilities for alleviating the computational burden related to EM computations, but its optimized deployment for EM inverse scattering has not been explored. We present a suite of optimization strategies towards this aim, and study their impact on computational savings in different computing platforms. Our results demonstrate the significance of incorporating the solution of the inverse problem on heterogeneous systems. By combining these strategies in an optimal way, we arrive at imaging times of half a minute for iterative microwave tomography (MWT) problems involving eight antennas and hundreds thousands of reconstructed voxels in simulation domains of over 1 million voxels.