The bokeh effect has become an important feature in photography: an object of interest is kept in sharp focus while the rest of the background is blurred. While rendering this effect naturally requires a DSLR with a large-aperture lens, recent advances in deep learning make it possible to synthesize the effect on mobile cameras as well. Most existing methods use convolutional neural networks (CNNs), and some additionally rely on a depth map to render the effect. In this paper, we propose an end-to-end Vision Transformer model for bokeh rendering of images captured by a monocular camera. The architecture uses a Vision Transformer as its backbone, so each layer attends to the entire image rather than to the local receptive fields of a CNN's filters. This global receptive field, combined with pretraining the model on image restoration before fine-tuning it to render the background blur, allows our method to produce clearer images and outperform current state-of-the-art models on the EBB! dataset. The code for our proposed method can be found at: https://github.com/Soester10/Bokeh-Rendering-with-Vision-Transformers.
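To make the overall pipeline concrete, the following is a minimal sketch, not the authors' released implementation (which lives at the repository above), of an end-to-end ViT-style image-to-image model for bokeh rendering: the input is split into patch tokens, a transformer encoder attends over all patches at once (the global receptive field noted above), and the tokens are projected back to pixels. All hyperparameters (patch size, depth, dimensions) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BokehViT(nn.Module):
    """Hypothetical ViT-backbone image-to-image model for bokeh rendering."""
    def __init__(self, img_size=256, patch=16, dim=384, depth=6, heads=6):
        super().__init__()
        self.patch = patch
        n_patches = (img_size // patch) ** 2
        # Patch embedding: split the image into patch x patch tokens.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Project each token back to a patch of RGB pixels.
        self.to_pixels = nn.Linear(dim, 3 * patch * patch)

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens + self.pos)           # global attention
        patches = self.to_pixels(tokens)                   # (B, N, 3*p*p)
        # Fold the patch tokens back into a full-resolution image.
        out = patches.transpose(1, 2).reshape(
            b, 3, self.patch, self.patch, h // self.patch, w // self.patch)
        out = out.permute(0, 1, 4, 2, 5, 3).reshape(b, 3, h, w)
        return torch.sigmoid(out)  # rendered image in [0, 1]

# Usage: pretrain on an image-restoration objective, then fine-tune on
# bokeh pairs such as the wide/shallow depth-of-field pairs in EBB!.
model = BokehViT()
sharp = torch.rand(1, 3, 256, 256)        # all-in-focus input
bokeh_pred = model(sharp)                 # (1, 3, 256, 256) prediction
```

The two-stage schedule mirrors the training strategy described in the abstract: the same network is first optimized to reconstruct clean images (restoration) and is then fine-tuned on paired data to render the background blur.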