We present a novel method for transferring human actions from a source to a target video with improved human motion smoothness and better image quality. Unlike previous pose transfer and synthesis work, which transfers information from one image to another, we take temporal pose variations into account. Recent methods aiming at video-to-video human motion transfer suffer from poor generated image quality. In contrast, our proposed Video-to-video Action Transfer GAN (VAT-GAN) achieves better image quality by employing a cascaded sequence of Action Transfer Blocks (ATBs) with a multi-resolution structural similarity (MR-SSIM) loss. The addition of the MR-SSIM loss improves generated image quality compared to the previously used ℓ1-based loss. To ensure temporal human motion smoothness, we make the ATBs aware of the temporal pose variations of the target video. The proposed VAT-GAN is evaluated on an existing dataset proposed by Chan et al. [1] as well as on new test videos downloaded from the internet containing unseen persons, actions, and poses. VAT-GAN achieves better visual and quantitative performance than existing state-of-the-art methods, both in terms of image quality and human motion smoothness. The code and results, including synthesized videos, will be made publicly available at https://github.com/MehwishG/VAT-GAN.
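To make the MR-SSIM term concrete, the following is a minimal sketch of a multi-resolution structural-similarity loss in PyTorch. The number of scales, the uniform 11×11 window, and the equal weighting across scales are illustrative assumptions, not the paper's exact configuration; the function and parameter names (`ssim`, `mr_ssim_loss`, `num_scales`) are hypothetical.

```python
# A hedged sketch of a multi-resolution structural-similarity (MR-SSIM) loss.
# Assumed configuration: 3 scales obtained by 2x average pooling, a uniform
# 11x11 window, and equal weights per scale; the paper's settings may differ.
import torch
import torch.nn.functional as F


def ssim(x, y, window_size=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean SSIM between two image batches in [0, 1] with shape (N, C, H, W)."""
    pad = window_size // 2
    channels = x.shape[1]
    # Uniform averaging window applied per channel (depthwise convolution).
    window = torch.ones(
        channels, 1, window_size, window_size, device=x.device, dtype=x.dtype
    ) / window_size ** 2

    mu_x = F.conv2d(x, window, padding=pad, groups=channels)
    mu_y = F.conv2d(y, window, padding=pad, groups=channels)
    sigma_x = F.conv2d(x * x, window, padding=pad, groups=channels) - mu_x ** 2
    sigma_y = F.conv2d(y * y, window, padding=pad, groups=channels) - mu_y ** 2
    sigma_xy = F.conv2d(x * y, window, padding=pad, groups=channels) - mu_x * mu_y

    # Standard SSIM index per pixel, then averaged over the batch.
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    )
    return ssim_map.mean()


def mr_ssim_loss(generated, target, num_scales=3):
    """Average (1 - SSIM) over several resolutions of the same image pair."""
    loss = 0.0
    for _ in range(num_scales):
        loss = loss + (1.0 - ssim(generated, target))
        # Halve the spatial resolution for the next scale.
        generated = F.avg_pool2d(generated, kernel_size=2)
        target = F.avg_pool2d(target, kernel_size=2)
    return loss / num_scales
```

In training, a term of this form would replace or complement an ℓ1 reconstruction loss on the generator output, penalizing structural discrepancies at several spatial resolutions rather than per-pixel differences alone.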