Deformable medical image registration plays an important role in many clinical applications. It aims to find a dense deformation field to establish point-wise correspondences between a pair of fixed and moving images. Recently, unsupervised deep learning-based registration methods have drawn more and more attention because of fast inference at testing stage. Despite remarkable progress, existing deep learning-based methods suffer from several limitations including: a) they often overlook the explicit modeling of feature correspondence due to limited receptive fields; b) the performance on image pairs with large spatial displacements is still limited since the dense deformation field is regressed from features learned by local convolutions; and c) desirable properties, including topology-preservation and the invertibility of transformation, are often ignored. To address above limitations, we propose a novel network consisting of Siamese Multi-scale Interactive-representation LEarning (SMILE) encoder and Hierarchical Diffeomorphic Deformation (HDD) decoder. Specifically, the SMILE encoder aims for effective feature representation learning and spatial correspondence establishing while the HDD decoder seeks to regress the dense deformation field in a coarse-to-fine manner. We additionally propose a novel Local Invertible Loss (LIL) to encourage topology-preservation and local invertibility of the regressed transformations while keep high registration accuracy. Extensive experiments conducted on two publicly available datasets demonstrate the superiority of our method over the state-of-the-art (SOTA) approaches.