Lidar and cameras are essential sensors for automated vehicles and intelligent robots and are frequently fused in complex tasks. Precise extrinsic calibration is a prerequisite for Lidar-camera fusion. Hand-eye calibration is among the most commonly used targetless calibration approaches. This paper identifies a particular degeneracy problem of hand-eye calibration that arises when sensor motions lack rotation. Such motion is common for ground vehicles, especially those traveling on urban roads, and it leads to a significant deterioration in translational calibration performance. To address this problem, we propose a novel motion-based Lidar-camera calibration framework based on cross-modality structure consistency. The framework is globally convergent within the specified search range and achieves satisfactory translation calibration accuracy in degenerate scenarios. To verify its effectiveness, we compare its performance against one motion-based method and two appearance-based methods on six Lidar-camera data sequences from the KITTI dataset. An ablation study further demonstrates the contribution of each module within our framework. Our code is available on GitHub for reproduction.