Downlink channel temporal prediction is a critical technology in massive multiple-input multiple-output (MIMO) systems. However, existing methods that rely on fixed-step historical sequences significantly limit the accuracy, practicality, and scalability of channel prediction. Recent advances have shown that large language models (LLMs) exhibit strong pattern recognition and reasoning abilities over complex sequences. The challenge lies in effectively aligning wireless communication data with the modalities used in natural language processing to fully harness these capabilities. In this work, we introduce Csi-LLM, a novel LLM-powered downlink channel prediction technique that models variable-step historical sequences. To ensure effective cross-modality application, we align the design and training of Csi-LLM with the processing of natural language tasks, leveraging the LLM's next-token generation capability for predicting the next step in channel state information (CSI). Simulation results demonstrate the effectiveness of this alignment strategy, with Csi-LLM consistently delivering stable performance improvements across various scenarios and showing significant potential in continuous multi-step prediction.