Long-term analysis of climate trends and patterns relies on continuous, high-frequency observational data sets. However, owing to limitations in historical meteorological observation techniques and to national policies, most weather stations worldwide provide only three, four, or eight observations per day, which hinders progress in climate change research. To address the low daily observation frequency at many of the world's meteorological stations, we propose a deep-learning-based temporal downscaling model for observation series, the Land Surface Observation Simulator-Time Series Version (LOS-T), taking 2 m air temperature as an example. LOS-T combines multimodal techniques with the Transformer architecture to merge multiple types of data, including low-frequency observations, ERA5-Land, and geographic information, and converts low-frequency observations into hourly observations. Trained on millions of meteorological observations worldwide, the model shows significant accuracy improvements, especially when downscaling data with only three observations per day. LOS-T substantially outperforms baseline models such as Bilinear and the vanilla Transformer on several metrics, including MAE, RMSE, COR, and R2. In addition, case studies confirm that LOS-T can effectively exploit the high-frequency temperature-change information in ERA5-Land to improve the accuracy and robustness of its predictions, even when ERA5-Land deviates significantly from the ground truth. In short, LOS-T offers a new way to refine global meteorological observation data and helps advance climate science.
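
To make the task and the evaluation metrics concrete, the following minimal sketch upsamples a three-observations-per-day series to hourly resolution and scores the result with MAE, RMSE, COR, and R2. It is an illustration under stated assumptions, not the LOS-T model or the paper's Bilinear baseline: the interpolation is plain one-dimensional linear interpolation in time, COR is assumed to be the Pearson correlation coefficient, the temperature series is synthetic, and the function names `upsample_linear` and `evaluate` are hypothetical.

```python
import numpy as np


def upsample_linear(obs_hours, obs_values, target_hours):
    """Linearly interpolate sparse observations onto the target hours.

    np.interp holds the first/last observed value constant outside the
    observed range, so hours after the last observation are flat.
    """
    return np.interp(target_hours, obs_hours, obs_values)


def evaluate(pred, truth):
    """Return MAE, RMSE, Pearson correlation (COR), and R2 as a dict."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    err = pred - truth
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    cor = float(np.corrcoef(pred, truth)[0, 1])
    ss_res = float(np.sum(err ** 2))                     # residual sum of squares
    ss_tot = float(np.sum((truth - truth.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "COR": cor, "R2": r2}


# Synthetic hourly 2 m air temperature for one day (illustrative only).
hours = np.arange(24)
truth = 15.0 + 5.0 * np.sin(2 * np.pi * (hours - 9) / 24)

# Keep one observation every 8 hours: three per day, at hours 0, 8, and 16.
obs_hours = hours[::8]
obs_values = truth[::8]

pred = upsample_linear(obs_hours, obs_values, hours)
metrics = evaluate(pred, truth)
print(metrics)
```

With only three observations per day, interpolation in time cannot recover the shape of the diurnal cycle between observations, which is the gap that auxiliary high-frequency information such as ERA5-Land is meant to fill.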