Temporal dependence (TD) is a fundamental concept for modeling the behavior of real-world systems that are sequential in nature. Markov chains (MCs) are powerful tools for modeling such time-dependent behavior. The parameters of an MC can be learned with the Stochastic Learning Weak Estimator (SLWE) proposed by Oommen and Rueda, which operates effectively in non-stationary environments. In complex problems such as the detection, identification, or recognition of complex patterns, dependence on position or neighborhood, i.e., spatial dependence (SD), is likely to contribute significantly to the analysis of the relevant data alongside TD. Regular MCs are insufficient for exploiting the SD that can be extracted from the data. In this paper, we extend MCs to utilize SD in addition to TD, and present the theoretical background for spatio-temporal MCs (STMCs), which treat the spatial domain differently from the existing literature. We also extend the SLWE to estimate the parameters of a discrete-time, first-order, homogeneous STMC, and provide the corresponding theorems on the asymptotic behavior of the estimates. We show that the proposed method, SLWE of STMC (SLWE STMC), (i) outperforms MCs relying solely on temporal or spatial dependence, as well as traditional estimation methods, in synthetic experiments, and (ii) achieves competitive forecasting performance while being significantly less complex than Deep Learning (DL) methods in real-world experiments.
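
As a rough illustration of the weak estimator mentioned above, the following Python sketch applies the multinomial SLWE update to the rows of an ordinary (purely temporal) first-order MC transition matrix. It is not the STMC extension proposed in this paper. The function names `slwe_update` and `estimate_transition_matrix`, the uniform initialization, and the value of the learning parameter `lam` are assumptions made only for this example.

```python
def slwe_update(p, observed, lam):
    """One multinomial SLWE step (illustrative sketch).

    The observed outcome gains probability mass, all others are
    scaled down by `lam` (0 < lam < 1), so the vector still sums to 1.
    """
    return [
        lam * pj + (1.0 - lam) if j == observed else lam * pj
        for j, pj in enumerate(p)
    ]


def estimate_transition_matrix(states, n_states, lam=0.95):
    """Estimate the transition matrix of a temporal first-order MC.

    `states` is a sequence of integer state indices in [0, n_states).
    Each row starts uniform (an assumption) and is updated only when
    the chain leaves the corresponding state.
    """
    P = [[1.0 / n_states] * n_states for _ in range(n_states)]
    for prev, cur in zip(states, states[1:]):
        P[prev] = slwe_update(P[prev], cur, lam)
    return P


if __name__ == "__main__":
    # Hypothetical toy sequence over two states.
    sequence = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
    for row in estimate_transition_matrix(sequence, n_states=2, lam=0.9):
        print([round(x, 3) for x in row])
```

Because each update discounts older observations geometrically, values of `lam` closer to 1 give smoother but slower-adapting estimates, which is the trade-off that makes this kind of estimator suitable for non-stationary data.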