Recently, a large number of Low Earth Orbit (LEO) satellites have been successfully launched and deployed in space. Since these satellites are equipped with multimodal sensors, they serve not only communication but also various machine learning applications. However, a ground station (GS) may be unable to download such a large volume of raw sensing data for centralized model training due to its limited contact time with each LEO satellite (e.g., 5 minutes). Therefore, federated learning (FL) has emerged as a promising solution to this problem via on-device training. Unfortunately, enabling FL on LEO satellites still faces three critical challenges: i) heterogeneous computing and memory capabilities, ii) limited uplink/downlink rates, and iii) model staleness. To this end, we propose FedSN, a general FL framework that tackles these challenges. Specifically, we first present a novel sub-structure scheme that enables heterogeneous local model training under the varying computing, memory, and communication constraints of LEO satellites. We further propose a pseudo-synchronous model aggregation strategy that dynamically schedules model aggregation to compensate for model staleness. Extensive experiments with real-world satellite data demonstrate that FedSN achieves higher accuracy and lower computing and communication overhead than state-of-the-art benchmarks.