Mixture of Experts (MoE) is a machine learning paradigm that enhances model scalability, efficiency, and specialization by dynamically routing each input to a small subset of expert subnetworks. This survey provides a comprehensive overview of MoE, covering its fundamental principles, advantages, challenges, recent advancements, and future research directions. MoE architectures have been widely adopted in large-scale artificial intelligence systems, particularly in natural language processing and computer vision, because they increase model capacity without a proportional increase in computational cost. Key challenges, including training instability, expert load imbalance, computational overhead, and limited interpretability, are discussed alongside potential mitigation strategies. Recent innovations, including hierarchical MoE, improved gating mechanisms, adaptive expert pruning, and the integration of MoE into multimodal learning, have significantly enhanced the framework's effectiveness. The survey also explores future research directions, focusing on improving training stability, fairness, continual learning, and resource efficiency. As MoE continues to evolve, it remains a promising approach for developing highly efficient and scalable AI systems.
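To make the routing idea concrete, the following is a minimal sketch of a sparsely gated MoE layer in PyTorch, where a learned gate selects the top-k experts for each input and mixes their outputs. The class names (SimpleMoE, Expert), the top-k softmax gating, and all hyperparameters are illustrative assumptions, not a specific architecture from the surveyed works.

```python
# Minimal top-k gated MoE sketch (illustrative; names and sizes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """One feed-forward expert subnetwork."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SimpleMoE(nn.Module):
    """Routes each input to the top-k experts chosen by a learned gate."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(num_experts))
        self.gate = nn.Linear(d_model, num_experts)  # learned router
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model). Score all experts, but evaluate only the top-k per input.
        scores = self.gate(x)                                # (batch, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # (batch, k)
        weights = F.softmax(topk_scores, dim=-1)             # normalize over selected experts only

        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]             # expert assigned to each input in this slot
            w = weights[:, slot].unsqueeze(-1)  # corresponding mixing weight
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out


if __name__ == "__main__":
    moe = SimpleMoE(d_model=16, d_hidden=64, num_experts=4, k=2)
    tokens = torch.randn(8, 16)
    print(moe(tokens).shape)  # torch.Size([8, 16])
```

Because only k of the num_experts feed-forward blocks run per input, total parameter count grows with the number of experts while per-input compute stays roughly constant, which is the scalability property the survey highlights; production systems add further machinery (for example, load-balancing losses and capacity limits) that this sketch omits.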