Predicting the future trajectories of agents in dynamic, multi-agent environments remains a fundamental challenge, particularly when models lack interpretability, an essential property for safety-critical applications such as autonomous driving. We propose the Scene-level Trajectory Prediction Transformer (STPT), a novel framework that integrates diffusion-based generative modeling with KAN (Kolmogorov-Arnold Network) mechanisms to capture both the spatial and temporal dynamics of agent-environment interactions. STPT employs a recursive diffusion process that refines trajectory predictions over multiple time steps, explicitly accounting for uncertainty and inter-agent dependencies. Importantly, we introduce a Shapley value-based feature attribution technique tailored to diffusion models, which quantifies the global and scenario-specific importance of features such as traffic signals and lane geometry at every stage of the prediction process. Extensive evaluations on benchmark datasets show that STPT not only surpasses state-of-the-art trajectory prediction methods in accuracy but also provides real-time interpretability of its predictions, making it well suited for deployment in safety-critical systems that demand both precision and accountability.
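To make the two mechanisms named above more concrete, the following is a minimal sketch of how a recursive diffusion refinement loop and a Shapley value-based attribution over scene features could fit together. Everything in it is an assumption introduced for illustration: the denoiser `toy_denoiser`, the feature list, the refinement schedule, and the masking baseline are hypothetical stand-ins, not the STPT architecture or the attribution algorithm described in the paper.

```python
# Illustrative sketch only: a toy recursive denoising loop plus exact Shapley
# attribution over a small set of scene features. All names and dynamics here
# are hypothetical, not the authors' implementation.
import itertools
import math
import numpy as np

FEATURES = ["traffic_signal", "lane_geometry", "neighbor_agents"]  # assumed scene features


def toy_denoiser(noisy_traj, scene_feats, t):
    """Hypothetical denoiser: pulls the noisy trajectory toward a scene-conditioned offset."""
    bias = 0.1 * scene_feats.get("traffic_signal", 0.0) + 0.05 * scene_feats.get("lane_geometry", 0.0)
    return noisy_traj * (1.0 - 0.1 * t) + bias


def refine_trajectory(init_traj, scene_feats, steps=10):
    """Recursive refinement: each diffusion step denoises the previous estimate."""
    traj = init_traj
    for t in reversed(range(1, steps + 1)):
        traj = toy_denoiser(traj, scene_feats, t / steps)
    return traj


def shapley_attribution(scene_feats, init_traj, baseline=0.0):
    """Exact Shapley values over a small feature set: the average marginal contribution
    of each feature to the refined prediction's mean displacement, over all coalitions."""
    n = len(FEATURES)
    values = {f: 0.0 for f in FEATURES}

    def predict(subset):
        # Features outside the coalition are masked with a baseline value.
        masked = {f: (scene_feats[f] if f in subset else baseline) for f in FEATURES}
        return float(np.mean(refine_trajectory(init_traj, masked)))

    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        for r in range(len(others) + 1):
            for coalition in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                values[f] += w * (predict(set(coalition) | {f}) - predict(set(coalition)))
    return values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.normal(size=(12, 2))  # 12 future way-points, (x, y)
    feats = {"traffic_signal": 1.0, "lane_geometry": 0.5, "neighbor_agents": 0.2}
    print("refined mean:", np.mean(refine_trajectory(noisy, feats)))
    print("Shapley attributions:", shapley_attribution(feats, noisy))
```

In this toy setting the exact Shapley computation is tractable because only three features are attributed; for realistic scene encodings one would expect a sampling-based approximation, and the per-step attribution described in the abstract would repeat such a computation at each stage of the diffusion process.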