Large neural networks have demonstrated success in various predictive tasks on Electronic Health Records (EHRs). However, on small, divergent patient cohorts, such as those with rare diseases, they often underperform simpler linear models because of their substantial data requirements. To address this limitation, we introduce the SANSformers architecture, designed specifically for forecasting healthcare utilization from EHRs. Unlike traditional transformers, SANSformers use attention-free mechanisms, reducing model complexity. We also present Generative Summary Pretraining (GSP), a self-supervised learning technique that enables large neural networks to remain predictively efficient even on smaller patient subgroups. Through extensive evaluation on two real-world datasets, we compare SANSformers against existing state-of-the-art EHR prediction models, offering a new perspective on predicting healthcare utilization.
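
To make the attention-free idea concrete, the sketch below shows one generic attention-free block in PyTorch that mixes information across visit positions with a feed-forward network instead of self-attention (in the spirit of MLP-Mixer-style token mixing). The class name `AttentionFreeBlock`, the dimensions, and the choice of MLP mixing are illustrative assumptions for exposition, not the exact SANSformers mechanism.

```python
# Minimal sketch (assumptions): a generic attention-free block that mixes
# information across visit positions with an MLP instead of self-attention.
# Names, sizes, and the MLP-mixing choice are illustrative, not the exact
# SANSformers design.
import torch
import torch.nn as nn


class AttentionFreeBlock(nn.Module):
    """Mixes a fixed-length sequence of visit embeddings without attention."""

    def __init__(self, seq_len: int, d_model: int, d_hidden: int = 256):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        # Token (visit) mixing: an MLP applied across the sequence dimension,
        # replacing the O(n^2) pairwise interactions of self-attention.
        self.token_mix = nn.Sequential(
            nn.Linear(seq_len, d_hidden), nn.GELU(), nn.Linear(d_hidden, seq_len)
        )
        self.norm2 = nn.LayerNorm(d_model)
        # Channel mixing: the usual position-wise feed-forward network.
        self.channel_mix = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -- one embedding per patient visit.
        y = self.norm1(x).transpose(1, 2)          # (batch, d_model, seq_len)
        x = x + self.token_mix(y).transpose(1, 2)  # mix across visits
        return x + self.channel_mix(self.norm2(x))


# Usage sketch: 32 patients, 64 visits each, 128-dimensional visit embeddings.
block = AttentionFreeBlock(seq_len=64, d_model=128)
out = block(torch.randn(32, 64, 128))
print(out.shape)  # torch.Size([32, 64, 128])
```

The design point such a block illustrates is that sequence mixing can be done with fixed, learned feed-forward weights rather than input-dependent attention scores, which removes the quadratic pairwise-interaction cost that makes transformers data-hungry on small cohorts.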