The efficient operation of cellular networks requires precise tuning of configuration parameters such as the antenna downtilt or the transmit power. Here, data-driven methods, which can integrate feedback from monitoring data, are promising but face challenges related to sample efficiency, scalability, and safety --- limiting their real-world application. In this work, we introduce an innovative online coverage and capacity optimization framework that combines model-free exploration using Monte Carlo tree search with a safe, model-based baseline derived from a probabilistic differentiable network twin. We formulate the optimization task as a sequential tree search problem, developing specialized policies to guide the exploration of the configuration space. The differentiable network twin aids both the selection policy, by pruning the search space with a prior action distribution, and the rollout policy, enabling domain knowledge-based selection of remaining actions. Our results demonstrate that the proposed approach effectively guides the optimization process, outperforming both purely model-free and model-based methods. Our solution improves safety by reducing the risk of testing poorly performing configurations, enhances the model-based solution, and can also compensate for severe model mismatches in the digital twin. Hence, the framework addresses a common obstacle in applying data-driven optimization to real-world network deployments.