Deep learning for procedural level generation has been explored in many recent works; however, experimental comparisons with previous methods are rare and usually limited to the methods a work directly extends. This paper's goal is to conduct an experimental study of four recent deep learning procedural level generators for Sokoban to explore their strengths and weaknesses. The four methods are bootstrapping conditional generative models, controllable and uncontrollable procedural content generation via reinforcement learning (PCGRL), and generative playing networks. We propose modifications to either adapt the methods to the task or improve their efficiency and performance. For the bootstrapping method, we propose diversity sampling to improve solution diversity, auxiliary targets to enhance model quality, and Gaussian mixture models to improve sample quality. The results show that diversity sampling at least doubles the number of unique solution plans among the generated levels, auxiliary targets increase quality by 24% on average, and sampling conditions from Gaussian mixture models increases sample quality by 13%. Overall, PCGRL shows superior quality and diversity, while generative adversarial networks exhibit the least control confusion when trained with diversity sampling and auxiliary targets.
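To make the Gaussian-mixture modification concrete, the sketch below shows one plausible way to sample generation conditions from a mixture fit to the condition vectors of known-solvable levels, so that requested conditions follow realistic joint combinations rather than a uniform grid. This is a minimal illustration under stated assumptions, not the paper's actual pipeline: the choice of condition features (solution length and crate count) and all names here are hypothetical.

```python
# Minimal sketch: sampling generation conditions from a Gaussian mixture
# model (GMM). Assumption: each solvable training level is summarized by
# a condition vector, and the conditional generator consumes such vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical condition vectors extracted from solvable training levels;
# columns = [solution_length, crate_count].
train_conditions = np.array([
    [12, 2], [18, 3], [25, 3], [30, 4], [14, 2], [22, 3],
], dtype=float)

# Fit a mixture over the observed conditions so that sampled conditions
# stay on the manifold of achievable combinations (e.g., avoiding a short
# solution length paired with an implausibly large crate count).
gmm = GaussianMixture(n_components=2, random_state=0).fit(train_conditions)

# Draw fresh conditions; each sampled vector would be fed to the
# conditional generator when requesting a new level.
sampled_conditions, _ = gmm.sample(n_samples=8)
for cond in sampled_conditions:
    print(f"request level: solution length ~{cond[0]:.0f}, crates ~{cond[1]:.0f}")
```

The design intuition is that conditions drawn from the fitted mixture are biased toward combinations the generator has actually seen satisfied, which is one way the reported improvement in sample quality could arise.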