In recent years, much of the research on speech enhancement (SE) has focused on using deep neural network models to improve performance. While these models often excel on specific datasets, they tend to have high memory and computational demands, which makes it crucial to evaluate how well they generalize across different datasets before real-world deployment. To tackle this challenge, we propose CDiffSE-AC, a novel speech enhancement method that integrates a diffusion model with Actor-Critic techniques. Our approach processes noisy speech through a diffusion model and a pre-trained large language model, which acts as the Actor and labels the noisy input; these labels then serve as conditions for the SE model. The Critic evaluates the enhanced output by scoring its voice quality. During training, the SE model's enhanced outputs are reintroduced as noisy inputs to the Actor and the SE model, fostering further improvement. We conducted experiments on two datasets of different sizes, TIMIT and VCTK-DM. In the test results, although CDiffSE-AC demonstrates only modest score improvements within the same dataset, it performs significantly better in cross-dataset tests, even surpassing the baseline MOSE method on the TIMIT and VCTK-DM test sets. Specifically, CDiffSE-AC achieves PESQ improvements of 0.15 (7.2%) and 0.18 (7.5%), respectively.
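
To make the Actor-Critic loop described above concrete, the sketch below outlines one training step under stated assumptions: the module classes, the L1-plus-reward loss form, and the toy labeling/scoring logic are illustrative placeholders, not the paper's actual implementation. It shows the flow the abstract describes: the Actor labels the noisy input, the labels condition the SE model, the Critic scores the enhanced output, and the enhanced output is recycled as the next noisy input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# All module definitions here are toy stand-ins (assumptions for illustration).

class DummyActor(nn.Module):
    """Stands in for the pre-trained large language model that labels noisy speech."""
    def forward(self, noisy):
        # Toy "label": a per-utterance summary used as the conditioning signal.
        return noisy.mean(dim=-1, keepdim=True)

class DummySE(nn.Module):
    """Stands in for the conditional diffusion SE model."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(1, 1)
    def forward(self, noisy, condition):
        # Toy conditioned enhancement: shift the waveform by a learned offset.
        return noisy + self.proj(condition)

class DummyCritic(nn.Module):
    """Stands in for the Critic that scores voice quality of enhanced speech."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(1, 1)
    def forward(self, enhanced):
        return self.head(enhanced.mean(dim=-1, keepdim=True))

def train_step(noisy, clean, actor, se_model, critic, optimizer):
    """One Actor-Critic training step for the conditional diffusion SE sketch."""
    with torch.no_grad():
        condition = actor(noisy)           # Actor labels the noisy input
    enhanced = se_model(noisy, condition)  # labels serve as SE conditions
    score = critic(enhanced)               # Critic scores the voice quality
    # Assumed loss form: reconstruction term minus the Critic's quality reward.
    loss = F.l1_loss(enhanced, clean) - score.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Recycle: the enhanced output becomes the next "noisy" input for the
    # Actor and SE model, as described in the training procedure.
    return loss.item(), enhanced.detach()

# Minimal usage example with random tensors standing in for audio.
actor, se, critic = DummyActor(), DummySE(), DummyCritic()
opt = torch.optim.Adam(se.parameters(), lr=1e-4)  # only the SE model is updated here
noisy, clean = torch.randn(8, 16000), torch.randn(8, 16000)
loss, next_noisy = train_step(noisy, clean, actor, se, critic, opt)
```

In this sketch only the SE model's parameters are optimized; in practice the Critic would be trained separately against reference quality scores (e.g., PESQ) rather than updated through this loss.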