With the rapid development of artificial intelligence and the dramatic growth of communication services, the sixth-generation (6G) wireless network must handle communication tasks more flexibly and efficiently, which significantly exacerbates the challenge of resource allocation. In 6G access network scenarios, existing single-layer reinforcement learning resource allocation algorithms struggle to satisfy the diverse demands of users because of the complex and highly variable state space. We therefore propose a reinforcement learning-based two-timescale resource allocation scheme that aims to jointly enhance quality of service and system resource utilization. In the proposed scheme, an upper-layer controller allocates network resources to lower-layer controllers on a large timescale, and the lower-layer controllers then refine these resources according to user service types on a smaller timescale. To implement this two-timescale allocation scheme, we design a two-layer reinforcement learning framework consisting of a deep deterministic policy gradient (DDPG) agent and a dueling deep Q network (Dueling-DQN). Furthermore, since coupling multiple reinforcement learning processes may slow down convergence, we employ asynchronous training, transfer learning, and prediction-based action space simplification to accelerate model convergence. Finally, we build a prototype network to verify the performance of the proposed small-timescale and large-timescale allocation algorithms. The proposed scheme demonstrates significant improvements in both resource utilization and quality of service compared with existing schemes.
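To make the two-timescale structure concrete, the following is a minimal sketch of how the upper-layer and lower-layer controllers could be nested in time. It is an illustrative assumption only, not the authors' implementation: the environment, the agent classes (UpperAgent, LowerAgent), and constants such as T_LARGE and N_SLICES are hypothetical placeholders, with random policies standing in for the DDPG and Dueling-DQN agents described in the abstract.

```python
# Hypothetical sketch of the two-timescale control loop (assumed names).
# In the proposed framework the upper agent would be a DDPG and the lower
# agents Dueling-DQNs; random placeholders are used here for brevity.
import numpy as np

T_LARGE = 50            # small-timescale slots per large-timescale step (assumed)
N_SLICES = 3            # lower-layer controllers, e.g. one per service type (assumed)
N_EPISODES = 10
SLOTS_PER_EPISODE = 500


class UpperAgent:
    """Placeholder for the large-timescale DDPG controller."""
    def allocate(self, network_state):
        # Continuous action: fraction of total resources given to each slice.
        return np.random.dirichlet(np.ones(N_SLICES))


class LowerAgent:
    """Placeholder for a small-timescale Dueling-DQN controller."""
    def __init__(self, n_actions=5):
        self.n_actions = n_actions

    def refine(self, budget, user_state):
        # Discrete action: choose a scheduling pattern within the budget
        # handed down by the upper layer.
        return np.random.randint(self.n_actions)


def run():
    upper = UpperAgent()
    lowers = [LowerAgent() for _ in range(N_SLICES)]
    for _ in range(N_EPISODES):
        network_state = np.random.rand(8)           # coarse network observation
        for slot in range(SLOTS_PER_EPISODE):
            # Large timescale: re-allocate slice budgets every T_LARGE slots.
            if slot % T_LARGE == 0:
                budgets = upper.allocate(network_state)
            # Small timescale: each lower controller refines its own budget.
            for i, agent in enumerate(lowers):
                user_state = np.random.rand(4)      # per-slice user observation
                agent.refine(budgets[i], user_state)
                # ... apply action, observe reward, store transition, train
            network_state = np.random.rand(8)       # environment evolves


if __name__ == "__main__":
    run()
```

In such a structure, asynchronous training and transfer learning would act on the placeholder agents' (omitted) training steps, while prediction-based action space simplification would shrink the discrete action set used by the lower-layer controllers.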