Research in artificial intelligence demonstrates the applicability and flexibility of reinforcement learning (RL) for the dynamic job shop scheduling problem (DJSP). However, RL-based methods tend to overfit to their training environment and generalize poorly to novel, unseen situations at deployment time, which is unacceptable in real-world production. For this reason, this paper proposes a highly generalizable reinforcement learning framework named Train Once For All (TOFA) for the dynamic job shop scheduling problem. Trivial and non-trivial states are distinguished when the DJSP is formulated as a semi-Markov decision process, and size-agnostic state, action, and reward functions are defined. A novel graph representation learning method based on an attention mechanism and spatial pyramid pooling compresses the disjunctive graphs of different-sized DJSP instances into fixed-length feature vectors. Combined with the proposed dynamic frame skipping and an improved prioritized experience replay method that accounts for differences in sample quality across training phases, TOFA shows superb generalization capability, outperforming practically favored dispatching rules and even instance-by-instance trained RL-based schedulers on various DJSP benchmarks. Additionally, we demonstrate that TOFA acquires a transferable scheduling policy that can schedule an entirely new DJSP without additional training.
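To make the size-agnostic claim concrete, the sketch below illustrates the general idea of spatial pyramid pooling over a variable-size node-embedding matrix, which is how disjunctive graphs of any size can be mapped to a fixed-length vector. This is a minimal illustration in PyTorch under our own assumptions (contiguous node binning, max pooling, pyramid levels 1/2/4); the function name and details are hypothetical and need not match the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(node_feats: torch.Tensor,
                         levels=(1, 2, 4)) -> torch.Tensor:
    """Pool a variable-size node-feature matrix (N, d) to a fixed-length vector.

    At each pyramid level L, the N nodes are split into L bins and max-pooled
    per bin, so the output length is d * sum(levels) regardless of N.
    Illustrative sketch only; the paper's pooling scheme may differ.
    """
    pooled = []
    for level in levels:
        # adaptive_max_pool1d expects (batch, channels, length)
        x = node_feats.t().unsqueeze(0)               # (1, d, N)
        out = F.adaptive_max_pool1d(x, level)         # (1, d, level)
        pooled.append(out.flatten())
    return torch.cat(pooled)                          # length: d * sum(levels)

# Two DJSP instances of different sizes map to vectors of identical length.
small = spatial_pyramid_pool(torch.randn(12, 64))     # 12-node disjunctive graph
large = spatial_pyramid_pool(torch.randn(90, 64))     # 90-node disjunctive graph
assert small.shape == large.shape == (64 * 7,)
```

Because the output dimension depends only on the embedding width and the pyramid levels, the downstream policy network can keep a fixed input size while the scheduling instance grows or shrinks, which is the property that enables training once and deploying across DJSP instances of different sizes.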