Bo Yuan

IEEE Senior Member

Australia

Public Documents 3

An Exploration of Higher Education Course Evaluation by Large Language Models

Bo Yuan

and 1 more

November 13, 2024

Course evaluation is a critical component in higher education pedagogy. It serves not only to identify limitations in existing course designs and provide a basis for curricular innovation, but also to offer quantitative insights for university administrative decision-making. Traditional evaluation methods, primarily comprising student surveys, instructor self-assessments, and expert reviews, often encounter challenges, including inherent subjectivity, feedback delays, inefficiencies, and limitations in addressing innovative teaching approaches. Recent advancements in large language models (LLMs) within artificial intelligence (AI) present promising new avenues for enhancing course evaluation processes. This study explores the application of LLMs in automated course evaluation from multiple perspectives and conducts rigorous experiments across 100 courses at a major university in China. The findings indicate that: (1) LLMs can be an effective tool for course evaluation; (2) their effectiveness is contingent upon appropriate fine-tuning and prompt engineering; and (3) LLM-generated evaluation results demonstrate a notable level of rationality and interpretability.

You Only Train Once: A highly generalizable reinforcement learning method for dynamic...

Yunhui Zeng

and 3 more

July 19, 2022

Research in artificial intelligence demonstrates the applicability and flexibility of the reinforcement learning (RL) technique for the dynamic job shop scheduling problem (DJSP). However, the RL-based method will always overfit to the training environment and cannot generalize well to novel unseen situations at deployment time, which is unacceptable in real-world production. For this reason, this paper proposes a highly generalizable reinforcement learning framework named Train Once For All (TOFA) for the dynamic job shop scheduling problem. The trivial and non-trivial states are distinguished when the DJSP is formulated as a semi-Markov decision process, defining the size-agnostic state, action, and reward function. A novel graph representation learning method based on an attention mechanism and spatial pyramid pooling is implemented to compress the disjunctive graphs of different-size DJSP into fixed-length feature vectors. Combining the proposed dynamic frame skipping with an improved prioritized experience replay method that considers the sample quality difference at different training phases, TOFA shows superb generalization capability and outperforms practically favored dispatching rules and even instance-by-instance trained RL-based schedulers on various benchmark DJSP. Additionally, we proved that TOFA acquires a transferable scheduling policy that can be used to schedule a whole new DJSP without additional training.
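
The idea of compressing inputs of different sizes into a fixed-length vector can be illustrated with a minimal pyramid pooling sketch. This is not the TOFA implementation: it assumes the graph has already been reduced to a list of per-node feature vectors, pools along that list in one dimension, and uses hypothetical pyramid levels of 1, 2, and 4 bins. The function name `pyramid_pool` is likewise an illustrative choice.

```python
from typing import List, Sequence


def pyramid_pool(nodes: Sequence[Sequence[float]],
                 levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool a variable number of node feature vectors into one
    vector of length sum(levels) * feature_dim (illustrative sketch)."""
    if not nodes:
        raise ValueError("need at least one node feature vector")
    n, dim = len(nodes), len(nodes[0])
    pooled: List[float] = []
    for bins in levels:
        for b in range(bins):
            # Floor the start and ceil the end so every bin is non-empty,
            # even when there are fewer nodes than bins.
            start = (b * n) // bins
            end = max(start + 1, -(-((b + 1) * n) // bins))
            chunk = nodes[start:end]
            # Element-wise max over the nodes that fall into this bin.
            pooled.extend(max(v[k] for v in chunk) for k in range(dim))
    return pooled


# Two "graphs" with 1 and 10 nodes map to vectors of the same length.
small = pyramid_pool([[1.0, 2.0]])
large = pyramid_pool([[float(i), float(-i)] for i in range(10)])
```

Because the output length depends only on the pyramid levels and the feature dimension, a downstream policy network can keep a fixed input size regardless of how many nodes the problem instance has.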

Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Divi...

Tiantian Zhang

and 3 more

August 09, 2021

The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional and continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general RL paradigm where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of “catastrophic interference” (a.k.a. “catastrophic forgetting”) and the collapse in performance, as later training is likely to overwrite and interfere with previously learned good policies. In this paper, we introduce the concept of “context” into single-task RL and develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL, to divide all states experienced during training into a series of contexts. Its motivation is to mitigate the aforementioned catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function, parameterized by a neural network feature extractor shared across all contexts, and a set of output heads, each specializing in an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge distillation regularization term on the output layers for learned contexts. In addition, to effectively obtain the context division in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder, which is fixed throughout training. Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring only moderate computational overhead.
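
The context division step can be pictured with a small online clustering sketch. The abstract only states that online clustering is used, so the choice of sequential k-means here is an assumption, as are the class name `OnlineContextClusterer`, the seeding rule, and the use of plain lists as state embeddings. The returned index stands in for the output head that would handle the state.

```python
from typing import List, Sequence


class OnlineContextClusterer:
    """Sequential k-means over state embeddings (illustrative sketch)."""

    def __init__(self, num_contexts: int) -> None:
        self.num_contexts = num_contexts
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def assign(self, z: Sequence[float]) -> int:
        """Return the context index for embedding z and update its centroid."""
        # Assumption: the first num_contexts embeddings seed the centroids.
        if len(self.centroids) < self.num_contexts:
            self.centroids.append(list(z))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Pick the nearest centroid by squared Euclidean distance.
        dists = [sum((c - x) ** 2 for c, x in zip(centroid, z))
                 for centroid in self.centroids]
        k = dists.index(min(dists))
        # Running-mean update of the winning centroid.
        self.counts[k] += 1
        step = 1.0 / self.counts[k]
        self.centroids[k] = [c + step * (x - c)
                             for c, x in zip(self.centroids[k], z)]
        return k


clusterer = OnlineContextClusterer(num_contexts=2)
contexts = [clusterer.assign(z) for z in
            ([0.0, 0.0], [10.0, 10.0], [0.2, -0.1], [9.5, 10.2])]
```

In a multi-head value network, the index returned by `assign` would select which output head is trained on the current transition, while the shared feature extractor is updated for all contexts.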