Learning in sequential decision-making problems can be significantly affected by the choice of reward function. Although crucial for learning success, reward function design remains a complex task that requires expertise and does not always yield rewards aligned with human preferences. Large Language Models (LLMs) offer a promising avenue for reward design through textual prompts that leverage their prior knowledge and contextual reasoning. Despite this potential, LLM responses come with no guarantees and their reasoning abilities are poorly understood. To mitigate potential LLM errors, we introduce an alternative approach: learning a new reward function that uses LLM outputs as auxiliary rewards. We tackle this problem through a bi-level optimization framework and show that the method is effective both at acquiring an optimal reward and at adaptive reward shaping. The proposed approach demonstrates robustness and effectiveness, offering a novel strategy for enhancing reinforcement learning outcomes.
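One possible way to write such a bi-level reward-learning problem is sketched below; this formulation and its symbols ($r_{\theta}$, $\hat{r}_{\mathrm{LLM}}$, $\lambda$, $J_{\mathrm{task}}$) are illustrative assumptions, not notation taken from the paper.

\[
\begin{aligned}
\max_{\theta}\quad & J_{\mathrm{task}}\big(\pi^{*}_{\theta}\big) \\
\text{s.t.}\quad & \pi^{*}_{\theta} \in \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,\Big(r_{\theta}(s_t, a_t) + \lambda\, \hat{r}_{\mathrm{LLM}}(s_t, a_t)\Big)\right],
\end{aligned}
\]

where the outer level learns the reward parameters $\theta$ so that the policy $\pi^{*}_{\theta}$ trained at the inner level maximizes the true task objective $J_{\mathrm{task}}$, while the inner level optimizes the learned reward $r_{\theta}$ shaped by the LLM-provided auxiliary reward $\hat{r}_{\mathrm{LLM}}$ with weight $\lambda$.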