Robotics

Mental Modeling of Reinforcement Learning Agents by Language Models

This study explores whether LLMs can mentally model decision-making agents by reasoning over their behavior and state transitions from interaction histories. Evaluated on reinforcement learning tasks, LLMs offer some insight but fall short of fully modeling agents without further innovation, highlighting both their potential and current limitations for explainable RL.
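
A minimal sketch of this kind of probing, assuming the interaction history is serialized into a prompt and the LLM is asked to predict the agent's next action; the prompt format and the `llm` stub are illustrative assumptions, not the paper's evaluation protocol.

```python
def llm(prompt: str) -> str:
    """Stand-in for a real LLM call; swap in an actual model client."""
    return "0"  # placeholder answer

def format_history(history):
    """Render (state, action, reward) triples as plain text."""
    return "\n".join(f"step {t}: state={s}, action={a}, reward={r}"
                     for t, (s, a, r) in enumerate(history))

def predict_next_action(history) -> int:
    prompt = ("You observe an agent interacting with an environment.\n"
              f"{format_history(history)}\n"
              "Which action (an integer id) will the agent take next?")
    return int(llm(prompt).strip())

# Toy usage: measure agreement between the LLM's guesses and the
# agent's actual actions over one episode.
episode = [((0, 0), 1, 0.0), ((0, 1), 1, 0.0), ((0, 2), 1, 1.0)]
hits = sum(predict_next_action(episode[:t]) == episode[t][1]
           for t in range(1, len(episode)))
print(f"action-prediction accuracy: {hits / (len(episode) - 1):.2f}")
```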

Agentic Skill Discovery

We propose an LLM-driven framework that enables **robots to autonomously discover useful skills from scratch**. By generating tasks, rewards, and success criteria, the LLM guides reinforcement learning, while a vision-language model verifies outcomes. This allows the robot to build a meaningful skill library without relying on predefined primitives.
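
A minimal sketch of such a discovery loop, assuming the LLM proposes a task with a reward function, RL attempts to learn it, and a vision-language check gates entry into the skill library; `propose_task`, `train_policy`, and `vlm_verifies` are hypothetical stand-ins, not the framework's actual interfaces.

```python
def propose_task(skill_library):
    """LLM stand-in: suggest a new task together with a reward function."""
    return {"name": f"skill_{len(skill_library)}",
            "reward_fn": lambda obs: float(obs.get("grasped", False))}

def train_policy(task, steps=1000):
    """RL stand-in: would run e.g. PPO/SAC against task['reward_fn']."""
    return {"task": task["name"], "policy": "trained-policy-placeholder"}

def vlm_verifies(policy) -> bool:
    """VLM stand-in: inspect rollout frames and confirm success."""
    return True

skill_library = []
for _ in range(3):                       # a few discovery rounds
    task = propose_task(skill_library)
    policy = train_policy(task)
    if vlm_verifies(policy):             # only verified skills are kept
        skill_library.append(policy)

print([p["task"] for p in skill_library])
```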

Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

We introduce OSSA (Object State-Sensitive Agent), a task-planning agent that uses pre-trained LLMs and VLMs to generate plans sensitive to object states. We compare two methods: a modular approach combining vision and language models, and a monolithic VLM approach. On tabletop table-clearing tasks, OSSA's monolithic model outperforms the modular one. A new multimodal benchmark dataset with object state annotations is provided.
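
A minimal sketch of the modular variant, assuming a perception stage reports object states and a planning stage adapts its steps to them (e.g. a full cup must be emptied before it is binned); `detect_states` and `plan` are illustrative stand-ins, not OSSA's actual prompts or models.

```python
def detect_states(image):
    """Vision stand-in: return objects on the table with their states."""
    return [{"name": "cup", "state": "full"},
            {"name": "plate", "state": "dirty"}]

def plan(objects):
    """Planner stand-in: an object-state-sensitive clearing plan."""
    steps = []
    for obj in objects:
        if obj["name"] == "cup" and obj["state"] == "full":
            steps.append("pour out cup")          # state-dependent extra step
        steps.append(f"pick up {obj['name']}")
        steps.append(f"place {obj['name']} in bin")
    return steps

for step in plan(detect_states(image=None)):
    print(step)
```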

Large Language Models for Orchestrating Bimanual Robots

LABOR uses an LLM to orchestrate control policies for long-horizon bimanual manipulation tasks. By reasoning about the task and coordinating the two arms through language, it achieves higher success rates on simulated tasks with the NICOL robot and offers insight into the remaining challenges of LLM-based control.
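
A minimal sketch of language-based orchestration, assuming a coordinator (an LLM in LABOR; a rule-based stand-in here) assigns one skill per arm at each step so the arms stay coordinated; the skill names and task are illustrative.

```python
def coordinator(task: str, step: int) -> dict:
    """LLM stand-in: map task progress to a skill for each arm."""
    if step == 0:
        return {"left": "hold_bottle", "right": "idle"}
    return {"left": "hold_bottle", "right": "unscrew_cap"}

for step in range(3):
    commands = coordinator("open the bottle", step)
    print(f"step {step}: left={commands['left']}, right={commands['right']}")
```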

Causal State Distillation for Explainable Reinforcement Learning

We propose reward-decomposition methods that make RL agents' decision-making more explainable.

Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

We present the Matcha agent, an interactive perception framework that uses LLMs to guide robots in gathering multimodal sensory data (vision, sound, haptics, proprioception) before executing tasks. Matcha enables high-level reasoning and planning in partially observable environments, showing that LLMs can effectively control robot behavior when grounded with multimodal context.
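
A minimal sketch of the interactive loop, assuming the LLM repeatedly chooses either a perception skill (e.g. knocking to hear the material) or a final task-level action, with each multimodal result fed back as text; `llm_decide` and `execute_perception` are hypothetical stand-ins.

```python
def llm_decide(dialogue: str) -> str:
    """LLM stand-in: ask for a sound probe first, then commit to acting."""
    return "knock" if "sound" not in dialogue else "pick_up"

def execute_perception(skill: str) -> str:
    """Robot stand-in: run a perception skill and describe the result."""
    return {"knock": "sound: dull thud (likely soft material)"}.get(skill, "")

dialogue = "task: pick up the heaviest-looking cube"
while True:
    choice = llm_decide(dialogue)
    if choice == "pick_up":                 # final, task-level action
        print("executing:", choice)
        break
    feedback = execute_perception(choice)   # multimodal result fed back as text
    dialogue += "\n" + feedback
```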

Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

Lafite-RL is a framework that leverages Large Language Models to provide natural language feedback for guiding reinforcement learning in robotic tasks. Tested on RLBench, it improves learning efficiency and success rates without requiring costly human supervision.
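
A minimal sketch of LLM-guided reward shaping in this spirit, assuming the environment reward is augmented with a small bonus or penalty derived from an LLM's judgment of the latest transition; `llm_judge` and the bonus scale are illustrative assumptions rather than Lafite-RL's exact scheme.

```python
def llm_judge(description: str) -> str:
    """LLM stand-in: rate a described transition as good, bad, or neutral."""
    return "good" if "closer" in description else "neutral"

FEEDBACK_BONUS = {"good": 0.1, "neutral": 0.0, "bad": -0.1}

def shaped_reward(env_reward: float, transition_description: str) -> float:
    """Environment reward plus a scalar bonus from the LLM's feedback."""
    return env_reward + FEEDBACK_BONUS[llm_judge(transition_description)]

print(shaped_reward(0.0, "gripper moved closer to the handle"))  # 0.1
```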

Internally Rewarded Reinforcement Learning

We introduce Internally Rewarded Reinforcement Learning (IRRL), where rewards are generated by a jointly learned internal model rather than the environment. This coupling of policy and reward learning can destabilize training. We formalize IRRL, analyze its challenges, and propose a clipped linear reward function that reduces reward noise. Experiments show improved stability, faster convergence, and better performance across tasks.
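
A minimal sketch of the idea behind a clipped linear reward, assuming the internal reward model outputs a probability for the correct label: bounding it linearly instead of taking a logarithm keeps rewards finite and dampens noise from an imperfect, jointly trained reward model early in training. The exact functional form below is an illustrative assumption, not the paper's precise definition.

```python
import math

def clipped_linear_reward(p_correct: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Bounded reward from the internal model's probability of the true label."""
    return min(max(p_correct, lo), hi)

# Compare with the unbounded logarithmic reward on a few probabilities.
for p in (0.05, 0.5, 0.95):
    print(f"p={p:.2f}  log-reward={math.log(p):7.3f}  "
          f"clipped-linear={clipped_linear_reward(p):.2f}")
```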

A Closer Look at Reward Decomposition for High-Level Robotic Explanations

Explainable Q-Map improves the transparency of RL agents by combining reward decomposition with abstract action spaces, enabling clear, high-level explanations based on task-relevant object properties. We demonstrate visual and textual explanations in robotic scenarios and show how they can be used with LLMs for reasoning and interactive querying.
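
A minimal sketch of reward-decomposition explanations, assuming each abstract action's Q-value is the sum of per-component Q-values and the explanation ranks those components for the chosen action; the component names and numbers are illustrative, not from the paper's experiments.

```python
import numpy as np

components = ["reach_target", "avoid_collision", "object_upright"]
actions = ["place cup on shelf", "push cup aside"]
# Decomposed Q-values: rows = abstract actions, columns = reward components.
q_components = np.array([
    [0.6, 0.1, 0.2],   # action 0
    [0.3, 0.4, 0.1],   # action 1
])

q_total = q_components.sum(axis=1)          # standard Q is the component sum
best = int(q_total.argmax())
print(f"chosen action: {actions[best]} (Q={q_total[best]:.2f})")
for name, value in sorted(zip(components, q_components[best]),
                          key=lambda kv: -kv[1]):
    print(f"  {name}: {value:.2f}")          # component-level textual explanation
```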

Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations

We propose the Intrinsic Sound Curiosity Module (ISCM) to use sound as an informative modality for unsupervised reinforcement learning. In realistic manipulation scenarios with simulated audio, ISCM guides exploration and representation learning. Experiments show that sound-driven pre-training leads to better representations and faster adaptation than vision-only baselines.
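
A minimal sketch of a sound-driven curiosity signal, assuming a forward model predicts the audio features produced by a state-action pair and the intrinsic reward is its prediction error, so the agent is drawn toward impacts it cannot yet predict; the tiny linear model is a stand-in for ISCM's learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                  # toy forward model: features -> audio

def predict_audio(state_action: np.ndarray) -> np.ndarray:
    """Forward-model stand-in: predict the audio embedding of an interaction."""
    return state_action @ W

def intrinsic_reward(state_action: np.ndarray, observed_audio: np.ndarray) -> float:
    """Prediction error on the sound embedding drives exploration."""
    error = predict_audio(state_action) - observed_audio
    return float(np.mean(error ** 2))

sa = rng.normal(size=8)                      # toy state-action features
audio = rng.normal(size=4)                   # toy observed audio embedding
print(f"intrinsic reward: {intrinsic_reward(sa, audio):.3f}")
```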