On the Job Market!

I am finishing my Ph.D., seeking postdoc / research roles in Agentic / Embodied AI & LLM.😄

Xufeng Zhao

PhD in Artificial Intelligence,

University of Hamburg

👋 Hi! I I defended my PhD on 08.10.2025‼️ Committee: Prof. Stefan Wermter (supervisor), Prof. Jianwei Zhang (deputy), and Prof. Sören Laue (chair).

I study embodied agents that explore, learn, and adapt autonomously, unifying reinforcement learning, world modeling, and large language models so agents can reason, acquire skills, and scale with minimal supervision. My long-term goal is to deliver robust, trustworthy systems that assist people in complex real-world settings.

Education

Ph.D. in Computer Science, 2025
University of Hamburg
M.Sc. in Signal Processing, 2018
University of Chinese Academy of Sciences
B.E. in Electronic Information Engineering, 2014
Xidian University

Interests

Agentic Intelligence
Robotics / Embodied AI
Reinforcement Learning
Machine Learning

Selected Publications

See my Google Scholar for a comprehensive listing!

Joint Design of Protein Surface and Backbone Using a Diffusion Bridge Model

NeurIPS 2025

Guanlve Li , Xufeng Zhao , Fang Wu , Sören Laue

PepBridge jointly designs receptor-complementary protein surfaces and full 3D structures from a receptor’s point-cloud surface. It uses denoising diffusion bridges to generate ligand surfaces, a diffusion model to build backbone/side chains, and Shape-Frame Matching to align geometry for stability and chemical realism. Validated across diverse tasks, PepBridge produces diverse, physically plausible proteins.

PersRM-R1: Enhance Personalized Reward Modeling with Reinforcement Learning

preprint

Mengdi Li* , Guanqiao Chen* , Xufeng Zhao , Haochen Wen , Shu Yang , Di Wang

We propose PersRM-R1, a reasoning-based reward model that learns personal preferences from just a few examples. Using synthetic data and a two-stage training pipeline, it achieves high accuracy and generalization, outperforming models of similar size and rivaling much larger ones—paving the way for more personalized LLMs.

Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback

preprint

Mengdi Li* , Jiaye Lin* , Xufeng Zhao , Wenhao Lu , Peilin Zhao , Stefan Wermter , Di Wang

We propose Curriculum-RLAIF, a data-centric framework that improves reward model generalizability by training on preference pairs of increasing difficulty. This curriculum-based approach addresses data noise, distribution shift, and model-capacity mismatch. Experiments show that Curriculum-RLAIF significantly boosts policy alignment performance without extra inference cost, outperforming non-curriculum and alternative strategies.

Mental Modeling of Reinforcement Learning Agents by Language Models

TMLR 2025 18th EWRL

Wenhao Lu , Xufeng Zhao , Josua Spisak , Jae Hee Lee , Stefan Wermter

This study explores whether LLMs can mentally model decision-making agents by reasoning over their behavior and state transitions from interaction histories. Evaluated on reinforcement learning tasks, results show that while LLMs offer some insight, they fall short of fully modeling agents without further innovation, highlighting both their potential and current limitations for explainable RL.

REAL: Response Embedding-Based Alignment for LLMs

IJCAI 2025 Workshop (oral)

Honggen Zhang , Xufeng Zhao , Igor Molybog , June Zhang

We propose REAL (Response Embedding-based Alignment for LLMs), a method to improve alignment efficiency by selecting less ambiguous, dissimilar response pairs for annotation. By leveraging embedding similarity in an off-policy manner, REAL reduces label noise and improves alignment quality. Experiments show it boosts performance while cutting annotation effort by up to 65%.

LLM+MAP: Bimanual Robot Task Planning Using Large Language Models and Planning Domain Definition Language

preprint 2025

Kun Chu , Xufeng Zhao , Cornelius Weber , Stefan Wermter

LLM+MAP is a bimanual planning framework that combines GPT-4o with multi-agent task planning to enable efficient and logically consistent long-horizon manipulation. It outperforms baseline LLMs on planning time, success rate, and coordination metrics.

Agentic Skill Discovery

CoRL 2024 Workshop ICRA@40

Xufeng Zhao , Cornelius Weber , Stefan Wermter

We propose an LLM-driven framework that enables robots to autonomously discover useful skills from scratch. By generating tasks, rewards, and success criteria, the LLM guides reinforcement learning, while a vision-language model verifies outcomes. This allows the robot to build a meaningful skill library without relying on predefined primitives.

Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic

COLING 2024 (oral)

Xufeng Zhao , Mengdi Li , Wenhao Lu , Cornelius Weber , Jae Hee Lee , Kun Chu , Stefan Wermter

We propose LoT (Logical Thoughts), a framework that improves large language models’ reasoning at inference time by applying symbolic logic to verify and correct their step-by-step thought process. LoT enhances performance on diverse reasoning tasks and reduces hallucinations.

Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

ICANN 2024

Xiaowen Sun , Xufeng Zhao , Jae Hee Lee , Wenhao Lu , Matthias Kerzel , Stefan Wermter

We introduce OSSA (Object State-Sensitive Agent), a task-planning agent using pre-trained LLMs and VLMs to generate plans sensitive to object states. We compare two methods: a modular approach combining vision and language models, and a monolithic VLM approach. Evaluated on tabletop tasks involving clearing a table, OSSA’s monolithic model outperforms the modular one. A new multimodal benchmark dataset with object state annotations is provided.

Large Language Models for Orchestrating Bimanual Robots

Humanoids 2024

Kun Chu , Xufeng Zhao , Cornelius Weber , Mengdi Li , Wenhao Lu , Stefan Wermter

LABOR uses LLMs to orchestrate control policies for long-horizon bimanual manipulation tasks. By leveraging task reasoning and coordination via language, it achieves higher success rates on simulated tasks with the NICOL robot and provides insights into LLM-based control challenges.

Causal State Distillation for Explainable Reinforcement Learning

CLeaR 2024 (oral)

Wenhao Lu , Xufeng Zhao , Thilo Fryen , Jae Hee Lee , Mengdi Li , Sven Magg , Stefan Wermter

We propose reward decomposition methods for better decision-making explainality.

Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

IROS 2023 (oral)

Xufeng Zhao , Mengdi Li , Cornelius Weber , Muhammad Burhan Hafez , Stefan Wermter

We present Matcha agent, an interactive perception framework that uses LLMs to guide robots in gathering multimodal sensory data (vision, sound, haptics, proprioception) before executing tasks. Matcha enables high-level reasoning and planning in partially observable environments, showing that LLMs can effectively control robot behavior when grounded with multimodal context.

Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

CoRL 2023 Workshop (oral)

Kun Chu , Xufeng Zhao , Cornelius Weber , Mengdi Li , Stefan Wermter

Lafite-RL is a framework that leverages Large Language Models to provide natural language feedback for guiding reinforcement learning in robotic tasks. Tested on RLBench, it improves learning efficiency and success rates without requiring costly human supervision.

Internally Rewarded Reinforcement Learning

ICML 2023

Xufeng Zhao* , Mengdi Li* , Jae Hee Lee , Cornelius Weber , Stefan Wermter

We introduce Internally Rewarded Reinforcement Learning (IRRL), where rewards are generated by a jointly learned internal model rather than the environment. This coupling of policy and reward learning can destabilize training. We formalize IRRL, analyze its challenges, and propose a clipped linear reward function that reduces reward noise. Experiments show improved stability, faster convergence, and better performance across tasks.

A Closer Look at Reward Decomposition for High-Level Robotic Explanations

ICDL 2023 (oral)

Wenhao Lu , Xufeng Zhao , Sven Magg , Martin Gromniak , Mengdi Li , Stefan Wermter

Explainable Q-Map improves the transparency of RL agents by combining reward decomposition with abstract action spaces, enabling clear, high-level explanations based on task-relevant object properties. We demonstrate visual and textual explanations in robotic scenarios and show how they can be used with LLMs for reasoning and interactive querying.

Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations

IROS 2022 (oral)

Xufeng Zhao , Cornelius Weber , Muhammad Burhan Hafez , Stefan Wermter

We propose the Intrinsic Sound Curiosity Module (ISCM) to use sound as an informative modality for unsupervised reinforcement learning. In realistic manipulation scenarios with simulated audio, ISCM guides exploration and representation learning. Experiments show that sound-driven pre-training leads to better representations and faster adaptation than vision-only baselines.

Density Weighted Diversity based Query Strategy for Active Learning

CSCWD 2021

Tingting Wang , Xufeng Zhao , Qiujian Lv , Bo Hu , Degang Sun

DWDS is a density-weighted diversity strategy for active learning in deep learning. It selects informative and representative samples using geometric insights and a beam search for efficient query selection. DWDS consistently outperforms existing methods under limited labeling budgets.

Presentations

Activities & Talks

LoT paper presentation @ LREC-COLING 2024

Introduced trending and advancement of agentic intelligence, including the paradigm from reinforcement learning (RL) to large language …

Nov 21, 2030

RL + LLM Agents @ 如身机器人科技

Introduced trending and advancement of agentic intelligence, including the paradigm from reinforcement learning (RL) to large language …

Nov 21, 2030

Aug 22, 2023

Present "Internally Rewarded Reinforcement Learning" in ICML 2023, Hawaii 🤙

Jul 24, 2023

I am glad to present our sound + unsupervised reinforcement learning (URL) paper in <a href="https://iros2022.org/">IROS 2022</a>, Kyoto.

Oct 23, 2022

See all

Experience

Education & Work

Ph.D. in Computer Science

Univerisity of Hamburg

Mar 2021 - Present Hamburg, Germany

Doctoral advisor: Prof. Stefan Wermter
Ph.D. Thesis: “Environment Exploration and Autonomous Adaptation in Embodied Agents” (under review)

AI Engineer

JD AI Core Department

Jul 2018 - Jun 2020 Beijing, China

I developed core machine learning algorithms across several domains, including

Sales prediction and personalized recommendation (time series modeling)
Image quality assessment, aesthetic ranking; drug identification in medical cabinets; and object detection for unmanned cargo machines (computer vision)
Toxic dialogue detection (natural language processing)

M.Sc. in Signal Processing

University of Chinese Academy of Sciences

Sep 2015 - Jun 2018 Beijing, China

Advisor: Prof. Daojing Li
Conducted research on machine learning and advanced signal processing methods, applied to radar object detection and multipath clutter suppression.

Signal Processing Engineer

ExtantFuture

Jan 2015 - Jun 2015 Beijing, China

Developed signal denoising and core processing algorithms for a wearable medical device to monitor fetal health indicators, including fetal heart rate and movement, as well as maternal activity tracking.

B.E. in Electronic Information Engineering

Xidian University

Sep 2010 - Jun 2014 Xi'an, China

Studied digital circuit design, analog circuit design, and microcontroller-based control systems. Participated in electronic design competitions and mathematical modeling contests.

Awards & Grants

Selected recognitions and funding

ICRA@40 Travel Grants

IEEE Robotics and Automation Society 2024

Researcher Access Program

OpenAI 2024

Academic Scholarship

University of Chinese Academy of Sciences 2015 – 2018

Excellent Student Cadre & Triple-A Student

University of Chinese Academy of Sciences 2015

The Second Prize of China Undergraduate Mathematical Contest in Modeling

China Society for Industrial and Applied Mathematics 2013

The Second Prize of Xinghuo Electronic Competition

Xidian University 2012

The First Prize of Xidian’s Mathematical Contest in Modeling