👨‍🎓 About Me

I am Pony, currently serving as the Chief Research Officer (CRO) at Macaron AI. I lead Mind Lab, a research laboratory focused on Experiential Intelligence — the study and engineering of AI systems that improve primarily through real-world experience rather than just optimizing fixed benchmarks.

Prior to this role, I was a postdoctoral researcher in the Department of Automation at Tsinghua University, working closely with the MIG lab led by Prof. Chongjie Zhang (currently with McKelvey School of Engineering, Washington University in St. Louis). I obtained my Ph.D. in June 2023. During my doctoral studies, I was a member of CFINS and was supervised by Prof. Qianchuan Zhao and Prof. Li Xia (currently with the Business School at Sun Yat-Sen University). I received my Bachelor of Engineering degree from the Department of Automation at Xi’an Jiaotong University in 2017.

Mind Lab

At Mind Lab, we focus on building AI agents that learn and grow from real-world interactions. To achieve this, we engage in research-product co-design and develop online lifelong learning systems, including algorithms, infrastructure, and evaluation frameworks. Our work aims to bridge the gap between what models know and how they grow, moving from static “brains” to adaptive “minds” that continuously improve through experience.

We are interested in collaborating with researchers, engineers, and practitioners who share our vision. We welcome research scientists and interns to join our team. If you are interested in our work, please feel free to contact me or reach out to contact@mindlab.ltd.

📰 News

December 2025: We launch Mind Lab 🎉, a research lab focused on Experiential Intelligence — building AI systems that improve from real-world experience rather than just optimizing fixed benchmarks, moving from static “brains” to adaptive “minds”. Learn more: Introducing Mind Lab: Building AI that Learns from Real Experience
December 2025: We present the first end-to-end LoRA-RL on trillion-parameter reasoning models, running at 10% of the GPU cost of full-parameter RL while maintaining performance. Learn more: How We Build Trillion Parameter Reasoning RL with 10% GPUs
December 2025: We explore how agentic memory systems can go beyond simple retrieval, enabling agents to maintain and update their understanding of the world through experience. Learn more: Exploring Agentic Memory beyond Reasoning and Tool-Use

📝 Publications

Conference Paper

Episodic Novelty Through Temporal Distance. Yuhua Jiang*, Qihan Liu*, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo Xu, Chongjie Zhang, Qianchuan Zhao. International Conference on Learning Representations (ICLR), 2025.
Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint. Jiafei Lyu, Mengbei Yan, Zhongjian Qiao, Runze Liu, Xiaoteng Ma, Deheng Ye, Jing-Wen Yang, Zongqing Lu, Xiu Li. International Conference on Learning Representations (ICLR), 2025.
NeuralPlane: An Efficiently Parallelizable Platform for Fixed-wing Aircraft Control with Reinforcement Learning. Chuanyi Xue*, Qihan Liu*, Xiaoteng Ma*, Xinyao Qin, Ning Gui, Yang Qi, Jinsheng Ren, Bin Liang, Jun Yang. Advances in Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS), 2024.
Single-Trajectory Distributionally Robust Reinforcement Learning. Zhipeng Liang*, Xiaoteng Ma*, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou. International Conference on Machine Learning (ICML), 2024.
Efficient Multi-agent Reinforcement Learning by Planning. Qihan Liu*, Jianing Ye*, Xiaoteng Ma*, Jun Yang, Bin Liang, Chongjie Zhang. International Conference on Learning Representations (ICLR), 2024.
SEABO: A Simple Search-Based Method for Offline Imitation Learning. Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu. International Conference on Learning Representations (ICLR), 2024.
Learning Diverse Risk Preferences In Population-based Self-play. Yuhua Jiang*, Qihan Liu*, Xiaoteng Ma, Chenghao Li, Yiqin Yang, Jun Yang, Bin Liang, Qianchuan Zhao. AAAI Conference on Artificial Intelligence (AAAI), 2024. (Oral)
Cross-Domain Policy Adaptation via Value-Guided Data Filtering. Kang Xu, Chenjia Bai, Xiaoteng Ma, Dong Wang, Bin Zhao, Zhen Wang, Xuelong Li, Wei Li. Advances in Neural Information Processing Systems (NeurIPS), 2023.
Uncertainty-driven Trajectory Truncation for Model-based Offline Reinforcement Learning. Junjie Zhang*, Jiafei Lyu*, Xiaoteng Ma, Jiangpeng Yan, Jun Yang, Le Wan, Xiu Li. European Conference on Artificial Intelligence (ECAI), 2023.
What Is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? Rui Yang, Yong Lin, Xiaoteng Ma, Hao Hu, Chongjie Zhang, Tong Zhang. International Conference on Machine Learning (ICML), 2023.
Mildly Conservative Q-Learning for Offline Reinforcement Learning. Jiafei Lyu*, Xiaoteng Ma*, Xiu Li, Zongqing Lu. Advances in Neural Information Processing Systems (NeurIPS), 2022. (Spotlight)
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing. Rui Yang*, Chenjia Bai*, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han. Advances in Neural Information Processing Systems (NeurIPS), 2022. (Spotlight)
Exploiting Reward Shifting in Value-Based Deep RL. Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou. Advances in Neural Information Processing Systems (NeurIPS), 2022.
Offline Reinforcement Learning with Value-based Episodic Memory. Xiaoteng Ma*, Yiqin Yang*, Hao Hu*, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang. International Conference on Learning Representations (ICLR), 2022.
Efficient Continuous Control with Double Actors and Regularized Critics. Jiafei Lyu*, Xiaoteng Ma*, Jiangpeng Yan, Xiu Li. AAAI Conference on Artificial Intelligence (AAAI), 2022. (Oral)
Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning. Yiqin Yang*, Xiaoteng Ma*, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao. Advances in Neural Information Processing Systems (NeurIPS), 2021. (Spotlight)
Average-Reward Reinforcement Learning with Trust Region Methods. Xiaoteng Ma, Xiaohang Tang, Jun Yang, Li Xia, Qianchuan Zhao. International Joint Conference on Artificial Intelligence (IJCAI), 2021.
Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning. Xiaoteng Ma*, Yiqin Yang*, Chenghao Li*, Yiwen Lu, Qianchuan Zhao, Jun Yang. International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2021.
Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration. Ming Zhang, Yawei Wang, Xiaoteng Ma, Li Xia, Jun Yang, Zhiheng Li, Xiu Li. IEEE Data Driven Control and Learning Systems Conference (DDCLS), 2020.
Fairness Control of Traffic Light via Deep Reinforcement Learning. Chenghao Li, Xiaoteng Ma, Li Xia, Qianchuan Zhao, Jun Yang. IEEE International Conference on Automation Science and Engineering (CASE), 2020.
Bi-level Proximal Policy Optimization for Stochastic Coordination of EV Charging Load with Uncertain Wind Power. Teng Long, Xiaoteng Ma, Qing-Shan Jia. IEEE Conference on Control Technology and Applications (CCTA), 2019.
Attendance and security system based on building video surveillance. Kailai Sun, Qianchuan Zhao, Jianhong Zou, Xiaoteng Ma. International Conference on Smart City and Intelligent Building (ICSCIB), 2018.

Journal Paper

DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning. Xiaoteng Ma*, Junyao Chen*, Li Xia, Jun Yang, Qianchuan Zhao, Zhengyuan Zhou. Journal of Artificial Intelligence Research (JAIR), 2025.
CVaR-Constrained Policy Optimization for Safe Reinforcement Learning. Qiyuan Zhang, Shu Leng, Xiaoteng Ma, Qihan Liu, Xueqian Wang, Bin Liang, Yu Liu, Jun Yang. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024.
A unified algorithm framework for mean-variance optimization in discounted Markov decision processes. Shuai Ma, Xiaoteng Ma, Li Xia. European Journal of Operational Research (EJOR), 2023.
Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning. Xiaoteng Ma, Shuai Ma, Li Xia, Qianchuan Zhao. Journal of Artificial Intelligence Research (JAIR), 2022.
MPSN: Motion-aware Pseudo-Siamese Network for Indoor Video Head Detection in Buildings. Kailai Sun*, Xiaoteng Ma*, Peng Liu*, Qianchuan Zhao. Building and Environment (BAE), 2022.
An optimistic value iteration for mean–variance optimization in discounted Markov decision processes. Shuai Ma, Xiaoteng Ma, Li Xia. Results in Control and Optimization (RICO), 2021.
Learning to Discover Task-Relevant Features for Interpretable Reinforcement Learning. Qiyuan Zhang, Xiaoteng Ma, Yiqin Yang, Chenghao Li, Jun Yang, Yu Liu, Bin Liang. IEEE Robotics and Automation Letters (RA-L), 2021.
Reinforcement learning for fluctuation reduction of wind power with energy storage. Zhen Yang, Xiaoteng Ma, Li Xia, Qianchuan Zhao, Xiaohong Guan. Results in Control and Optimization (RICO), 2021.

Preprint

Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation. Xiaoteng Ma*, Zhipeng Liang*, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou.

👨🏽‍🤝‍👨🏼 Collaborators

Qianchuan Zhao - Professor, Department of Automation, Tsinghua University.
Li Xia - Professor, Business School, Sun Yat-Sen University.
Zhengyuan Zhou - Assistant Professor, Stern School of Business, New York University.
Gao Huang - Assistant Professor, Department of Automation, Tsinghua University.
Chongjie Zhang - Assistant Professor, Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University.
Zhipeng Liang - Ph.D. Student, Department of Industrial Engineering and Decision Analytics (IEDA), Hong Kong University of Science and Technology.
Rui Yang - Ph.D. Student, Department of Computer Science and Engineering (CSE), Hong Kong University of Science and Technology.
Jiafei Lyu - Ph.D. Student, Tsinghua Shenzhen International Graduate School, Tsinghua University.

🥇 Honors and Awards

2022.10 Tsinghua Comprehensive Scholarship
2021.10 Tsinghua Comprehensive Scholarship
2015.10 Xi’an Jiaotong University Outstanding Student (Undergraduate) (Top 10)
2015.10 National Scholarship (Undergraduate) (Top 1%)
2014.10 National Scholarship (Undergraduate) (Top 1%)

📖 Education

2017.09 - 2023.06, Ph.D., Department of Automation, Tsinghua University.
2013.09 - 2017.06, Bachelor, Department of Automation, Xi’an Jiaotong University.

💻 Internships

2018.09 - 2018.11, SenseTime, Beijing.
2017.07 - 2017.08, Institute of Automation, CAS, Beijing.

📞 Contact

Xiaoteng Ma

Department of Automation
Tsinghua University
FIT Building 1-109
Beijing, China, 100084
E-mail: pony[DOT]xtma[AT]gmail[DOT]com