Zhao Yang

I am a postdoc at VU Amsterdam working with Vincent François-Lavet. I received my PhD from Leiden University, where I worked with Thomas Moerland, Mike Preuss, and Aske Plaat. I received my master's degree from Leiden University in 2020 and my bachelor's degree from BLCU, China, in 2018.

I'm interested in reinforcement learning, in particular making agents more autonomous using intrinsic motivation, foundation models, world models, and related ideas, mostly in games and robotic tasks.

I co-host the BeNeRL seminar series and serve as a reviewer.

Contact: z.yang(at)liacs.leidenuniv.nl
Google Scholar  |  LinkedIn  |  Twitter  |  GitHub  |  CV

profile photo

Publications


A Skill Discovery Method Guided by Foundation Models
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu
Preprint, in submission.
Paper | Website (for review)

Keywords: Foundation Models, Unsupervised Skill Discovery


World Models Increase Autonomy in Reinforcement Learning
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu
Preprint, in submission.
Paper | Website | Website (for review)

Keywords: World Models, Autonomy, Unsupervised RL


Two-Memory Reinforcement Learning
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
COG, 2023; EWRL, 2023
Paper | Code

Combine episodic control (EC) and RL; the agent learns to switch between EC and RL automatically.


Continuous Episodic Control
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
COG, 2023; EWRL, 2023
Paper | Code

Use episodic memory directly for action selection in continuous action spaces; it outperforms state-of-the-art RL agents.


First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
ICAART, 2023; ALOE workshop @ICLR, 2022
Paper

Systematically illustrate why and how Go-Explore works in tabular and deep RL settings: the post-exploration ('explore') phase helps the agent step into unseen areas.


Transfer Learning and Curriculum Learning in Sokoban
Zhao Yang, Mike Preuss, Aske Plaat
BNAIC, 2021
Paper | Code

Pre-train and fine-tune neural networks on Sokoban tasks. Agents pre-trained on 1-box tasks learn faster on 2- and 3-box tasks, but not vice versa.

Template based on Hyunseung's website. Latest update: 08/2024.