Zhao Yang

I am a postdoc at VU Amsterdam working with Vincent François-Lavet. I received my PhD from Leiden University, where I worked with Thomas Moerland, Mike Preuss, and Aske Plaat. I received my master's degree from Leiden University in 2020 and my bachelor's degree from BLCU, China, in 2018.

I'm interested in reinforcement learning, in particular making agents more autonomous using intrinsic motivation, foundation models, world models, and related ideas, mostly in games and robotic tasks.

I co-host the BeNeRL seminar series and serve as a reviewer.

Contact: z.yang(at)liacs.leidenuniv.nl
Google Scholar  |  LinkedIn  |  Twitter  |  GitHub  |  CV

profile photo

Publications


A Skill Discovery Method Guided by Foundation Models
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu
Preprint, in submission.
Paper | Website (for review)

Keywords: Foundation Models, Unsupervised Skill Discovery


World Models Increase Autonomy in Reinforcement Learning
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu
Preprint, in submission.
Paper | Website | Website (for review)

Keywords: World Models, Autonomy, Unsupervised RL


Two-Memory Reinforcement Learning
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
COG, 2023; EWRL, 2023
Paper | Code

Combine episodic control (EC) and RL; the agent learns to switch between EC and RL automatically.


Continuous Episodic Control
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
COG, 2023; EWRL, 2023
Paper | Code

Use episodic memory directly for action selection in continuous action spaces; it outperforms state-of-the-art RL agents.


First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
ICAART, 2023; ALOE workshop @ICLR, 2022
Paper

Systematically illustrate why and how Go-Explore works in tabular and deep RL settings: the post-exploration ('explore') phase helps the agent step into unseen areas.


Transfer Learning and Curriculum Learning in Sokoban
Zhao Yang, Mike Preuss, Aske Plaat
BNAIC, 2021
Paper | Code

Pre-train and fine-tune neural networks on Sokoban tasks. Agents pre-trained on 1-box tasks learn faster on 2- and 3-box tasks, but not vice versa.

Template based on Hyunseung's website. Latest update: 08/2024.