I am a postdoc at VU Amsterdam working with Vincent François-Lavet. I received my PhD from Leiden University, where I worked with Thomas Moerland, Mike Preuss, and Aske Plaat. I obtained my master's degree from Leiden University in 2020 and my bachelor's degree from BLCU, China, in 2018.
I am interested in artificial intelligence and how it can support decision-making. To that end, I research reinforcement learning and its applications to real-world problems, including healthcare and LLMs.
I co-host the BeNeRL seminar and serve as a reviewer.
Contact: z.yang(at)liacs.leidenuniv.nl
Google Scholar | LinkedIn | Twitter | Github | CV
Keywords: Model-free RL, Atari
Keywords: Foundation Models, Unsupervised Skill Discovery, RL
Keywords: World Models, Reset-free, RL
Combines episodic control (EC) and RL. The agent learns to automatically switch between EC and RL.
Uses episodic memory directly for action selection in continuous action spaces, outperforming state-of-the-art RL agents.
Systematically illustrates why and how Go-Explore works in tabular and deep RL settings. The exploration ('exp') phase helps the agent step into unseen areas.
Pre-trains and fine-tunes neural networks on Sokoban tasks. Agents pre-trained on 1-box tasks learn faster on 2- and 3-box tasks, but not vice versa.
Template based on Hyunseung's website. Latest update: 05/2025.