Zhao Yang
I am a PhD student in the Reinforcement Learning Group at Leiden University, supervised by Thomas Moerland, Mike Preuss, and Aske Plaat. I received my master's degree from Leiden University in 2020 and my bachelor's degree from BLCU, China, in 2018.
I'm interested in reinforcement learning, in particular making agents more autonomous through intrinsic motivation, world models, and related ideas, mostly in games and robotic tasks.
I co-host the BeNeRL seminar, serve as a teaching assistant for the courses DRL, SADRL, and VG4R, and review for the community.
Contact: z.yang(at)liacs.leidenuniv.nl
Google Scholar | LinkedIn | Twitter | Github | CV
I'm actively looking for internships / jobs / postdocs!
An Autonomous RL Agent
In Submission
Keywords: World Models, Autonomy, Unsupervised RL
Website
Two-Memory Reinforcement Learning
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
COG, 2023; EWRL, 2023
Paper | Code
Combines episodic control (EC) and RL; the agent learns to automatically switch between EC and RL.
Continuous Episodic Control
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
COG, 2023; EWRL, 2023
Paper | Code
Uses episodic memory directly for continuous action selection; it outperforms state-of-the-art RL agents.
First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
ICAART, 2023; ALOE workshop @ICLR, 2022
Paper
Systematically illustrates why and how Go-Explore works in tabular and deep RL settings: post-exploration ('exp') can help the agent step into unseen areas.
Transfer Learning and Curriculum Learning in Sokoban
Zhao Yang, Mike Preuss, Aske Plaat
BNAIC, 2021
Paper | Code
Pre-trains and fine-tunes neural networks on Sokoban tasks. Agents pre-trained on 1-box tasks learn faster on 2- and 3-box tasks, but not vice versa.