How can model-based RL do optimization?

YanSong 2021-10-12 21:15:42

We can view a predictive model as a learned generator of synthetic data. Training the model on data collected from the environment can then be seen as moving its data-generating distribution towards the real one. The hope is that the model generalizes to unseen scenarios, where it can provide additional valuable data for policy optimisation.
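To make this concrete, here is a minimal sketch of the idea, not a reference implementation. Everything in it is an assumption chosen for illustration: the toy 1-D environment `real_step`, the linear model fitted by least squares, the linear policy `a = -k * s`, and the grid search standing in for a real policy optimiser. The model is fitted on real transitions from a narrow state range, and the policy is then tuned purely on synthetic rollouts that start from a wider range of states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" environment (an assumption for this sketch):
# s' = 0.9 * s + 0.5 * a + small noise, with reward -s'^2.
def real_step(s, a):
    return 0.9 * s + 0.5 * a + 0.01 * rng.normal()

# 1) Collect a small batch of real transitions using random actions.
states = rng.uniform(-1.0, 1.0, size=200)
actions = rng.uniform(-1.0, 1.0, size=200)
next_states = np.array([real_step(s, a) for s, a in zip(states, actions)])

# 2) Fit the model on the real data. Least squares pulls the model's
#    predictions towards what the environment actually produced.
X = np.stack([states, actions], axis=1)
theta, *_ = np.linalg.lstsq(X, next_states, rcond=None)

def model_step(s, a):
    # Learned one-step predictor, used as a synthetic data generator.
    return theta[0] * s + theta[1] * a

# 3) Start synthetic rollouts from states wider than the training range,
#    i.e. we rely on the model generalizing to unseen states.
start_states = rng.uniform(-3.0, 3.0, size=256)

def synthetic_return(k, horizon=20):
    s = start_states.copy()
    total = np.zeros_like(s)
    for _ in range(horizon):
        s = model_step(s, -k * s)  # linear policy a = -k * s
        total += -s ** 2           # reward computed on imagined states
    return total.mean()

# 4) "Policy optimisation" on synthetic data only: pick the best gain.
gains = np.linspace(0.0, 3.0, 31)
best_k = max(gains, key=synthetic_return)

print("fitted model parameters:", theta)
print("best policy gain found in the model:", best_k)
```

In this toy setup the model class matches the true dynamics, so extrapolating to unseen states works well. With a misspecified model the synthetic rollouts can drift away from reality, which is why the generalization is only a hope and not a guarantee.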