Jasmine
这家伙很懒,什么都没留下
发布于

How can model-based RL do optimization?

评论(1)
  • YanSong
    YanSong 回复

    We can view a predictive model analogously as a learned method of generating synthetic data, and training a model with the given data from the environment can be seen as moving the data generating distribution towards the real distribution. And we hope the model can generate to unseen scenario which will provide more valuable data for policy optimisation.