Ling Pan
The Hong Kong University of Science and Technology
Ling Pan will join the Department of Electronic and Computer Engineering at The Hong Kong University of Science and Technology as an Assistant Professor in Spring 2024. She is currently a postdoctoral researcher at Mila, advised by Prof. Yoshua Bengio. She received her Ph.D. in 2022 from the Institute for Interdisciplinary Information Sciences (headed by Prof. Andrew Yao) at Tsinghua University, advised by Prof. Longbo Huang. Her research focuses on the algorithmic foundations and practical applications of Generative Flow Networks (GFlowNets; Bengio et al., 2021), reinforcement learning, and multi-agent systems, with the goal of developing robust, efficient, and practical deep reinforcement learning algorithms. During her Ph.D., she was a visiting researcher at Stanford University (with Prof. Tengyu Ma), the University of Oxford (with Prof. Shimon Whiteson), and the Machine Learning Group at Microsoft Research Asia (with Dr. Wei Chen). She received the Microsoft Research Asia Fellowship in 2020.
Talk Title: Towards Robust, Efficient and Practical Decision Making: From Reward-Maximizing Deep Reinforcement Learning to Reward-Matching GFlowNets
Recent years have witnessed the great success of reinforcement learning (RL) with deep feature representations in many challenging tasks, including computer games, robotics, smart cities, and so on. Yet solely focusing on the optimal solution under a reward proxy, and learning a reward-maximizing policy, is not enough. Diversity of the generated states is desirable in a wide range of important practical scenarios such as drug discovery, recommender systems, and dialogue systems. For example, in molecule generation, the reward function used in in-silico simulations can itself be uncertain and imperfect (compared to the more expensive in-vivo experiments). It is therefore not sufficient to search only for the solution that maximizes the return; instead, we would like to sample many high-reward candidates, which can be achieved by sampling terminal states proportionally to their rewards.

The Generative Flow Network (GFlowNet) is a probabilistic framework, proposed by Yoshua Bengio in 2021, in which an agent learns a stochastic policy for object generation such that the probability of generating an object is proportional to a given reward function, i.e., it learns a reward-matching policy. Its effectiveness has been demonstrated in discovering high-quality and diverse solutions in molecule generation, biological sequence design, and related domains. This talk covers my recent research on three important challenges in such decision-making systems. First, how can we ensure robust learning behavior and value estimation of the agent? Second, how can we improve its learning efficiency? Third, how can we successfully apply these methods to important practical problems such as computational sustainability and drug discovery?
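The contrast between the two objectives can be sketched in a few lines. The toy states and reward values below are made up for illustration (a real GFlowNet learns a generative policy over trajectories rather than enumerating terminal states), but the target distribution is the same one described above: each terminal state x is sampled with probability R(x)/Z.

```python
import random

# Hypothetical terminal states (e.g., candidate molecules) with rewards.
rewards = {"mol_A": 8.0, "mol_B": 4.0, "mol_C": 2.0, "mol_D": 1.0}

def reward_maximizing(rewards):
    """An argmax policy: always returns the single best state."""
    return max(rewards, key=rewards.get)

def reward_matching_sample(rewards, rng):
    """Samples a terminal state x with probability R(x) / Z,
    the target distribution a trained GFlowNet matches."""
    z = sum(rewards.values())        # partition function Z
    threshold = rng.random() * z
    acc = 0.0
    for x, r in rewards.items():
        acc += r
        if threshold <= acc:
            return x
    return x                         # guard against float rounding

rng = random.Random(0)
counts = {x: 0 for x in rewards}
for _ in range(20_000):
    counts[reward_matching_sample(rewards, rng)] += 1

print(reward_maximizing(rewards))    # the argmax policy returns only mol_A
# Empirical frequencies approach R(x)/Z (e.g., mol_A near 8/15), so all
# high-reward candidates are sampled, not just the single best one.
print({x: round(c / 20_000, 2) for x, c in counts.items()})
```

The reward-maximizing policy collapses onto a single candidate, while the reward-matching sampler keeps producing diverse high-reward states — the property that matters when the reward proxy itself is imperfect.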