RLChina 2020 Reinforcement Learning Summer Camp

Course Schedule

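Course overview: The shift from intelligent perception to intelligent decision-making is a key step in the development of artificial intelligence. This online course gives a comprehensive introduction to the frontier field of reinforcement learning and intelligence science, ranging from rigorous mathematical derivations to the latest research results and theory.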

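Audience: Undergraduates, master's and doctoral students, and industry practitioners interested in intelligent decision-making. Participants should have some knowledge of machine learning and a grounding in statistics and probability theory.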

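Format: Starting July 27, Monday through Saturday for two consecutive weeks, 7:00 pm to 8:40 pm each evening. The first 40-minute session is followed by a 5-minute break, and the second 40-minute session is followed by 15 minutes of Q&A.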

Bilibili livestream: https://live.bilibili.com/22386217

AI Yanxishe (AI 研习社) livestream: http://www.test.yanxishe.com/events/rlchina2020

Boyu (伯禹) platform recordings and discussion: https://www.boyuai.com/elites/course/78eQw4BeCzLos12d

Date & Time	Course	Teacher
2020-07-27 19:00-19:10	Opening and Introduction (slides, replay)	汪军
2020-07-27 19:10-20:50	Introduction to Reinforcement Learning and Value-based Methods (slides, replay)	卢宗青
2020-07-28 19:00-20:40	Foundations of Reinforcement Learning (slides, replay)	汪军
2020-07-29 19:00-20:40	Non-Convex Optimisation: Survey & ADAM's Proof (slides, replay)	Haitham
2020-07-30 19:00-20:40	Model-based Reinforcement Learning (slides, replay)	张伟楠
2020-07-31 19:00-20:40	Control as Inference (slides, replay)	朱占星
2020-08-01 19:00-20:40	Imitation Learning (slides, replay)	俞扬
2020-08-03 19:00-20:40	Learning with Sparse Rewards (slides, replay)	郝建业
2020-08-04 19:00-20:40	Game Theory Basics (slides, replay)	张海峰
2020-08-05 19:00-20:40	Multi-agent Systems (slides, replay)	安波
2020-08-06 19:00-20:40	Deep Multi-agent Reinforcement Learning (slides, replay)	张崇洁
2020-08-07 19:00-20:40	Advances of Multi-agent Learning (in Gaming AI) (slides, replay)	杨耀东
2020-08-08 19:00-20:40	Mean-field Games and Controls (slides, replay)	徐任远
2020-08-08 20:40-21:10	Panel Discussion (replay)	All instructors

Instructors

(In alphabetical order by name)

安波

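President's Council Chair Associate Professor at Nanyang Technological University, Singapore. He received his PhD in computer science from the University of Massachusetts Amherst in 2011. His main research areas include artificial intelligence, multi-agent systems, algorithmic game theory, reinforcement learning, and optimization. He has published more than 100 papers at top international AI conferences (AAMAS, IJCAI, AAAI, ICAPS, KDD, UAI, EC, WWW, ICLR, NeurIPS, ICML) and in the leading journals JAAMAS and AIJ. His honors include the 2010 IFAAMAS Distinguished Dissertation Award, the 2011 Operational Excellence Award from the US Coast Guard, the 2012 AAMAS Best Application Paper Award, the 2016 IAAI Innovative Application Paper Award, the 2012 INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice, and the 2018 Nanyang Young Research Award. He was invited to give an Early Career Spotlight talk at IJCAI'17, won the 2017 Microsoft Collaborative AI Challenge, and was named one of IEEE Intelligent Systems' "AI's 10 to Watch" for 2018. He is a member of the JAIR editorial board and an associate editor of JAAMAS, IEEE Intelligent Systems, and ACM TIST. He is a program committee chair of AAMAS'20.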

Haitham Bou Ammar

Haitham Bou Ammar leads reinforcement learning at Huawei R&D UK and also serves as an honorary lecturer at University College London. Prior to joining Huawei, Haitham led reinforcement learning at PROWLER.io. Previously, he held academic positions as a professor at the American University of Beirut and as a post-doctoral researcher at Princeton University and the University of Pennsylvania. Haitham's research spans various areas of machine learning, including reinforcement learning, multi-task learning, optimisation, and variational inference.

郝建业

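Director of the Decision-Making and Reasoning Lab at Huawei Noah's Ark Lab, and associate professor and doctoral supervisor at the College of Intelligence and Computing, Tianjin University. His main research directions are deep reinforcement learning and multi-agent systems. He has published more than 100 papers in well-known international AI conferences and journals, along with 2 monographs. He has led or participated in more than 10 research projects funded by the National Natural Science Foundation of China, the Ministry of Science and Technology, and Tianjin's major artificial intelligence program, among others. His research has won best paper awards at ASE 2019 and DAI 2019, and has been deployed in game AI, advertising and recommendation, autonomous driving, and optimization and control.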

卢宗青

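"Boya" Assistant Professor in the Department of Computer Science, Peking University. Before joining Peking University in September 2017, he was a postdoctoral researcher in the Department of Computer Science at Pennsylvania State University. He received his PhD in computer science from Nanyang Technological University, Singapore, in April 2014, and his master's and bachelor's degrees from Southeast University. His main research areas include (multi-agent) reinforcement learning and mobile/edge intelligent systems.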

汪军

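Professor in the Department of Computer Science at University College London (UCL), Turing Fellow at the Alan Turing Institute, and chief consultant on decision-making and reasoning at Huawei Noah's Ark Lab. He works on intelligent information systems, including machine learning, reinforcement learning, multi-agent systems, data mining, computational advertising, and recommender systems. He has published more than 120 papers and two monographs, and has received multiple best paper awards.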

徐任远

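Currently a Hooke Research Fellow in mathematics at the University of Oxford; she will join the Department of Industrial and Systems Engineering at the University of Southern California (USC) as an assistant professor in 2021. Her main research lies at the intersection of applied probability, stochastic analysis, game theory, and machine learning. She received her bachelor's degree from the School of Mathematical Sciences at the University of Science and Technology of China (2014) and her PhD from the Department of Industrial Engineering at the University of California, Berkeley (2019).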

杨耀东

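Machine learning researcher focusing on reinforcement learning, multi-agent learning, and Bayesian statistics. He is currently a multi-agent learning technical expert at Huawei Noah's Ark Lab, responsible for research on multi-agent reinforcement learning and its application to autonomous driving decision-making. Before joining Huawei, he was a senior manager in the science team at American International Group (AIG), leading the development of machine learning applications for financial problems. He received his bachelor's degree from the University of Science and Technology of China and his master's degree from Imperial College London, and pursued his PhD at UCL. He has published more than 20 papers. In 2018, the UK Home Office admitted him to its exceptional talent programme for artificial intelligence.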

俞扬

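PhD, professor at Nanjing University, and a Young Top-notch Talent of the National Ten Thousand Talents Program. His main research areas are machine learning and reinforcement learning. He received the 2013 National Excellent Doctoral Dissertation Award and the 2011 CCF Outstanding Doctoral Dissertation Award. He has published more than 40 papers, including several in Artificial Intelligence, IJCAI, AAAI, NIPS, and KDD, and has won 4 international paper awards and 2 international algorithm competitions. He was named one of IEEE Intelligent Systems' "AI's 10 to Watch" for 2018, received the 2018 Pacific-Asia data mining "Early Career Award", and was invited to give an Early Career Spotlight talk on reinforcement learning at IJCAI'18.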

张崇洁

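Assistant professor and doctoral supervisor at the Institute for Interdisciplinary Information Sciences, Tsinghua University, and head of the Machine Intelligence Group. He received his PhD in computer science from the University of Massachusetts Amherst in 2011, and then did postdoctoral research at the Massachusetts Institute of Technology. His current research focuses on artificial intelligence, deep reinforcement learning, multi-agent systems, and robotics.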

张海峰

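Associate researcher and master's supervisor at the Institute of Automation, Chinese Academy of Sciences, where he leads the collective decision intelligence team. His research areas include multi-agent reinforcement learning, game AI, and computational advertising. He was previously a visiting scholar at the Center on Frontiers of Computing Studies, Peking University, and a postdoctoral researcher at University College London (UCL). He received his PhD in computer science in 2018 and a double bachelor's degree in computer science and economics in 2012, both from Peking University.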

张伟楠

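Tenure-track associate professor at the John Hopcroft Center, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University. Within reinforcement learning, he works on multi-agent reinforcement learning, model-based reinforcement learning, and imitation learning, and is committed to deploying reinforcement learning in application scenarios such as personalized internet services, game intelligence, smart transportation, and text generation. He received his bachelor's degree from the ACM Class in the Department of Computer Science at Shanghai Jiao Tong University in 2011 and his PhD from the Department of Computer Science at University College London in 2016.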

朱占星

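Assistant professor at the School of Mathematical Sciences and the Center for Data Science, Peking University, working closely with the Peking University deep learning lab. Before that, he received his PhD in machine learning from the School of Informatics, University of Edinburgh. His research covers the methodology and theory of machine learning and artificial intelligence and their applications in various domains.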

Course Content

1. 汪军: Opening and Introduction

2. 卢宗青: Introduction to Reinforcement Learning and Value-based Methods

Introduction to Reinforcement Learning
Value-based Methods

3. 汪军: Foundations of Reinforcement Learning

Recap (yesterday’s lecture)
Policy approaches
Computational learning theory
Theoretical analysis

4. Haitham: Non-Convex Optimisation: Survey & ADAM's Proof

Motivation, Functions and Solution Types
Brief Survey and ADAM Optimiser
ADAM’s Proof from NeurIPS 2018
Assumptions
Loss Function Difference Bound and Stationary Point Convergence

5. 张伟楠: Model-based Reinforcement Learning

Introduction to MBRL from Dyna
Shooting methods: RS, PETS, POPLINr
Theoretical bounds and methods: SLBO, MBPO & BMPO
Backpropagation through paths: SVG and MAAC

6. 朱占星: Control as Inference

Basics of (probabilistic) graphical models (GM)
Connection between RL and inference in GM
Maximum entropy RL and variational inference
Soft Q-Learning
Soft Actor-Critic

7. 俞扬: Imitation Learning

Previously
Supervised Learning & Behavior Cloning
Generative Adversarial Learning & GAIL
Advanced Topics
From Imitating Policies to Imitating Environments

8. 郝建业: Learning with Sparse Rewards

From Sparse to Dense
Task hierarchical decomposition (hierarchical RL)

9. 张海峰: Game Theory Basics

Motivation and Normal-form Game
Extensive-form Game and Imperfect Information
Bayesian Game and Incomplete Information
Nash Equilibrium and Variants
Theoretical Results of Nash Equilibrium
Repeated Game and Learning Methods
Alternate Solution Concepts and Evolutionary Game Theory

10. 安波: Multi-agent Systems

History and Current Status
Key research areas in MAS
Recent advances

11. 张崇洁: Deep Multi-agent Reinforcement Learning

Value-Based Methods
Policy Gradient Methods

12. 杨耀东: Advances of Multi-agent Learning (in Gaming AI)

Multi-agent Learning for Games
Policy Evaluation in Meta-games
Policy Improvement in Meta-games

13. 徐任远: Mean-field Games and Controls

General Mean-Field Games (GMFG)
GMFG with RL
Learning Mean-Field Controls
Q-learning Algorithm for MFC

14. All instructors: Panel Discussion

Contact Us

Email: rlchinacamp@163.com
