RLChina 2024 TUTORIAL

Tutorial Speakers
Course Introduction

Monday Morning: Machine Learning Foundations

Course Outline:

  • 1、Machine Learning Problems and Formulations
  • 2、Linear Regression and Gradient Descent
  • 3、Deep Learning and Backpropagation
  • 4、Multi-class Classification with Softmax
  • 5、Clustering with Neural Networks
  • 6、Variational autoencoders
  • 7、Strategic prediction and multiagent learning
  • 8、Online learning and no-regret algorithms
  • 9、Bayesian optimisation and reinforcement learning

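Instructor Introduction:

Jun Wang (汪军)
Professor in the Department of Computer Science at University College London (UCL) and Turing Fellow at the Alan Turing Institute. His research focuses on intelligent information systems, including machine learning, reinforcement learning, multi-agent systems, data mining, computational advertising, and recommender systems. He has published more than 120 academic papers and two academic monographs, and has received multiple best paper awards.

Monday Afternoon: Foundation Models

Course Outline: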

    Introduction to Foundation Models and Recent Advances in LLMs (Linyi Yang); LLM Pre-training and Post-training (Cheng Deng).
  • Chapter 1、Introduction to NLP
    • 1.1、What is Natural Language Processing (NLP)?
    • 1.2、NLP Tasks
    • 1.3、Transformers
  • Chapter 2、Recent Advances in AI: Towards AGI?
    • 2.1、The Progress to AGI
    • 2.2、Training Scaling Laws
    • 2.3、Recent advances of ChatGPT O1
    • 2.4、Scaling LLM Test-Time Compute
  • Chapter 3、LLM Pre-training and Post-training
    • 3.1、Preparing the Data for Pre-training and Further Pre-training
    • 3.2、Training in General
    • 3.3、Pre-Training
    • 3.4、Post-Training

Instructor Introduction:

Linyi Yang (杨林易)
Dr Linyi Yang is now working as a Research Assistant Professor at Westlake University. He has established an academic reputation in the fusion of Natural Language Processing (NLP) and Large Language Models, alongside explainable artificial intelligence, as evidenced by 40 publications, 13 co-leading publications in high-ranked venues with more than 2,780 citations, 9 CCF A Papers, 1 national award, 1 best paper nomination award. He served as an Area Chair at EMNLP-22 and CIKM-23, Senior Program Committee Member at IJCAI-23. Additionally, he has mentored 4 PhDs and over 10 graduate students as a subject adviser or co-supervisor.

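Cheng Deng (邓程)
Cheng Deng is a researcher at the 2012 Laboratories of Huawei's London Research Centre. He received his PhD from the Wu Wenjun Artificial Intelligence Class at Shanghai Jiao Tong University, advised by Professor Xinbing Wang. He has published papers at conferences including ICML, WSDM, CIKM, and ECAI, and in journals including EJIM, TMC, and TNSE. He led the team that trained K2 and GeoGalactica, large language models for the geosciences, and has presented this work at TEDx. The GAKG project he developed has been included in the UNESCO Open Science Forum. He currently focuses on on-device large language models, aiming to support innovation and applications in vertical domains.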

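Runji Lin (林润基)
Runji Lin graduated from the Institute of Automation, Chinese Academy of Sciences. He is a senior algorithm engineer at Alibaba and a core contributor to the Qwen model series. His research covers large language model alignment, multi-agent systems, and reinforcement learning. He has published at top venues such as NeurIPS and ICLR and in SCI-indexed journals, with more than 1,400 citations.

Tuesday Morning: Advanced LLM Techniques

    Training LLMs in Practice (Cheng Deng); LLM Evaluation and Enhancement (Linyi Yang).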
  • Chapter 4、LLM Deployment and Applications
    • 4.1、LLM for Cloud-based Serving
    • 4.2、LLM for the Edge
    • 4.3、Application of LLM in Different Walks of Life
  • Chapter 5、LLM Evaluation and Enhancement
    • 5.1、LLM Evaluation (single grading, pairwise evaluation, etc.)
    • 5.2、Emergent Ability (In-context Learning, CoT, ToT, Generalization Ability)
    • 5.3、Challenges (Hallucination, Logical Consistency, Math Problem, Structured Prediction)

Instructor Introduction:

Linyi Yang (杨林易)
Dr Linyi Yang is now working as a Research Assistant Professor at Westlake University. He has established an academic reputation in the fusion of Natural Language Processing (NLP) and Large Language Models, alongside explainable artificial intelligence, as evidenced by 40 publications, 13 co-leading publications in high-ranked venues with more than 2,780 citations, 9 CCF A Papers, 1 national award, 1 best paper nomination award. He served as an Area Chair at EMNLP-22 and CIKM-23, Senior Program Committee Member at IJCAI-23. Additionally, he has mentored 4 PhDs and over 10 graduate students as a subject adviser or co-supervisor.

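Cheng Deng (邓程)
Cheng Deng is a researcher at the 2012 Laboratories of Huawei's London Research Centre. He received his PhD from the Wu Wenjun Artificial Intelligence Class at Shanghai Jiao Tong University, advised by Professor Xinbing Wang. He has published papers at conferences including ICML, WSDM, CIKM, and ECAI, and in journals including EJIM, TMC, and TNSE. He led the team that trained K2 and GeoGalactica, large language models for the geosciences, and has presented this work at TEDx. The GAKG project he developed has been included in the UNESCO Open Science Forum. He currently focuses on on-device large language models, aiming to support innovation and applications in vertical domains.

Tuesday Afternoon: Full-Stack LLM Training Practice

Course Outline:

    Hands-on LLM: full-stack LLM training practice.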
  • 1、Data Curation
  • 2、LLM Model Setup
  • 3、LLM Pre-Training
    • 3.1、LLM Pre-training
    • 3.2、LLM Further Pre-training
  • 4、LLM Post-Training
    • 4.1、LLM Supervised Fine-tuning
    • 4.2、LLM with RLHF
  • 5、LLM Deployment
    • 5.1、LLM for basic inference
    • 5.2、LLM as an Agent

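Instructor Introduction:

Cheng Deng (邓程)
Cheng Deng is a researcher at the 2012 Laboratories of Huawei's London Research Centre. He received his PhD from the Wu Wenjun Artificial Intelligence Class at Shanghai Jiao Tong University, advised by Professor Xinbing Wang. He has published papers at conferences including ICML, WSDM, CIKM, and ECAI, and in journals including EJIM, TMC, and TNSE. He led the team that trained K2 and GeoGalactica, large language models for the geosciences, and has presented this work at TEDx. The GAKG project he developed has been included in the UNESCO Open Science Forum. He currently focuses on on-device large language models, aiming to support innovation and applications in vertical domains.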

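Wednesday Morning: Reinforcement Learning

Course Outline:

    This course offers an introductory grounding in reinforcement learning, giving students a reasonably comprehensive view of the field's problems and methodology, including basic concepts and theory, Markov decision processes, dynamic programming, temporal-difference learning, value function learning, model-free control, policy gradients, and deep reinforcement learning. The course is based on the textbook Hands-on Reinforcement Learning (《动手学强化学习》).
  • 1、Overview of reinforcement learning techniques
  • 2、Markov decision processes
  • 3、Dynamic programming
  • 4、Value function estimation
  • 5、Model-free control methods
  • 6、Parameterized value functions
  • 7、Policy gradients
  • 8、Deep reinforcement learning: value-based methods
  • 9、Deep reinforcement learning: policy-based methods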

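Instructor Introduction:

Weinan Zhang (张伟楠)
Dr. Weinan Zhang is a professor, PhD supervisor, and deputy head of the Department of Computer Science at Shanghai Jiao Tong University. His research covers reinforcement learning and data science. He has published more than 100 papers in CCF-A international conferences and journals, with over 20,000 Google Scholar citations, is an Elsevier Highly Cited Chinese Researcher, has received 5 best paper awards, and is an author of the textbooks Hands-on Reinforcement Learning (《动手学强化学习》) and Hands-on Machine Learning (《动手学机器学习》). He has long served as an area chair for conferences including NeurIPS, ICML, ICLR, and KDD, and on the editorial boards of journals including TPAMI and FCS. As principal investigator, he has led a National Natural Science Foundation of China Excellent Young Scientists Fund project and a project under the Ministry of Science and Technology's 2030 New Generation Artificial Intelligence major program. He was selected for the China Association for Science and Technology Young Elite Scientists Sponsorship Program and the Shanghai Science and Technology Commission Sailing Program, and received the Wu Wenjun AI Outstanding Youth Award and the DAMO Academy Young Fellow Award (青橙奖). He received his bachelor's degree from the ACM Class of the Department of Computer Science at Shanghai Jiao Tong University in 2011 and his PhD from the Department of Computer Science at University College London in 2016.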

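Wednesday Afternoon: Reinforcement Learning Practice

Course Outline:

    Built on the Tencent Kaiwu (腾讯开悟) platform, this session gives participants a one-stop environment for reinforcement learning practice. Using a "maze treasure hunt" scenario as the example, it walks through hands-on projects with common reinforcement learning algorithms.
  • 1、Introduction to and demo of the Tencent Kaiwu reinforcement learning practice platform
  • 2、Q-learning in practice
  • 3、Sarsa in practice
  • 4、Dynamic programming in practice
  • 5、Monte Carlo in practice
  • 6、Deep reinforcement learning in practice (optional)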

Instructor Introduction:

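Wenjun Wang (汪文俊)
Wenjun Wang holds a master's degree in software engineering from Huazhong University of Science and Technology and also serves as an industry mentor for master's students at the School of Computer Science, University of Electronic Science and Technology of China. He has long worked in the gaming and education industries, with positions at Huawei and Tencent, and has extensive engineering and operations experience. He has led several projects, including the Tencent Games and Zhejiang University joint laboratory, Tencent Kaiwu (the first gamified AI education platform in China), and Huawei's big data software talent development program.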

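Hongyang Qin (覃洪杨)
Hongyang Qin is a senior engineer in Tencent's AI Platform Department. He received his bachelor's degree from South China University of Technology and is the main technical lead for the Kaiwu open platform and the Kaiwu AI competition. His work focuses on AI engineering, including AI platform construction, reinforcement learning training frameworks and pipelines, and AI deployment and applications.

Thursday Morning: Multi-Agent Systems

Course Outline: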

  • 1、Overview of Multi-Agent System
  • 2、Game Theory
  • 3、Multi-Agent Reinforcement Learning

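Instructor Introduction:

Haifeng Zhang (张海峰)
Haifeng Zhang is an associate researcher at the Institute of Automation, Chinese Academy of Sciences, where he leads the Collective Decision Intelligence team. He holds bachelor's and doctoral degrees from the Department of Computer Science at Peking University and was a postdoctoral researcher at University College London. His research focuses on multi-agent systems and reinforcement learning. He has published more than 20 papers at top international conferences and in leading journals, and led the development of the Jidi (及第) agent game-playing platform, with applications to intelligent scheduling in oil and gas, railways, and other domains.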

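Shu Lin (林舒)
Shu Lin is an assistant researcher at the Institute of Automation, Chinese Academy of Sciences. Research areas include solving combinatorial optimization problems, automatic program generation and algorithm optimization, game AI, and introductory programming education. Shu Lin received a PhD in computer software and theory in 2021 and a bachelor's degree in computer science and technology in 2013, both from Peking University.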

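Fangwei Zhong (钟方威)
Fangwei Zhong received his PhD from Peking University and then carried out postdoctoral research there. He has been supported by the National Postdoctoral Program for Innovative Talents (only 9 recipients in computer science that year) and the National Natural Science Foundation of China Young Scientists Fund, among other grants. His research aims to build general embodied agents that are efficient, autonomous, robust, safe, and socially capable. By combining multiple fields (including machine learning, game theory, robotics, cognitive science, computer graphics, and computer vision), he explores more efficient autonomous learning methods and reasoning mechanisms that give agents the ability to actively perceive, reason, communicate, plan, and make decisions, so that they can complete a range of embodied interaction tasks in complex dynamic scenes.

Thursday Afternoon: Large-Model Multi-Agent Systems

Course Outline:

    This course presents the challenges involved in solving data-science tasks automatically. It covers recently developed LLM-agent solutions, in particular Caggle v1.0, which enables end-to-end DS problem solving through structural reasoning.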
  • 1、AGI agents lecture
  • 2、Memory Lecture
  • 3、DS Agents Lectures
    • 3.1、Automation of data-science problem solving
      • 3.1.1、Problem definition and challenges
      • 3.1.2、Problem definition and challenges
    • 3.2、Data-science agents
      • 3.2.1、Specialized LLM-agents
      • 3.2.2、General agents & Limitations
    • 3.3、Caggle v1.0 agent
      • 3.3.1、Structural reasoning optimization approach
      • 3.3.2、Problem definition and challenges
      • 3.3.3、An agent empowered with DS tools
      • 3.3.4、Automation, coverage, performance
  • 4、Pangu Agent demo
  • 5、DS agent demo

Instructor Introduction:

Antoine Grosnit
Antoine Grosnit is a research scientist at Huawei Noah's Ark Lab in London and an external PhD student at TU Darmstadt under the supervision of Professor Jan Peters. He graduated from École Polytechnique and obtained a Master's degree in Mathematics, Vision, and Learning (MVA) from ENS Paris-Saclay in 2020. Antoine joined the Decision Making and Reasoning team at Huawei London, where he has been advancing the field of Bayesian optimization. He has notably contributed to the development of the HEBO solver (published in JAIR), which won the NeurIPS black-box optimization challenge. Additionally, he worked on a method to optimize antibodies (AntBO published in Cell Reports Methods) and played a key role in the design of BOiLS for logic synthesis improvement (DATE 2022). More recently, Antoine has expanded his focus to Large Language Models (LLMs) and AI agents, contributing to the theoretical understanding of emerging properties behind prompting techniques like Chain of Thought (CoT), and developing LLM-agent systems for robotics and data science.

Zafeirios Fountas
Zafeirios Fountas is a senior research scientist at Huawei's Noah's Ark Lab, specializing in brain-inspired artificial intelligence. His work centers on leveraging aspects of human cognition to develop self-motivated, unsupervised reinforcement learning agents, better deep generative models (including LLMs), and long-term episodic memory. Previously, Fountas spent five years as a founding member and the lead AI scientist at Emotech Ltd, where he and his team earned four CES Innovation Awards for their smart-home robotic products. His academic experience includes roles as a Principal Investigator and Visiting Lecturer at Imperial College London and the Royal College of Art, as well as an honorary research fellowship at the Wellcome Centre for Human Neuroimaging at University College London.

Filippos Christianos
Filippos Christianos is a research scientist at Noah's Ark Lab, Huawei in London, UK. Before that, he worked at the NVIDIA lab for Autonomous Vehicles. He completed his PhD at The University of Edinburgh where he worked on Multi-Agent Deep Reinforcement Learning (MARL). He also studied Electrical and Computer Engineering in Crete, Greece. Filippos is a co-author of the textbook "Multi-Agent Reinforcement Learning: Foundations and Modern Approaches" (MIT Press) and also maintains key environments and libraries for MARL research, including the Multi-Robot Warehouse and E-PyMARL.

James Doran
James is a research engineer at Huawei's London Research Centre. He has a master's degree from the University of Edinburgh as well as 4 years of ML engineering experience. In these roles he has dealt with every step of the ML lifecycle from research and training to productionization and deployment. Now, James is most interested in the implementation and development of AI research, primarily focused on LLMs and reinforcement learning. A lot of his work has been centered around adapting LLMs to act as high-performing agents for solving tasks, achieving this through specialized prompting patterns or fine-tuning the models with supervised and reinforcement learning. Aside from this, he has also worked on neurosymbolic reasoning systems that guide search for geometry and algebra proofs. Finally, he has a publication at ICML 2024 about a method to improve zero-shot generalization in RL agents through automated environment design.

Alex Maraval
Alex is a Senior ML Engineer at Huawei Noah’s Ark Lab in London. Alex graduated from EPFL (Lausanne, Switzerland) in Mathematics and completed a Master’s degree at Imperial College London in Machine Learning. He joined the Decision Making and Reasoning Team at Huawei in 2020, where he started working on Variational Inference, RL, and mostly Gaussian Processes and Bayesian Optimization (BO). Alex has contributed to a multitude of projects, including research on High-Dimensional BO on structured spaces and BO on Graphs, and published a paper at NeurIPS 2023 on Meta-Learning for BO with Transformer Neural Processes. Alex has also recently been focusing on LLM-related projects. His research directions include building specialized Agents, extending RAG techniques, researching more performant optimizers, and improving fine-tuning.

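Friday Morning: Embodied AI from a Robot Learning Perspective

Course Outline:

    This course gives a systematic treatment of robot learning within embodied AI, examining how robots learn to make sequential decisions in complex environments. It covers the theoretical foundations of machine learning, deep learning, visual learning, and reinforcement learning, helping students understand how these methods are used for problem solving and improved adaptability in robot decision-making.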
  • 1、What is robot learning
  • 2、Typical simulation environments
  • 3、Typical tasks (locomotion, manipulation, mobile manipulation)
  • 4、Robot learning algorithms (model-free RL, model-based RL, imitation learning)
  • 5、Foundation models for robots

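Instructor Introduction:

Ying Wen (温颖)
Ying Wen is a tenure-track associate professor and PhD supervisor at the School of Artificial Intelligence and the John Hopcroft Center for Computer Science, Shanghai Jiao Tong University. His research covers multi-agent learning, reinforcement learning, and large decision-making models. He received his PhD in 2020 and his MRes in 2016, both from the Department of Computer Science at University College London. He was selected for the Shanghai Young Scientific and Technological Talents Sailing Program and as a Shanghai Overseas High-Level Talent. More than 40 of his papers have appeared at leading international conferences including ICML, NeurIPS, ICLR, IJCAI, and AAMAS, and he received the CoRL 2020 Best System Paper Award and the AAMAS 2021 Blue Sky Track Best Paper Award. He has served for several consecutive years as a PC member or reviewer for well-known international conferences and journals including ICML, NeurIPS, ICLR, IJCAI, and AAAI.

Friday Afternoon: Embodied AI Practice

Course Outline: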

    This course provides an introduction to robot motion planning and rigid-body dynamics. The first part covers classical motion planning methods, including grid-based search and sampling-based algorithms, along with modern techniques like trajectory optimization. The second part focuses on rigid-body dynamics, exploring forward and inverse dynamics, with a derivation of dynamics equations for a two-link robot.
  • 1、Traditional robotics
  • 2、Learning robotics
  • 3、ROS LLM demo
  • 4、Shenzhen demo

Outline Details:

    Traditional robotics: Motion Planning and Dynamics.
  • Motion planning
    • The planning problem
    • Grid-based search algorithms
    • Sampling-based motion planning
    • Reactive and gradient-based planning
    • Trajectory optimization
    • Software for motion planning
  • Dynamics
    • Forward/inverse dynamics
    • Approaches: Lagrange and Newton-Euler formulations
    • Derivation of dynamics equations for a two-link robot
    • Software for dynamics
  • Traditional robotics: kinematics
    • Rigid body motion
    • Robot kinematics and inverse kinematics

Instructor Introduction:

Christopher E. Mower
Chris is a Senior Research Scientist at Huawei Noah's Ark Lab in London, specializing in robotics and AI. He leads the ROS-LLM project and collaborates with Huawei teams based in Shenzhen, Hong Kong, Suzhou, and Hangzhou. Chris worked with LEJU robotics and the Huawei Cloud team to deliver a successful humanoid demonstration at Huawei Developers Conference 2024, Dongguan. Additionally, Chris co-leads collaborations with Marco Hutter at ETH Zurich, and Jan Peters at TU Darmstadt. Previously, Chris was a post-doc at King's College London, where he focused on surgical robotics and collaborated on the EU Horizon 2020 FAROS Project. While at King's, he was awarded funding that supported two visiting positions with leading surgical robot labs at KU Leuven and Balgrist University Hospital (an internationally renowned hospital specializing in musculoskeletal disorders attached to ETH Zurich and University of Zurich). Chris holds a PhD in robotics from the University of Edinburgh, supervised by Sethu Vijayakumar and funded by The Costain Group, a UK-based civil engineering firm. During his PhD, Chris was part of a team that developed a shared autonomy system that won First Prize for Greatest Potential for Positive Impact at the Robots for Resilient Infrastructure International Challenge. He also holds a master's in computer science from Imperial College London and, as part of his dissertation, worked with the Hamlyn Centre for Robotic Surgery. Additionally, Chris holds a second master's degree in applied mathematics from the University of Manchester. During his time at Manchester, Chris interned at the Numerical Algorithms Group and contributed code that was included in their library and is now used by major banks and insurance firms.

Shuang Wu (吴霜)
Shuang Wu is currently a researcher in embodied AI at Huawei Noah's Ark Lab - Hong Kong, 2012 Laboratories. He received his B.Eng in Mechatronic Engineering from Zhejiang University in 2015, and his Ph.D. in Electronic and Computer Engineering from the Hong Kong University of Science and Technology in 2019. His research interests include control, optimization, and learning-based methods to enhance optimization and control.

Yuhui Wan (万宇晖)
Yuhui Wan is currently a research intern at Huawei Noah's Ark Lab in London and a PhD student in Mechanical Engineering at the University of Leeds, with the scope of research in human-robot teaming and embodied AI. He holds a Master's and a Bachelor's degree from Purdue University. Yuhui's accolades include 2nd place in the Multi-robot Inspection and Monitoring Challenge at the IEEE RAS summer school and 1st place in the Xplore New Automation Award, where he was recognized by the German Federal Minister for Economics and Energy. Yuhui collaborated with Huawei teams based in Shenzhen and Hangzhou. His contributions extend across various projects, including the ROS-LLM project in Hangzhou. He also coordinated with LEJU robotics and the Huawei Cloud team for a successful humanoid demonstration at the Huawei Developers Conference 2024 in Dongguan. Prior to his current roles, Yuhui worked at Fiat Chrysler Automobiles, where he contributed to the improvement of the Chrysler Pentastar V6 engine, a renowned and widely-used engine in the automotive industry.