RLChina Paper Seminar, Session 108 (livestreamed 2025.01.14)
## Introduction

The Paper Seminar is an academic event organized by RLChina. Research teams from across the RL community take turns hosting, inviting front-line researchers to present and discuss selected papers.

Session 108 will be hosted by Yan Song, Ph.D. student at University College London. Chaojie Wang, research scientist at Tiangong Skywork, and Siyuan Guo, Ph.D. student at Jilin University, will share recent research on **LLM Reasoning**. Everyone is welcome to join.

How to interact: **leave a comment under this post to engage with the speakers.**

## Overview

### Topic

RLChina Paper Seminar, Session 108 --- **LLM Reasoning**

### Time

January 14, 2025, 16:00-17:00 (Beijing time)

### Live Channels

RLChina Bilibili live room: http://live.bilibili.com/22386217

Tencent Meeting: 530-2730-2590 (please set your display name to "organization + real name")

### Speakers

Chaojie Wang, Research Scientist, Tiangong Skywork

Siyuan Guo, Ph.D. student, Jilin University

### Host

Yan Song, Ph.D. student, University College London

---

### Talk 1: 16:00-16:30

#### Speaker: Chaojie Wang

<img src="https://rlchian-bbs.oss-cn-beijing.aliyuncs.com/images/2025/01/12/382481f8a0691d373c36fa6337124ee0.png" width = "150" alt="Speaker photo" align=center />

#### Speaker Bio

Chaojie Wang is a Research Scientist at Skywork AI 2050, which focuses on accelerating the realization of Artificial General Intelligence (AGI). Before that, he obtained his Ph.D. from Xidian University in 2021 and worked as a Research Fellow at Nanyang Technological University until 2023. He has published more than 30 papers in top AI conferences and journals such as T-PAMI, NeurIPS, and ICML, and leads a research team of nearly 10 members working on generative models, large language models (LLMs), and reinforcement learning from human feedback (RLHF). In 2021, as the first contributor, he won the championship of the L2RPN-2021 international competition. He is currently focused on the development and deployment of Skywork's super app.

#### Talk Title

Improve Multi-step Reasoning for LLMs with Deliberative Planning

#### Abstract

Large Language Models (LLMs) trained on vast corpora of text have demonstrated impressive capabilities across a wide range of natural language tasks. However, the auto-regressive generation process makes LLMs prone to errors, hallucinations, and inconsistent statements when performing multi-step reasoning, especially in mathematical problem solving and code generation.
In this talk, we will share our practical experience in enhancing the multi-step reasoning capabilities of Skywork-o1 through cutting-edge alignment techniques, specifically Reinforcement Learning from Human Feedback (RLHF) and tree-search-based planning methods. We will also discuss the opportunities and challenges that lie ahead.

#### Related Paper

Wang, Chaojie, et al. "Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning." arXiv preprint arXiv:2406.14283 (2024).

---

### Talk 2: 16:30-17:00

#### Speaker: Siyuan Guo

<img src="https://rlchian-bbs.oss-cn-beijing.aliyuncs.com/images/2025/01/12/5c977e7c188f8f44ba10f5b37caef44b.jfif" width = "150" alt="Speaker photo" align=center />

#### Speaker Bio

Siyuan Guo is a Ph.D. student at the College of Artificial Intelligence, Jilin University. His research focuses on large language model agents and reinforcement learning.

#### Talk Title

DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

#### Abstract

In this work, we investigate the potential of large language model (LLM) based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models. Despite their widespread success, existing LLM agents are hindered in this scenario by generating unreasonable experiment plans. To this end, we present DS-Agent, a novel automatic framework that harnesses LLM agents and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on expert knowledge from Kaggle and facilitate consistent performance improvement through a feedback mechanism. Moreover, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm, adapting past successful solutions from the development stage for direct code generation and thereby significantly reducing the demands on the foundational capabilities of LLMs.
Empirically, DS-Agent with GPT-4 achieves an unprecedented 100% success rate in the development stage, and attains a 36% improvement in average one-pass rate across alternative LLMs in the deployment stage. In both stages, DS-Agent achieves the best performance rank, costing $1.60 and $0.13 per run with GPT-4, respectively.

#### Related Paper

Guo, Siyuan, et al. "DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning." Forty-first International Conference on Machine Learning (2024).

---

## Contact Us

Email: rlchinacamp@163.com