RLChina Paper Seminar, Session 110 (live on March 18, 2025)
## Overview

The Paper Seminar is an academic event organized by RLChina. Research teams from across the RL community take turns hosting it, inviting front-line researchers to present and discuss selected papers.

In Session 110, Yewen Li and Zhenghai Xue, both PhD students at Nanyang Technological University (Singapore), will present their latest research. Everyone is welcome to join.

How to participate: **leave a comment on this post to interact with the speakers**.

## Details

### Topic

RLChina Paper Seminar, Session 110

### Time

March 18, 2025, 16:00-17:00

### Live stream

Bilibili: http://live.bilibili.com/22386217

### Speakers

Yewen Li, PhD student, Nanyang Technological University

Zhenghai Xue, PhD student, Nanyang Technological University

-----

### Talk 1: 16:00-16:30

#### Speaker: Yewen Li

<img src="https://rlchian-bbs.oss-cn-beijing.aliyuncs.com/images/2025/03/17/760e0bc2767e566e2f08684560b6a3f9.png" width="150" alt="Yewen Li" align=center />

#### Speaker bio

Yewen Li is a PhD student at Nanyang Technological University, Singapore. His research interests include reinforcement learning, generative models, and computational advertising. He won first place (1/793) in the NeurIPS'24 generative auto-bidding competition.

#### Title

GAS: Generative Auto-bidding with Post-training Search

#### Abstract

Computational advertising aims to effectively connect advertisers with Internet users, achieving significant commercial success and supporting the sustainability of information systems. This can be achieved in a real-time bidding (RTB) system, which should optimize advertisers' bids within economic constraints while ensuring reliability through explainable and controllable algorithms. Traditional bidding strategies have evolved from rule-based methods to reinforcement learning approaches. However, the emergence of large foundation models, such as those based on transformers and diffusion models, offers new opportunities for RTB. These models possess exceptional generalization capabilities and have the potential to revolutionize bidding strategies by learning policies that generate bids capable of handling the complexities of the advertising environment and adapting to dynamic economic conditions. Despite their promise, challenges remain in applying them to bidding, particularly concerning preference alignment, reliability, and sparse-reward scenarios. This talk will investigate the potential of generative models for next-generation bidding strategies and introduce their current applications in real-world bidding systems.
Specifically, the GAS (Generative Auto-bidding with post-training Search) method, deployed on the Kuaishou advertising platform, has achieved significant improvements, e.g., a 4.60% increase in target cost.

#### Publication

Yewen Li, Shuai Mao, Jingtong Gao, Nan Jiang, Yunjian Xu, Qingpeng Cai, Fei Pan, Peng Jiang, Bo An. GAS: Generative Auto-bidding with Post-training Search. WWW 2025, Industry Track.

### Talk 2: 16:30-17:00

#### Speaker: Zhenghai Xue

<img src="https://rlchian-bbs.oss-cn-beijing.aliyuncs.com/images/2025/03/17/10b2084900f907021c790948e8e1b8e1.png" width="150" alt="Zhenghai Xue" align=center />

#### Speaker bio

Zhenghai Xue is a PhD student at Nanyang Technological University, Singapore. His research interests include reinforcement learning and large language models.

#### Title

AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems

#### Abstract

The field of Reinforcement Learning (RL) has garnered increasing attention for its ability to optimize user retention in recommender systems. A primary obstacle in this optimization process is environment non-stationarity, stemming from the continual and complex evolution of user behavior patterns over time, such as variations in interaction rates and retention propensities. These changes pose significant challenges to existing RL algorithms for recommendation, leading to shifts in both dynamics and reward distributions. This paper introduces a novel approach called Adaptive User Retention Optimization (AURO) to address this challenge. To navigate the recommendation policy in non-stationary environments, AURO introduces a state abstraction module in the policy network. The module is trained with a new value-based loss function, aligning its output with the estimated performance of the current policy. Because RL policy performance is sensitive to environment drift, this loss function enables the state abstraction to reflect environment changes and prompt the recommendation policy to adapt accordingly.
Additionally, the non-stationarity of the environment introduces the problem of implicit cold start, where the recommendation policy continuously interacts with users displaying novel behavior patterns. AURO encourages exploration guarded by performance-based rejection sampling to maintain stable recommendation quality in the cost-sensitive online environment. Extensive empirical analyses are conducted in a user retention simulator, on the MovieLens dataset, and on a live short-video recommendation platform, demonstrating AURO's superior performance against all evaluated baseline algorithms.

#### Publication

Zhenghai Xue, Qingpeng Cai, Bin Yang, Lantao Hu, Peng Jiang, Kun Gai, Bo An. AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems. WWW 2025 (Oral).
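The abstract above mentions exploration guarded by performance-based rejection sampling. As a rough illustration of the general idea only (this is not the paper's algorithm; `guarded_explore`, `value_fn`, and `tolerance` are hypothetical names), a minimal sketch in Python:

```python
import random

def guarded_explore(candidates, value_fn, baseline_action, tolerance=0.05):
    """Performance-based rejection sampling, illustrative sketch.

    Accept an exploratory action only if its estimated value is within
    `tolerance` of the baseline action's value; otherwise reject it and
    fall back to the baseline, so exploration cannot degrade quality
    beyond a fixed margin.
    """
    threshold = value_fn(baseline_action) - tolerance
    # Visit exploratory candidates in random order.
    for action in random.sample(candidates, len(candidates)):
        if value_fn(action) >= threshold:
            return action
    # Every candidate was rejected: keep the baseline action.
    return baseline_action
```

In a retention setting, `value_fn` would stand in for the learned value estimate of an action's long-term retention effect, and the baseline action for the current policy's choice.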