圆桌论道
"圆桌论道" (Roundtable) is an RLChina column that publishes previews, summaries, and interpretations of cutting-edge academic activities in China and abroad.
Introduction
The International Joint Conference on Artificial Intelligence (IJCAI, https://www.ijcai.org/), founded in California in 1969, is one of the most important academic conferences in artificial intelligence. IJCAI aims to promote academic exchange and collaboration in AI and to advance the development and application of artificial intelligence. The conference covers all areas of AI, including machine learning, knowledge representation and reasoning, natural language processing, and computer vision. This year, the 32nd IJCAI (https://ijcai-23.org/) will be held in Macao, China, from August 19 to August 25, 2023.
According to the official notification email, IJCAI received 4,566 submissions this year and accepted 644 papers, an acceptance rate of about 14.1%. RLCN has selected 66 reinforcement-learning-related papers from the accepted papers and organized them into 13 categories, including MARL, AI safety, and Combinatorial Optimization, for your reference. A PDF version of the categorized list is available here: https://gitee.com/rlchina/paper-discussion-two/attach_files/1451760/download.
Paper Overview

Figure 1: Word cloud generated from the titles of the 66 reinforcement-learning-related IJCAI 2023 papers
Of the 644 accepted papers, 66 are related to reinforcement learning. This is a much lower share than at the conferences we have covered previously, such as AAAI 2022, ICML 2021, NeurIPS 2021, and ICLR 2023.
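As an aside on Figure 1, a word cloud like this one is easy to build from the paper titles. The minimal sketch below uses the Python wordcloud package; the file name ijcai23_rl_titles.txt and the extra stopwords are illustrative assumptions, not necessarily how the actual figure was produced.

```python
# Sketch: build a word cloud from paper titles (one title per line in a
# hypothetical file "ijcai23_rl_titles.txt").
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

with open("ijcai23_rl_titles.txt", encoding="utf-8") as f:
    titles = f.read()

# Drop words that appear in almost every title so the cloud highlights topics.
stopwords = STOPWORDS | {"Reinforcement", "Learning", "via", "based", "Towards"}

wc = WordCloud(width=1200, height=600, background_color="white",
               stopwords=stopwords, collocations=False).generate(titles)

plt.figure(figsize=(12, 6))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```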
As in previous years, MARL and Safe RL remain research hotspots. What is different is that IJCAI 2023 features noticeably more work on AI for science / AI for social good (see Figure 2 for details). Surprisingly, among this year's relatively small set of accepted papers, 4 (4/66) are RL4Finance works (for a quick introduction, see http://rlchina.org/topic/743); this share is comparable to Hierarchical RL and Combinatorial Optimization, and even exceeds the number of Offline RL and Meta learning papers. In addition, as many as 3 (3/66) papers study Traffic Signal Control, which has been relatively rare among accepted papers at earlier conferences. (Of course, the acceptance results also strongly reflect IJCAI's own preferences.)
We group the papers into 13 categories; the categories and the number of papers in each are shown below:
Figure 2: Categories of the reinforcement-learning-related papers at IJCAI 2023
Figure 3: Breakdown of the Applications category
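If you keep the paper list in a machine-readable form, the per-category counts shown in Figure 2 can be reproduced with a few lines of Python. The sketch below assumes a hypothetical CSV file ijcai23_rl_papers.csv with title, authors, and category columns; it only illustrates the tally, not the pipeline used for the figure.

```python
# Sketch: count papers per category from a hypothetical CSV export of Table 1.
import csv
from collections import Counter

with open("ijcai23_rl_papers.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

counts = Counter(row["category"] for row in rows)
for category, n in counts.most_common():
    print(f"{category}: {n}")
print(f"Total: {sum(counts.values())} papers")  # should be 66
```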
Categorized Paper List
The papers in each category can be looked up in the categorized list below.
AI safety/Robust/Safe RL/Constrained/Uncertainty (6 papers)
- A Rigorous Risk-aware Linear Approach to Extended Markov Ratio Decision Processes with Embedded Learning (Alexander Zadorojniy, Takayuki Osogami, Orit Davidovich)
- Adversarial Behavior Exclusion for Safe Reinforcement Learning (Md Asifur Rahman, Tongtong Liu, Sarra Alqahtani)
- Robust Reinforcement Learning via Progressive Task Sequence (Yike Li, Yunzhe Tian, Endong Tong, Wenjia Niu, Jiqiang Liu)
- CROP: Towards Distributional-Shift Robust Reinforcement Learning Using Compact Reshaped Observation Processing (Philipp Altmann, Fabian Ritz, Leonard Feuchtinger, Jonas Nusslein, Claudia Linnhoff-Popien, Thomy Phan)
- Safe Reinforcement Learning via Probabilistic Logic Shields (Wen-Chi Yang, Giuseppe Marra, Gavin Rens, Luc De Raedt)
- Explanation-Guided Reward Alignment (Saaduddin Mahmud, Sandhya Saisubramanian, Shlomo Zilberstein)

Applications (11 papers)
- StockFormer: Learning Hybrid Trading Machines with Predictive Coding (Siyu Gao, Yunbo Wang, Xiaokang Yang)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning (Yinda Chen, Wei Huang, Shenglong Zhou, Qi Chen, Zhiwei Xiong)
- Controlling Neural Style Transfer with Deep Reinforcement Learning (Chengming Feng, Jing Hu, Xin Wang, Shu Hu, Bin Zhu, Xi Wu, Hongtu Zhu, Siwei Lyu)
- InitLight: Initial Model Generation for Traffic Signal Control Using Adversarial Inverse Reinforcement Learning (Yutong Ye, Yingbo Zhou, Jiepin Ding, Ting Wang, Mingsong Chen, Xiang Lian)
- GPLight: Grouped Multi-agent Reinforcement Learning for Large-scale Traffic Signal Control (Yilin Liu, Guiyang Luo, Quan Yuan, Jinglin Li, Lei Jin, Bo Chen, Rui Pan)
- Towards Generalizable Reinforcement Learning for Trade Execution (Chuheng Zhang, Yitong Duan, Xiaoyu Chen, Jianyu Chen, Jian Li, Li Zhao)
- Contrastive Learning and Reward Smoothing for Deep Portfolio Management (Yun-Hsuan Lien, Yuan-Kui Li, Yu-Shuen Wang)
- Reinforcement Learning Approaches for Traffic Signal Control under Missing Data (Hao Mei, Junxian Li, Bin Shi, Hua Wei)
- Spotlight News Driven Quantitative Trading Based on Trajectory Optimization (Mengyuan Yang, Xiaolin Zheng, Qianqiao Liang, MengHan Wang, Mengying Zhu)
- Transferable Curricula through Difficulty Conditioned Generators (Sidney Tio, Pradeep Varakantham)
- ALL-E: Aesthetics-guided Low-light Image Enhancement (Ling Li, Dong Liang, Yuanhang Gao, Sheng-Jun Huang, Songcan Chen)

Causal Inference (2 papers)
- Causal Deep Reinforcement Learning Using Observational Data (Wenxuan Zhu, Chao Yu, Qiang Zhang)
- Explainable Reinforcement Learning via a Causal World Model (Zhongwei Yu, Jingqing Ruan, Dengpeng Xing)

Combinatorial Optimization (4 papers)
- One Model, Any CSP: Graph Neural Networks as Fast Global Search Heuristics for Constraint Satisfaction (Jan Tonshoff, Berke Kisin, Jakob Lindner, Martin Grohe)
- Complex Contagion Influence Maximization: A Reinforcement Learning Approach (Haipeng Chen, Bryan Wilder, Wei Qiu, Bo An, Eric Rice, Milind Tambe)
- Automatic Truss Design with Reinforcement Learning (Weihua Du, Jinglun Zhao, Chao Yu, Xingcheng Yao, Zimeng Song, Siyang Wu, Ruifeng Luo, Zhiyuan Liu, Xianzhong Zhao, Yi Wu)
- Optimal Decision Tree Policies for Markov Decision Processes (Daniel Vos, Sicco Verwer)

Exploration (2 papers)
- DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards (Shanchuan Wan, Yujin Tang, Yingtao Tian, Tomoyuki Kaneko)
- Scaling Goal-based Exploration via Pruning Proto-goals (Akhil Bagaria, Tom Schaul)

Hierarchical RL (4 papers)
- Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning (Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen)
- Ensemble Reinforcement Learning in Continuous Spaces – A Hierarchical Multi-Step Approach for Policy Training (Gang Chen, Victoria Huang)
- Guide to Control: Offline Hierarchical Reinforcement Learning Using Subgoal Generation for Long-Horizon and Sparse-Reward Tasks (Wonchul Shin, Yusung Kim)
- A Hierarchical Approach to Population Training for Human-AI Collaboration (Yi Loo, Chen Gong, Malika Meghjani)

Meta learning/Transfer/Multi-task/Generalization (2 papers)
- Multi-objective Optimization-based Selection for Quality-Diversity by Non-surrounded-dominated Sorting (Ren-Jian Wang, Ke Xue, Haopu Shang, Chao Qian, Haobo Fu, Qiang Fu)
- Distributional Multi-Objective Decision Making (Willem Ropke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowe, Diederik M. Roijers)

Multi-agent/Game Theory (20 papers)
- Scalable Communication for Multi-Agent Reinforcement Learning via Transformer-Based Email Mechanism (Xudong Guo, Daming Shi, Wenhui Fan)
- Finding Mixed-Strategy Equilibria of Continuous-Action Games without Gradients Using Randomized Policy Networks (Carlos Martin, Tuomas Sandholm)
- Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning (Bin Zhang, Lijuan Li, Zhiwei Xu, Dapeng Li, Guoliang Fan)
- Learning to Self-Reconfigure for Freeform Modular Robots via Altruism Proximal Policy Optimization (Lei Wu, Bin Guo, Qiuyun Zhang, Zhuo Sun, Jieyi Zhang, Zhiwen Yu)
- Explainable Multi-Agent Reinforcement Learning for Temporal Queries (Kayla Boggess, Sarit Kraus, Lu Feng)
- Anticipatory Fictitious Play (Alex Cloud, Albert Wang, Wesley Kerr)
- Deep Hierarchical Communication Graph in Multi-Agent Reinforcement Learning (Zeyang Liu, Lipeng Wan, Xue Sui, Zhuoran Chen, Kewu Sun, Xuguang Lan)
- DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning (Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li)
- Decentralized Anomaly Detection in Cooperative Multi-Agent Reinforcement Learning (Kiarash Kazari, Ezzeldin Shereen, Gyorgy Dan)
- Competitive-Cooperative Multi-Agent Reinforcement Learning for Auction-based Federated Learning (Xiaoli Tang, Han Yu)
- Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents? (Ridhima Bector, Hang Xu, Abhay Aradhya, Chai Quek, Zinovi Rabinovich)
- Social Motivation for Modelling Other Agents under Partial Observability in Decentralised Training (Dung Nguyen, Hung Le, Kien Do, Svetha Venkatesh, Truyen Tran)
- Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning (Elizaveta Tennant, Steve Hailes, Mirco Musolesi)
- Beyond Strict Competition: Approximate Convergence of Multi-agent Q-Learning Dynamics (Aamal Hussain, Francesco Belardinelli, Georgios Piliouras)
- BRExIt: On Opponent Modelling in Expert Iteration (Daniel Hernandez, Hendrik Baier, Michael Kaisers)
- Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks (Pei Xu, Junge Zhang, Kaiqi Huang)
- MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning (Haolin Song, Mingxiao Feng, Wengang Zhou, Houqiang Li)
- Learning to Send Reinforcements: Coordinating Multi-Agent Dynamic Police Patrol Dispatching and Rescheduling via Reinforcement Learning (Waldy Joe, Hoong Chuin Lau)
- Generalization through Diversity: Improving Unsupervised Environment Design (Wenjun Li, Pradeep Varakantham, Dexun Li)
- Towards Long-delayed Sparsity: Learning a Better Transformer through Reward Redistribution (Tianchen Zhu, Yue Qiu, Haoyi Zhou, Jianxin Li)

Offline RL (2 papers)
- Adaptive Reward Shifting Based on Behavior Proximity for Offline Reinforcement Learning (Zhe Zhang, Xiaoyang Tan)
- More for Less: Safe Policy Improvement with Stronger Performance Guarantees (Patrick Wienhoft, Marnix Suilen, Thiago D. Simao, Clemens Dubslaff, Christel Baier, Nils Jansen)

Others (5 papers)
- Enhancing Network by Reinforcement Learning and Neural Confined Local Search (Qifu Hu, Ruyang Li, Qi Deng, Yaqian Zhao, Rengang Li)
- On the Study of Curriculum Learning for Inferring Dispatching Policies on the Job Shop Scheduling (Zangir Iklassov, Dmitrii Medvedev, Ruben Solozabal Ochoa de Retana, Martin Takac)
- A Low Latency Adaptive Coding Spike Framework for Deep Reinforcement Learning (Lang Qin, Rui Yan, Huajin Tang)
- Neuro-Symbolic Class Expression Learning (Caglar Demir, Axel-Cyrille Ngonga Ngomo)
- ScriptWorld: Text Based Environment for Learning Procedural Knowledge (Abhinav Joshi, Areeb Ahmad, Umang Pandey, Ashutosh Modi)

Representation Learning (3 papers)
- Hierarchical State Abstraction based on Structural Information Principles (Xianghua Zeng, Hao Peng, Angsheng Li, Chunyang Liu, Lifang He, Philip S. Yu)
- An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations (Achille Fokoue, Ibrahim Abdelaziz, Maxwell Crouse, Shajith Ikbal, Akihiro Kishimoto, Guilherme Lima, Ndivhuwo Makondo, Radu Marinescu)
- Action Space Reduction for Planning Domains (Harsha Kokel, Junkyu Lee, Michael Katz, Kavitha Srinivas, Shirin Sohrabi)

Sample-efficient RL (3 papers)
- SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations (Chan Kim, Jaekyung Cho, Christophe Bobda, Seung-Woo Seo, Seong-Woo Kim)
- Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees (Daqian Shao, Marta Kwiatkowska)
- On the Reuse Bias in Off-Policy Reinforcement Learning (Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu)

Theory: optimality (2 papers)
- Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms (Pratik Gajane, Peter Auer, Ronald Ortner)
- Adaptive Estimation Q-learning with Uncertainty and Familiarity (Xiaoyu Gong, Shuai Lu, Jiayu Yu, Sheng Zhu, Zongze Li)
Table 1: The 66 reinforcement-learning-related IJCAI 2023 papers, organized into 13 categories
Author: 米祈睿
Editors: 米祈睿, 吴帅, 张海峰
About us:
RLChina is a grassroots academic community jointly initiated by reinforcement learning researchers in China and abroad. Its main activities include hosting online open courses and online seminars on reinforcement learning, and it aims to build a bridge between the reinforcement learning research community, industry, and the broader community of enthusiasts.
We publish paper interpretations, academic news, expert perspectives, and more on our WeChat official account. Welcome to follow us!