
# 圆桌论道 | IJCAI 2023: 66 Reinforcement Learning Papers, Categorized



## 圆桌论道

"圆桌论道" (Roundtable) is an RLChina column that publishes previews, summaries, and commentary on leading academic events at home and abroad.

## Introduction

The [International Joint Conference on Artificial Intelligence (IJCAI)](https://www.ijcai.org/), founded in California in 1969, is one of the most important academic conferences in artificial intelligence. IJCAI aims to promote academic exchange and collaboration in AI and to advance its development and application. The conference covers all areas of AI, including machine learning, knowledge representation and reasoning, natural language processing, and computer vision. This year, the [32nd IJCAI](https://ijcai-23.org/) will be held from August 19 to August 25, 2023 in Macao, China.

According to the official notification email, IJCAI received about 4,566 submissions this year and accepted 644 papers, an acceptance rate of roughly 14.1%. RLCN filtered the accepted papers down to **66** reinforcement-learning papers and sorted them into **13** categories, including MARL, AI safety, and Combinatorial Optimization, for your reference. [Click here](https://gitee.com/rlchina/paper-discussion-two/attach_files/1451760/download) to download the categorized list as a PDF.
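
The post does not spell out the filtering step; as the comments at the end note, the team crawled all accepted papers and then classified them by content. A crude first pass can nonetheless be approximated with a title keyword match. Below is a minimal sketch under that assumption; the keyword set and the sample titles are illustrative, not RLCN's actual pipeline.

```python
# A minimal sketch of a title-based keyword filter over accepted-paper
# titles. Illustrative only: the keyword set is a guess, and a pure
# title match misses RL papers such as "Anticipatory Fictitious Play",
# which is why the real pass also reads paper content.
KEYWORDS = (
    "reinforcement learning", "q-learning", "policy", "reward",
    "multi-agent", "markov decision", "exploration",
)

def looks_rl_related(title: str) -> bool:
    """Heuristic: does the title mention any RL-flavored keyword?"""
    lowered = title.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

accepted_titles = [
    "Safe Reinforcement Learning via Probabilistic Logic Shields",
    "Neuro-Symbolic Class Expression Learning",
    "Anticipatory Fictitious Play",
]

rl_titles = [t for t in accepted_titles if looks_rl_related(t)]
print(rl_titles)  # only the first title survives this crude filter
```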

## Paper Overview

![Word cloud of IJCAI 2023 RL paper titles](https://rlchian-bbs.oss-cn-beijing.aliyuncs.com/images/2023/06/28/6faee87a6c3c814eeb4ffcf7bed05b1f.png)

Figure 1: Word cloud generated from the titles of the 66 RL-related papers at IJCAI 2023
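
A word cloud like Figure 1 is easy to reproduce. Below is a minimal sketch using the third-party `wordcloud` package (our own choice here, not something the post specifies); the `titles` list stands in for the 66 actual titles.

```python
# A minimal sketch of generating a Figure-1-style word cloud from paper
# titles with the third-party `wordcloud` package (pip install wordcloud).
# `titles` is a stand-in for the 66 actual titles.
from wordcloud import WordCloud

titles = [
    "Safe Reinforcement Learning via Probabilistic Logic Shields",
    "Optimal Decision Tree Policies for Markov Decision Processes",
    "Scaling Goal-based Exploration via Pruning Proto-goals",
]

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate(" ".join(titles))
cloud.to_file("ijcai2023_rl_wordcloud.png")
```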

Of the 644 accepted papers, 66 are related to reinforcement learning (RL). That is a much lower share than at the conferences we have covered before, such as [AAAI 2022](http://rlchina.org/topic/322), [ICML 2021](https://mp.weixin.qq.com/s/LwgAmXZ_NE-E9MVCSjOJew), [NeurIPS 2021](http://rlchina.org/topic/257), and [ICLR 2023](http://rlchina.org/topic/650).

As in previous years, MARL and safe RL remain research hotspots. What is different is that IJCAI 2023 features noticeably more work on **AI for science** and **AI for social good** (see Figure 2). Surprisingly, among this year's comparatively small set of accepted papers, **4** of the 66 are RL4Finance work (see [this primer](http://rlchina.org/topic/743)), a share comparable to hierarchical RL and combinatorial optimization, and even exceeding the number of offline RL and meta-learning papers. In addition, as many as **3** of the 66 papers study **traffic signal control**, which has been rare in earlier conference results. (Of course, the accepted set also reflects IJCAI's own topical preferences.)

We group the papers into 13 categories; the categories and per-category paper counts are shown below:

![Category breakdown of RL-related papers](https://rlchian-bbs.oss-cn-beijing.aliyuncs.com/images/2023/06/28/a9ef80b30d341c270436e7f35f356ec6.png)

Figure 2: Categories of the RL-related papers at IJCAI 2023

![Breakdown of the Applications category](https://rlchian-bbs.oss-cn-beijing.aliyuncs.com/images/2023/06/28/03e5b46fca54107204671ee405557590.png)

Figure 3: Breakdown of the Applications category
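
If you annotate each of the 66 papers with one category label, reproducing the Figure 2 tally takes only a few lines with `collections.Counter`. The sketch below uses a made-up `labels` list, not our actual annotations.

```python
# A minimal sketch of tallying papers per category, as plotted in
# Figure 2. `labels` is an illustrative stand-in: one label per paper.
from collections import Counter

labels = [
    "Multi-agent/Game Theory", "Applications", "Multi-agent/Game Theory",
    "Offline RL", "Applications", "Applications",
]

for category, count in Counter(labels).most_common():
    print(f"{category}: {count}")
```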

## Categorized Paper List

The papers in each category are listed below.

### AI Safety / Robust / Safe RL / Constrained / Uncertainty

- *A Rigorous Risk-aware Linear Approach to Extended Markov Ratio Decision Processes with Embedded Learning* (Alexander Zadorojniy, Takayuki Osogami, Orit Davidovich)
- *Adversarial Behavior Exclusion for Safe Reinforcement Learning* (Md Asifur Rahman, Tongtong Liu, Sarra Alqahtani)
- *Robust Reinforcement Learning via Progressive Task Sequence* (Yike Li, Yunzhe Tian, Endong Tong, Wenjia Niu, Jiqiang Liu)
- *CROP: Towards Distributional-Shift Robust Reinforcement Learning Using Compact Reshaped Observation Processing* (Philipp Altmann, Fabian Ritz, Leonard Feuchtinger, Jonas Nusslein, Claudia Linnhoff-Popien, Thomy Phan)
- *Safe Reinforcement Learning via Probabilistic Logic Shields* (Wen-Chi Yang, Giuseppe Marra, Gavin Rens, Luc De Raedt)
- *Explanation-Guided Reward Alignment* (Saaduddin Mahmud, Sandhya Saisubramanian, Shlomo Zilberstein)

### Applications

- *StockFormer: Learning Hybrid Trading Machines with Predictive Coding* (Siyu Gao, Yunbo Wang, Xiaokang Yang)
- *Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning* (Yinda Chen, Wei Huang, Shenglong Zhou, Qi Chen, Zhiwei Xiong)
- *Controlling Neural Style Transfer with Deep Reinforcement Learning* (Chengming Feng, Jing Hu, Xin Wang, Shu Hu, Bin Zhu, Xi Wu, Hongtu Zhu, Siwei Lyu)
- *InitLight: Initial Model Generation for Traffic Signal Control Using Adversarial Inverse Reinforcement Learning* (Yutong Ye, Yingbo Zhou, Jiepin Ding, Ting Wang, Mingsong Chen, Xiang Lian)
- *GPLight: Grouped Multi-agent Reinforcement Learning for Large-scale Traffic Signal Control* (Yilin Liu, Guiyang Luo, Quan Yuan, Jinglin Li, Lei Jin, Bo Chen, Rui Pan)
- *Towards Generalizable Reinforcement Learning for Trade Execution* (Chuheng Zhang, Yitong Duan, Xiaoyu Chen, Jianyu Chen, Jian Li, Li Zhao)
- *Contrastive Learning and Reward Smoothing for Deep Portfolio Management* (Yun-Hsuan Lien, Yuan-Kui Li, Yu-Shuen Wang)
- *Reinforcement Learning Approaches for Traffic Signal Control under Missing Data* (Hao Mei, Junxian Li, Bin Shi, Hua Wei)
- *Spotlight News Driven Quantitative Trading Based on Trajectory Optimization* (Mengyuan Yang, Xiaolin Zheng, Qianqiao Liang, MengHan Wang, Mengying Zhu)
- *Transferable Curricula through Difficulty Conditioned Generators* (Sidney Tio, Pradeep Varakantham)
- *ALL-E: Aesthetics-guided Low-light Image Enhancement* (Ling Li, Dong Liang, Yuanhang Gao, Sheng-Jun Huang, Songcan Chen)

### Causal Inference

- *Causal Deep Reinforcement Learning Using Observational Data* (Wenxuan Zhu, Chao Yu, Qiang Zhang)
- *Explainable Reinforcement Learning via a Causal World Model* (Zhongwei Yu, Jingqing Ruan, Dengpeng Xing)

### Combinatorial Optimization

- *One Model, Any CSP: Graph Neural Networks as Fast Global Search Heuristics for Constraint Satisfaction* (Jan Tonshoff, Berke Kisin, Jakob Lindner, Martin Grohe)
- *Complex Contagion Influence Maximization: A Reinforcement Learning Approach* (Haipeng Chen, Bryan Wilder, Wei Qiu, Bo An, Eric Rice, Milind Tambe)
- *Automatic Truss Design with Reinforcement Learning* (Weihua Du, Jinglun Zhao, Chao Yu, Xingcheng Yao, Zimeng Song, Siyang Wu, Ruifeng Luo, Zhiyuan Liu, Xianzhong Zhao, Yi Wu)
- *Optimal Decision Tree Policies for Markov Decision Processes* (Daniel Vos, Sicco Verwer)

### Exploration

- *DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards* (Shanchuan Wan, Yujin Tang, Yingtao Tian, Tomoyuki Kaneko)
- *Scaling Goal-based Exploration via Pruning Proto-goals* (Akhil Bagaria, Tom Schaul)

### Hierarchical RL

- *Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning* (Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen)
- *Ensemble Reinforcement Learning in Continuous Spaces – A Hierarchical Multi-Step Approach for Policy Training* (Gang Chen, Victoria Huang)
- *Guide to Control: Offline Hierarchical Reinforcement Learning Using Subgoal Generation for Long-Horizon and Sparse-Reward Tasks* (Wonchul Shin, Yusung Kim)
- *A Hierarchical Approach to Population Training for Human-AI Collaboration* (Yi Loo, Chen Gong, Malika Meghjani)

### Meta Learning / Transfer / Multi-task / Generalization

- *Multi-objective Optimization-based Selection for Quality-Diversity by Non-surrounded-dominated Sorting* (Ren-Jian Wang, Ke Xue, Haopu Shang, Chao Qian, Haobo Fu, Qiang Fu)
- *Distributional Multi-Objective Decision Making* (Willem Ropke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowe, Diederik M. Roijers)

### Multi-agent / Game Theory

- *Scalable Communication for Multi-Agent Reinforcement Learning via Transformer-Based Email Mechanism* (Xudong Guo, Daming Shi, Wenhui Fan)
- *Finding Mixed-Strategy Equilibria of Continuous-Action Games without Gradients Using Randomized Policy Networks* (Carlos Martin, Tuomas Sandholm)
- *Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning* (Bin Zhang, Lijuan Li, Zhiwei Xu, Dapeng Li, Guoliang Fan)
- *Learning to Self-Reconfigure for Freeform Modular Robots via Altruism Proximal Policy Optimization* (Lei Wu, Bin Guo, Qiuyun Zhang, Zhuo Sun, Jieyi Zhang, Zhiwen Yu)
- *Explainable Multi-Agent Reinforcement Learning for Temporal Queries* (Kayla Boggess, Sarit Kraus, Lu Feng)
- *Anticipatory Fictitious Play* (Alex Cloud, Albert Wang, Wesley Kerr)
- *Deep Hierarchical Communication Graph in Multi-Agent Reinforcement Learning* (Zeyang Liu, Lipeng Wan, Xue Sui, Zhuoran Chen, Kewu Sun, Xuguang Lan)
- *DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning* (Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li)
- *Decentralized Anomaly Detection in Cooperative Multi-Agent Reinforcement Learning* (Kiarash Kazari, Ezzeldin Shereen, Gyorgy Dan)
- *Competitive-Cooperative Multi-Agent Reinforcement Learning for Auction-based Federated Learning* (Xiaoli Tang, Han Yu)
- *Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents?* (Ridhima Bector, Hang Xu, Abhay Aradhya, Chai Quek, Zinovi Rabinovich)
- *Social Motivation for Modelling Other Agents under Partial Observability in Decentralised Training* (Dung Nguyen, Hung Le, Kien Do, Svetha Venkatesh, Truyen Tran)
- *Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning* (Elizaveta Tennant, Steve Hailes, Mirco Musolesi)
- *Beyond Strict Competition: Approximate Convergence of Multi-agent Q-Learning Dynamics* (Aamal Hussain, Francesco Belardinelli, Georgios Piliouras)
- *BRExIt: On Opponent Modelling in Expert Iteration* (Daniel Hernandez, Hendrik Baier, Michael Kaisers)
- *Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks* (Pei Xu, Junge Zhang, Kaiqi Huang)
- *MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning* (Haolin Song, Mingxiao Feng, Wengang Zhou, Houqiang Li)
- *Learning to Send Reinforcements: Coordinating Multi-Agent Dynamic Police Patrol Dispatching and Rescheduling via Reinforcement Learning* (Waldy Joe, Hoong Chuin Lau)
- *Generalization through Diversity: Improving Unsupervised Environment Design* (Wenjun Li, Pradeep Varakantham, Dexun Li)
- *Towards Long-delayed Sparsity: Learning a Better Transformer through Reward Redistribution* (Tianchen Zhu, Yue Qiu, Haoyi Zhou, Jianxin Li)

### Offline RL

- *Adaptive Reward Shifting Based on Behavior Proximity for Offline Reinforcement Learning* (Zhe Zhang, Xiaoyang Tan)
- *More for Less: Safe Policy Improvement with Stronger Performance Guarantees* (Patrick Wienhoft, Marnix Suilen, Thiago D. Simao, Clemens Dubslaff, Christel Baier, Nils Jansen)

### Others

- *Enhancing Network by Reinforcement Learning and Neural Confined Local Search* (Qifu Hu, Ruyang Li, Qi Deng, Yaqian Zhao, Rengang Li)
- *On the Study of Curriculum Learning for Inferring Dispatching Policies on the Job Shop Scheduling* (Zangir Iklassov, Dmitrii Medvedev, Ruben Solozabal Ochoa de Retana, Martin Takac)
- *A Low Latency Adaptive Coding Spike Framework for Deep Reinforcement Learning* (Lang Qin, Rui Yan, Huajin Tang)
- *Neuro-Symbolic Class Expression Learning* (Caglar Demir, Axel-Cyrille Ngonga Ngomo)
- *ScriptWorld: Text Based Environment for Learning Procedural Knowledge* (Abhinav Joshi, Areeb Ahmad, Umang Pandey, Ashutosh Modi)

### Representation Learning

- *Hierarchical State Abstraction based on Structural Information Principles* (Xianghua Zeng, Hao Peng, Angsheng Li, Chunyang Liu, Lifang He, Philip S. Yu)
- *An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations* (Achille Fokoue, Ibrahim Abdelaziz, Maxwell Crouse, Shajith Ikbal, Akihiro Kishimoto, Guilherme Lima, Ndivhuwo Makondo, Radu Marinescu)
- *Action Space Reduction for Planning Domains* (Harsha Kokel, Junkyu Lee, Michael Katz, Kavitha Srinivas, Shirin Sohrabi)

### Sample-efficient RL

- *SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations* (Chan Kim, Jaekyung Cho, Christophe Bobda, Seung-Woo Seo, Seong-Woo Kim)
- *Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees* (Daqian Shao, Marta Kwiatkowska)
- *On the Reuse Bias in Off-Policy Reinforcement Learning* (Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu)

### Theory: Optimality

- *Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms* (Pratik Gajane, Peter Auer, Ronald Ortner)
- *Adaptive Estimation Q-learning with Uncertainty and Familiarity* (Xiaoyu Gong, Shuai Lu, Jiayu Yu, Sheng Zhu, Zongze Li)
Table 1: The 66 RL-related papers at IJCAI 2023, by category

---

**Author:** 米祈睿

**Editors:** 米祈睿, 吴帅, 张海峰

**About us:**

RLChina is a grassroots academic organization jointly initiated by reinforcement learning researchers at home and abroad. Its main activities include hosting online open courses and online seminars on reinforcement learning, with the aim of building a bridge between RL academia, industry, and the wider community of enthusiasts.

We publish paper interpretations, academic news, expert perspectives, and more on our WeChat official account. Follow us!


## Comments (3)

- **Willing Star** (2023-07-04 09:17:09): May I ask how the RL papers from past top conferences were classified? Did the authors go through the official site paper by paper, or do the official sites or overseas communities already tag RL papers? Thanks.
- **米祈睿** (2023-07-06 08:35:11), replying to Willing Star: We first crawl all the accepted papers, then do a finer-grained classification based on their content.
- **Willing Star**, replying to 米祈睿: Thanks 😊