.. _concept:

***************************************
Awesome Paper List
***************************************

We collect most of the existing MARL algorithms, grouped by the multi-agent environment they are evaluated on, with tags annotating each paper's sub-topic.

.. contents::
    :local:
    :depth: 3

[B] Basic [S] Information Sharing [RG] Behavior/Role Grouping [I] Imitation [G] Graph [E] Exploration [R] Robust [P] Reward Shaping [F] Offline [T] Tree Search [MT] Multi-task

MPE
========================

- `Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments `_ **[B][2017]**
- `Learning attentional communication for multi-agent cooperation `_ **[S][2018]**
- `Learning when to communicate at scale in multiagent cooperative and competitive tasks `_ **[S][2018]**
- `Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning `_ **[B][2019]**
- `Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient `_ **[R][2019]**
- `Tarmac: Targeted multi-agent communication `_ **[S][2019]**
- `Learning Individually Inferred Communication for Multi-Agent Cooperation `_ **[S][2020]**
- `Multi-Agent Game Abstraction via Graph Attention Neural Network `_ **[G+S][2020]**
- `Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning `_ **[E][2020]**
- `Robust Multi-Agent Reinforcement Learning with Model Uncertainty `_ **[R][2020]**
- `Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning `_ **[B][2020]**
- `Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning `_ **[B][2020]**
- `Cooperative Exploration for Multi-Agent Deep Reinforcement Learning `_ **[E][2021]**
- `Multiagent Adversarial Collaborative Learning via Mean-Field Theory `_ **[R][2021]**
- `The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games `_ **[B][2021]**
- `Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems `_ **[2021]**
- `ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind `_ **[2021]**
- `Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning `_ **[2021]**
- `SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning `_ **[2022]**
- `Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning `_ **[2022]**

SMAC
========================

- `Value-Decomposition Networks For Cooperative Multi-Agent Learning `_ **[B][2017]**
- `Counterfactual Multi-Agent Policy Gradients `_ **[B][2018]**
- `Multi-Agent Common Knowledge Reinforcement Learning `_ **[RG+S][2018]**
- `QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning `_ **[B][2018]**
- `Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control `_ **[S][2019]**
- `Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning `_ **[P+E][2019]**
- `Learning nearly decomposable value functions via communication minimization `_ **[S][2019]**
- `Liir: Learning individual intrinsic reward in multi-agent reinforcement learning `_ **[P][2019]**
- `MAVEN: Multi-Agent Variational Exploration `_ **[E][2019]**
- `Adaptive learning: A new decentralized reinforcement learning approach for cooperative multiagent systems `_ **[B][2020]**
- `Counterfactual Multi-Agent Reinforcement Learning with Graph Convolution Communication `_ **[S+G][2020]**
- `Deep implicit coordination graphs for multi-agent reinforcement learning `_ **[G][2020]**
- `DOP: Off-policy multi-agent decomposed policy gradients `_ **[B][2020]**
- `F2a2: Flexible fully-decentralized approximate actor-critic for cooperative multi-agent reinforcement learning `_ **[B][2020]**
- `From few to more: Large-scale dynamic multiagent curriculum learning `_ **[MT][2020]**
- `Learning structured communication for multi-agent reinforcement learning `_ **[S+G][2020]**
- `Learning efficient multi-agent communication: An information bottleneck approach `_ **[S][2020]**
- `On the robustness of cooperative multi-agent reinforcement learning `_ **[R][2020]**
- `Qatten: A general framework for cooperative multiagent reinforcement learning `_ **[B][2020]**
- `Revisiting parameter sharing in multi-agent deep reinforcement learning `_ **[RG][2020]**
- `Qplex: Duplex dueling multi-agent q-learning `_ **[B][2020]**
- `ROMA: Multi-Agent Reinforcement Learning with Emergent Roles `_ **[RG][2020]**
- `Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization `_ **[B][2021]**
- `Contrasting centralized and decentralized critics in multi-agent reinforcement learning `_ **[B][2021]**
- `Learning in nonzero-sum stochastic games with potentials `_ **[B][2021]**
- `Natural emergence of heterogeneous strategies in artificially intelligent competitive teams `_ **[S+G][2021]**
- `Rode: Learning roles to decompose multi-agent tasks `_ **[RG][2021]**
- `SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multiagent Reinforcement Learning `_ **[B][2021]**
- `Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning `_ **[B][2021]**
- `The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games `_ **[B][2021]**
- `UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers `_ **[MT][2021]**
- `Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning `_ **[MT][2021]**
- `Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment `_ **[MT][2021]**
- `Uneven: Universal value exploration for multi-agent reinforcement learning `_ **[B][2021]**
- `Value-decomposition multi-agent actor-critics `_ **[B][2021]**
- `RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents `_ **[2021]**
- `Regularized Softmax Deep Multi-Agent Q-Learning `_ **[2021]**
- `Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods `_ **[2021]**
- `ALMA: Hierarchical Learning for Composite Multi-Agent Tasks `_ **[2022]**
- `PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning `_ **[2022]**
- `Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning `_ **[2022]**
- `Surprise Minimizing Multi-Agent Learning with Energy-based Models `_ **[2022]**
- `Heterogeneous Skill Learning for Multi-agent Tasks `_ **[2022]**
- `SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning `_ **[2022]**
- `Self-Organized Group for Cooperative Multi-agent Reinforcement Learning `_ **[2022]**
- `ResQ: A Residual Q Function-based Approach for Multi-Agent Reinforcement Learning Value Factorization `_ **[2022]**
- `Efficient Multi-agent Communication via Self-supervised Information Aggregation `_ **[2022]**
- `Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration `_ **[2022]**
- `CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning `_ **[2022]**

MAMuJoCo
========================

- `FACMAC: Factored Multi-Agent Centralised Policy Gradients `_ **[B][2020]**
- `Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning `_ **[B][2021]**
- `A Game-Theoretic Approach to Multi-Agent Trust Region Optimization `_ **[2021]**
- `Settling the Variance of Multi-Agent Policy Gradients `_ **[2021]**
- `Graph-Assisted Predictive State Representations for Multi-Agent Partially Observable Systems `_ **[2022]**
- `Order Matters: Agent-by-agent Policy Optimization `_ **[2023]**

Google Research Football
========================

- `Adaptive Inner-reward Shaping in Sparse Reward Games `_ **[P][2020]**
- `Factored action spaces in deep reinforcement learning `_ **[B][2021]**
- `Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning `_ **[B][2021]**
- `TiKick: Towards Playing Multi-agent Football Full Games from Single-agent Demonstrations `_ **[F][2021]**
- `Celebrating Diversity in Shared Multi-Agent Reinforcement Learning `_ **[2021]**
- `Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning `_ **[2022]**

Pommerman
========================

- `Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL `_ **[I+T][2018]**
- `Accelerating Training in Pommerman with Imitation and Reinforcement Learning `_ **[I][2019]**
- `Agent Modeling as Auxiliary Task for Deep Reinforcement Learning `_ **[S][2019]**
- `Backplay: man muss immer umkehren `_ **[I][2019]**
- `Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning `_ **[B][2019]**
- `Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization `_ **[B][2020]**
- `Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination `_ **[B][2020]**

LBF & RWARE
========================

- `Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning `_ **[B][2020]**
- `Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks `_ **[B][2021]**
- `Learning Altruistic Behaviors in Reinforcement Learning without External Rewards `_ **[B][2021]**
- `Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing `_ **[RG][2021]**
- `LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning `_ **[2022]**

MetaDrive
========================

- `Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization `_ **[B][2021]**
- `Safe Driving via Expert Guided Policy Optimization `_ **[I][2021]**

Hanabi
========================

- `Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning `_ **[B][2019]**
- `Re-determinizing MCTS in Hanabi `_ **[S+T][2019]**
- `Diverse Agents for Ad-Hoc Cooperation in Hanabi `_ **[B][2019]**
- `Joint Policy Search for Multi-agent Collaboration with Imperfect Information `_ **[T][2020]**
- `Off-Belief Learning `_ **[B][2021]**
- `The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games `_ **[B][2021]**
- `Trajectory Diversity for Zero-Shot Coordination `_ **[B][2021]**

MAgent
========================

- `Mean field multi-agent reinforcement learning `_ **[B][2018]**
- `Graph convolutional reinforcement learning `_ **[B][2018]**
- `Factorized q-learning for large-scale multi-agent systems `_ **[B][2019]**
- `From few to more: Large-scale dynamic multiagent curriculum learning `_ **[MT][2020]**

Other Tasks
========================

- `Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning `_ **[2020]**
- `Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning `_ **[2021]**
- `Learning to Ground Multi-Agent Communication with Autoencoders `_ **[2021]**
- `Latent Variable Sequential Set Transformers For Joint Multi-Agent Motion Prediction `_ **[2021]**
- `Learning to Share in Multi-Agent Reinforcement Learning `_ **[2021]**
- `Resilient Multi-Agent Reinforcement Learning with Adversarial Value Decomposition `_ **[2021]**
- `Multi-Agent MDP Homomorphic Networks `_ **[2021]**
- `Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks `_ **[2021]**
- `Multi-Agent Reinforcement Learning in Stochastic Networked Systems `_ **[2021]**
- `Mirror Learning: A Unifying Framework of Policy Optimisation `_ **[2022]**
- `E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance `_ **[2022]**
- `Shield Decentralization for Safe Multi-Agent Reinforcement Learning `_ **[2022]**
- `Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus `_ **[2022]**
- `Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning `_ **[2022]**
- `Near-Optimal Multi-Agent Learning for Safe Coverage Control `_ **[2022]**
- `Multi-agent Dynamic Algorithm Configuration `_ **[2022]**

New Environments
========================

- `SMACv2: A New Benchmark for Cooperative Multi-Agent Reinforcement Learning `_ **[2022]**
- `MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control `_ **[2022]**
- `Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world `_ **[2022]**
- `GoBigger: A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation `_ **[2023]**