# Awesome Paper List

We group most existing MARL algorithms by the multi-agent environment they are evaluated on, and tag each paper with its sub-topic.

[B] Basic [S] Information Sharing [RG] Behavior/Role Grouping [I] Imitation [G] Graph [E] Exploration [R] Robust [P] Reward Shaping [F] Offline [T] Tree Search [MT] Multi-task

## MPE

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments **[B][2017]**

Learning attentional communication for multi-agent cooperation **[S][2018]**

Learning when to communicate at scale in multiagent cooperative and competitive tasks **[S][2018]**

Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning **[B][2019]**

Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient **[R][2019]**

Learning Individually Inferred Communication for Multi-Agent Cooperation **[S][2020]**

Multi-Agent Game Abstraction via Graph Attention Neural Network **[G+S][2020]**

Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning **[E][2020]**

Robust Multi-Agent Reinforcement Learning with Model Uncertainty **[R][2020]**

Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning **[B][2020]**

Cooperative Exploration for Multi-Agent Deep Reinforcement Learning **[E][2021]**

Multiagent Adversarial Collaborative Learning via Mean-Field Theory **[R][2021]**

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games **[B][2021]**

Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems **[2021]**

ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind **[2021]**

SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning **[2022]**

Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning **[2022]**

## SMAC

Value-Decomposition Networks For Cooperative Multi-Agent Learning **[B][2017]**

Multi-Agent Common Knowledge Reinforcement Learning **[RG+S][2018]**

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning **[B][2018]**

Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control **[S][2019]**

Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning **[P+E][2019]**

Learning nearly decomposable value functions via communication minimization **[S][2019]**

Liir: Learning individual intrinsic reward in multi-agent reinforcement learning **[P][2019]**

Counterfactual Multi-Agent Reinforcement Learning with Graph Convolution Communication **[S+G][2020]**

Deep implicit coordination graphs for multi-agent reinforcement learning **[G][2020]**

DOP: Off-policy multi-agent decomposed policy gradients **[B][2020]**

From few to more: Large-scale dynamic multiagent curriculum learning **[MT][2020]**

Learning structured communication for multi-agent reinforcement learning **[S+G][2020]**

Learning efficient multi-agent communication: An information bottleneck approach **[S][2020]**

On the robustness of cooperative multi-agent reinforcement learning **[R][2020]**

Qatten: A general framework for cooperative multiagent reinforcement learning **[B][2020]**

Revisiting parameter sharing in multi-agent deep reinforcement learning **[RG][2020]**

ROMA: Multi-Agent Reinforcement Learning with Emergent Roles **[RG][2020]**

Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization **[B][2021]**

Contrasting centralized and decentralized critics in multi-agent reinforcement learning **[B][2021]**

Learning in nonzero-sum stochastic games with potentials **[B][2021]**

Natural emergence of heterogeneous strategies in artificially intelligent competitive teams **[S+G][2021]**

Rode: Learning roles to decompose multi-agent tasks **[RG][2021]**

SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multiagent Reinforcement Learning **[B][2021]**

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning **[B][2021]**

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games **[B][2021]**

UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers **[MT][2021]**

Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning **[MT][2021]**

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment **[MT][2021]**

Uneven: Universal value exploration for multi-agent reinforcement learning **[B][2021]**

RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents **[2021]**

Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods **[2021]**

ALMA: Hierarchical Learning for Composite Multi-Agent Tasks **[2022]**

Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning **[2022]**

Surprise Minimizing Multi-Agent Learning with Energy-based Models **[2022]**

SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning **[2022]**

Self-Organized Group for Cooperative Multi-agent Reinforcement Learning **[2022]**

Efficient Multi-agent Communication via Self-supervised Information Aggregation **[2022]**

Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration **[2022]**

CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning **[2022]**

## MAMuJoCo

FACMAC: Factored Multi-Agent Centralised Policy Gradients **[B][2020]**

Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning **[B][2021]**

A Game-Theoretic Approach to Multi-Agent Trust Region Optimization **[2021]**

Settling the Variance of Multi-Agent Policy Gradients **[2021]**

Graph-Assisted Predictive State Representations for Multi-Agent Partially Observable Systems **[2022]**

## Google Research Football

Adaptive Inner-reward Shaping in Sparse Reward Games **[P][2020]**

Factored action spaces in deep reinforcement learning **[B][2021]**

Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning **[B][2021]**

TiKick: Towards Playing Multi-agent Football Full Games from Single-agent Demonstrations **[F][2021]**

Celebrating Diversity in Shared Multi-Agent Reinforcement Learning **[2021]**

Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning **[2022]**

## Pommerman

Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL **[I+T][2018]**

Accelerating Training in Pommerman with Imitation and Reinforcement Learning **[I][2019]**

Agent Modeling as Auxiliary Task for Deep Reinforcement Learning **[S][2019]**

Backplay: man muss immer umkehren **[I][2019]**

Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning **[B][2019]**

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization **[B][2020]**

Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination **[B][2020]**

## LBF & RWARE

Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning **[B][2020]**

Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks **[B][2021]**

Learning Altruistic Behaviors in Reinforcement Learning without External Rewards **[B][2021]**

Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing **[RG][2021]**

LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning **[2022]**

## MetaDrive

## Hanabi

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning **[B][2019]**

Re-determinizing MCTS in Hanabi **[S+T][2019]**

Joint Policy Search for Multi-agent Collaboration with Imperfect Information **[T][2020]**

Off-Belief Learning **[B][2021]**

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games **[B][2021]**

Trajectory Diversity for Zero-Shot Coordination **[B][2021]**

## MAgent

## Other Tasks

Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning **[2020]**

Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning **[2021]**

Learning to Ground Multi-Agent Communication with Autoencoders **[2021]**

Latent Variable Sequential Set Transformers For Joint Multi-Agent Motion Prediction **[2021]**

Learning to Share in Multi-Agent Reinforcement Learning **[2021]**

Resilient Multi-Agent Reinforcement Learning with Adversarial Value Decomposition **[2021]**

Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks **[2021]**

Multi-Agent Reinforcement Learning in Stochastic Networked Systems **[2021]**

Mirror Learning: A Unifying Framework of Policy Optimisation **[2022]**

E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance **[2022]**

Shield Decentralization for Safe Multi-Agent Reinforcement Learning **[2022]**

Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus **[2022]**

Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning **[2022]**

Near-Optimal Multi-Agent Learning for Safe Coverage Control **[2022]**