# MARLlib: A Multi-agent Reinforcement Learning Library

## MARLlib Handbook
- Introduction
- Installation
- Framework
- Environments
- Quick Start

## Navigate From RL To MARL

### Part 1. Single Agent (Deep) RL
- Reinforcement Learning (RL)
- Deep Reinforcement Learning (DRL)
- Resources

### Part 2. Navigate From RL To MARL
- MARL: On the Shoulders of RL
- Partially Observable Markov Decision Process (POMDP)
- Centralized Training & Decentralized Execution (CTDE)
- Diversity: Task Mode, Interacting Style, and Additional Information
- The Future of MARL

### Part 3. A Collective Survey of MARL
- Tasks: Arenas of MARL
- Methodology of MARL: Task First or Algorithm First

## Algorithm Documentation

### Joint Q Learning Family
- Deep (Recurrent) Q Learning: A Recap
- IQL: multi-agent version of D(R)QN
- VDN: mixing Q with value decomposition network
- QMIX: mixing Q with monotonic factorization
- Read List

### Deep Deterministic Policy Gradient Family
- Deep Deterministic Policy Gradient: A Recap
- IDDPG: multi-agent version of DDPG
- MADDPG: DDPG agent with a centralized Q
- FACMAC: mixing a bunch of DDPG agents
- Read List

### Advantage Actor-Critic Family
- Advantage Actor-Critic: A Recap
- IA2C: multi-agent version of A2C
- MAA2C: A2C agent with a centralized critic
- COMA: MAA2C with Counterfactual Multi-Agent Policy Gradients
- VDA2C: mixing a bunch of A2C agents' critics
- Read List

### Trust Region Policy Optimization Family
- Trust Region Policy Optimization: A Recap
- ITRPO: multi-agent version of TRPO
- MATRPO: TRPO agent with a centralized critic
- HATRPO: sequentially updating critics of MATRPO agents
- Read List

### Proximal Policy Optimization Family
- Proximal Policy Optimization: A Recap
- IPPO: multi-agent version of PPO
- MAPPO: PPO agent with a centralized critic
- VDPPO: mixing a bunch of PPO agents' critics
- HAPPO: sequentially updating critics of MAPPO agents
- Read List

## Resources
- Awesome Paper List
- Benchmarks