MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library
MARLlib Handbook
Introduction
Installation
Framework
Environments
Quick Start
Navigate From RL To MARL
Part 1. Single Agent (Deep) RL
Reinforcement Learning (RL)
Deep Reinforcement Learning (DRL)
Resources
Part 2. Navigate From RL To MARL
MARL: On the Shoulders of RL
Partially Observable Markov Decision Process (POMDP)
Centralized Training & Decentralized Execution (CTDE)
Diversity: Task Mode, Interacting Style, and Additional Information
What can MARL do?
Part 3. A Collective Survey of MARL
Tasks: Arenas of MARL
Methodology of MARL: Task First or Algorithm First
Algorithm Documentation
Joint Q Learning Family
Deep (Recurrent) Q Learning: A Recap
IQL: multi-agent version of D(R)QN
VDN: mixing Q with value decomposition network
QMIX: mixing Q with monotonic factorization
Read List
Deep Deterministic Policy Gradient Family
Deep Deterministic Policy Gradient: A Recap
IDDPG: multi-agent version of DDPG
MADDPG: DDPG agent with a centralized Q
FACMAC: mixing a bunch of DDPG agents
Read List
Advantage Actor Critic Family
Advantage Actor-Critic: A Recap
IA2C: multi-agent version of A2C
MAA2C: A2C agent with a centralized critic
COMA: MAA2C with Counterfactual Multi-Agent Policy Gradients
VDA2C: mixing a bunch of A2C agents’ critics
Read List
Trust Region Policy Optimization Family
Trust Region Policy Optimization: A Recap
ITRPO: multi-agent version of TRPO
MATRPO: TRPO agent with a centralized critic
HATRPO: Sequentially updating critic of MATRPO agents
Read List
Proximal Policy Optimization Family
Proximal Policy Optimization: A Recap
IPPO: multi-agent version of PPO
MAPPO: PPO agent with a centralized critic
VDPPO: mixing a bunch of PPO agents’ critics
HAPPO: Sequentially updating critic of MAPPO agents
Read List
Resources
Awesome Paper List
Existing Benchmarks