Quick Start#

If you have not installed MARLlib yet, please refer to Installation before running.

Configuration Overview#

[Figure: configuration overview — prepare all the configuration files to start your MARL journey]

To start your MARL journey with MARLlib, you need to prepare the configuration files that customize the whole learning pipeline. There are four configuration files that must be set correctly for your training run:

  • scenario: specify your environment/task settings

  • algorithm: finetune your algorithm hyperparameters

  • model: customize the model architecture

  • ray/rllib: change the basic training settings

Scenario Configuration#

MARLlib provides ten environments for you to conduct your experiments. You can follow the instructions in the Environments section to install them and change the corresponding configuration to customize the chosen task.

Algorithm Hyper-parameter#

After setting up the environment, you need to visit the MARL algorithms’ hyper-parameter directory. Each algorithm has different hyper-parameters that you can finetune. Most algorithms are sensitive to the environment settings, so you need to provide a set of hyper-parameters that fits the current MARL task.

We provide a commonly used hyper-parameter directory, a test-only hyper-parameter directory, and finetuned hyper-parameter sets for the three most-used MARL environments: SMAC, MPE, and MAMuJoCo.
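As a sketch of picking a hyper-parameter source (assuming the environment name, e.g. "smac", doubles as the key for its finetuned set, while "test" and "common" select the test-only and commonly used sets):

from marllib import marl

# use the finetuned hyper-parameter set for SMAC (assumed key: "smac")
mappo = marl.algos.mappo(hyperparam_source="smac")
# or fall back to the test-only / commonly used sets
# mappo = marl.algos.mappo(hyperparam_source="test")
# mappo = marl.algos.mappo(hyperparam_source="common")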

Model Architecture#

Observation space varies across environments. MARLlib automatically constructs the agent model to fit the diverse input shapes, including observation, global state, action mask, and additional information (e.g., minimap).

However, you can still customize your model in the model’s config. The supported architecture changes include the core architecture (MLP, GRU, or LSTM) and the encoder layer dimensions, as sketched below.
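A minimal sketch, reusing the env and mappo objects from the training example later on this page:

# GRU core with a two-layer encoder of 128 and 256 units
model = marl.build_model(env, mappo, {"core_arch": "gru", "encode_layer": "128-256"})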

Ray/RLlib Running Options#

Ray/RLlib provides a flexible multi-processing scheduling mechanism for MARLlib. You can modify the ray configuration file to adjust sampling speed (worker number, CPU number), training speed (GPU acceleration), running mode (local or distributed), parameter sharing strategy (all, group, individual), and stop condition (iteration, reward, timestep).
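Most of these options can also be passed per run through the fit call; a minimal sketch, assuming the argument names listed in the tables later on this page and reusing env and model from the training example below:

# adjust sampling, training resources, running mode, sharing, and stop condition
mappo.fit(
    env, model,
    num_workers=5,          # sampling: number of rollout workers (CPU)
    num_gpus=1,             # training: GPU acceleration
    local_mode=False,       # running mode: True runs locally for debugging
    share_policy="group",   # parameter sharing: "all" / "group" / "individual"
    stop={"episode_reward_mean": 2000, "timesteps_total": 10000000},  # stop condition
)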

How to Customize#

To modify the configuration settings in MARLlib, it is important to first understand the underlying configuration system.

Levels of configuration#

There are three levels of configuration, listed here in order of priority from low to high:

  • File-based configuration, which includes all the default *.yaml files.

  • API-based customized configuration, which allows users to specify their own preferences, such as {"core_arch": "mlp", "encode_layer": "128-256"}.

  • Command line arguments, such as python xxx.py --ray_args.local_mode --env_args.difficulty=6 --algo_args.num_sgd_iter=6.

If a parameter is set at multiple levels, the higher level configuration will take precedence over the lower level configuration.
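For instance (a sketch reusing env and mappo from the training example below; the default values are assumed to come from the model’s *.yaml file), the API-based dictionary overrides the file-based defaults, and a command line flag such as --algo_args.num_sgd_iter=6 would in turn override the corresponding API or file value:

# file-based defaults for core_arch / encode_layer are overridden by this dictionary
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-256"})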

Compatibility across different levels#

It is important to ensure that hyper-parameter choices are compatible across these levels. For example, the Multiple Particle Environments (MPE) support both discrete and continuous actions. To enable the continuous action space, simply set the continuous_actions parameter in mpe.yaml to True. When using the API-based approach or command line arguments, pay attention to the corresponding setting, e.g., marl.make_env(xxxx, continuous_actions=True), where the argument name must match the one in mpe.yaml exactly.
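A consistent continuous-action setup might look like the following sketch, where the keyword argument name matches the entry in mpe.yaml:

from marllib import marl

# mpe.yaml sets continuous_actions: True; the API call must use the exact same name
env = marl.make_env(environment_name="mpe", map_name="simple_spread", continuous_actions=True)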

Training#

from marllib import marl
# prepare env
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
# initialize algorithm with appointed hyper-parameters
mappo = marl.algos.mappo(hyperparam_source="mpe")
# build agent model based on env + algorithms + user preference
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-256"})
# start training
mappo.fit(env, model, stop={"timesteps_total": 1000000}, checkpoint_freq=100, share_policy="group")

prepare the environment#

| task mode | api example |
|---|---|
| cooperative | marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True) |
| collaborative | marl.make_env(environment_name="mpe", map_name="simple_spread") |
| competitive | marl.make_env(environment_name="mpe", map_name="simple_adversary") |
| mixed | marl.make_env(environment_name="mpe", map_name="simple_crypto") |

Most of the popular environments in MARL research are supported by MARLlib:

| Env Name | Learning Mode | Observability | Action Space | Observations |
|---|---|---|---|---|
| LBF | cooperative + collaborative | Both | Discrete | 1D |
| RWARE | cooperative | Partial | Discrete | 1D |
| MPE | cooperative + collaborative + mixed | Both | Both | 1D |
| SMAC | cooperative | Partial | Discrete | 1D |
| MetaDrive | collaborative | Partial | Continuous | 1D |
| MAgent | collaborative + mixed | Partial | Discrete | 2D |
| Pommerman | collaborative + competitive + mixed | Both | Discrete | 2D |
| MAMuJoCo | cooperative | Partial | Continuous | 1D |
| Google Research Football | collaborative + mixed | Full | Discrete | 2D |
| Hanabi | cooperative | Partial | Discrete | 1D |

Each environment has a readme file, which serves as the instructions for the task, including env settings, installation, and important notes.

initialize the algorithm#

| running target | api example |
|---|---|
| train & finetune | marl.algos.mappo(hyperparam_source=$ENV) |
| develop & debug | marl.algos.mappo(hyperparam_source="test") |
| 3rd party env | marl.algos.mappo(hyperparam_source="common") |

Here is a chart describing the characteristics of each algorithm:

| algorithm | support task mode | discrete action | continuous action | policy type |
|---|---|---|---|---|
| IQL: multi-agent version of D(R)QN | all four | ✔️ | | off-policy |
| IPG | all four | ✔️ | ✔️ | on-policy |
| IA2C: multi-agent version of A2C | all four | ✔️ | ✔️ | on-policy |
| IDDPG: multi-agent version of DDPG | all four | | ✔️ | off-policy |
| ITRPO: multi-agent version of TRPO | all four | ✔️ | ✔️ | on-policy |
| IPPO: multi-agent version of PPO | all four | ✔️ | ✔️ | on-policy |
| COMA: MAA2C with Counterfactual Multi-Agent Policy Gradients | all four | ✔️ | | on-policy |
| MADDPG: DDPG agent with a centralized Q | all four | | ✔️ | off-policy |
| MAA2C: A2C agent with a centralized critic | all four | ✔️ | ✔️ | on-policy |
| MATRPO: TRPO agent with a centralized critic | all four | ✔️ | ✔️ | on-policy |
| MAPPO: PPO agent with a centralized critic | all four | ✔️ | ✔️ | on-policy |
| HATRPO: sequentially updating critic of MATRPO agents | cooperative | ✔️ | ✔️ | on-policy |
| HAPPO: sequentially updating critic of MAPPO agents | cooperative | ✔️ | ✔️ | on-policy |
| VDN: mixing Q with value decomposition network | cooperative | ✔️ | | off-policy |
| QMIX: mixing Q with monotonic factorization | cooperative | ✔️ | | off-policy |
| FACMAC: mixing a bunch of DDPG agents | cooperative | | ✔️ | off-policy |
| VDA2C: mixing a bunch of A2C agents’ critics | cooperative | ✔️ | ✔️ | on-policy |
| VDPPO: mixing a bunch of PPO agents’ critics | cooperative | ✔️ | ✔️ | on-policy |

*all four: cooperative, collaborative, competitive, mixed
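As a sketch of matching an algorithm to a task (the SMAC map name "3m" is illustrative, and QMIX is assumed to be exposed as marl.algos.qmix, analogous to marl.algos.mappo): a cooperative, discrete-action environment such as SMAC pairs naturally with a cooperative, discrete-action, off-policy algorithm like QMIX.

from marllib import marl

# SMAC is cooperative with discrete actions, so a joint Q-learning method such as QMIX fits
env = marl.make_env(environment_name="smac", map_name="3m")
qmix = marl.algos.qmix(hyperparam_source="smac")
model = marl.build_model(env, qmix, {"core_arch": "gru"})
qmix.fit(env, model, stop={"timesteps_total": 1000000}, share_policy="all")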

construct the agent model#

| model arch | api example |
|---|---|
| MLP | marl.build_model(env, algo, {"core_arch": "mlp"}) |
| GRU | marl.build_model(env, algo, {"core_arch": "gru"}) |
| LSTM | marl.build_model(env, algo, {"core_arch": "lstm"}) |
| encoder arch | marl.build_model(env, algo, {"core_arch": "gru", "encode_layer": "128-256"}) |

kick off the training algo.fit#

| setting | api example |
|---|---|
| train | algo.fit(env, model) |
| debug | algo.fit(env, model, local_mode=True) |
| stop condition | algo.fit(env, model, stop={'episode_reward_mean': 2000, 'timesteps_total': 10000000}) |
| policy sharing | algo.fit(env, model, share_policy='all') # or 'group' / 'individual' |
| save model | algo.fit(env, model, checkpoint_freq=100, checkpoint_end=True) |
| GPU accelerate | algo.fit(env, model, local_mode=False, num_gpus=1) |
| CPU accelerate | algo.fit(env, model, local_mode=False, num_workers=5) |

policy inference algo.render#

| setting | api example |
|---|---|
| render | algo.render(env, model, local_mode=True, restore_path='path_to_model') |

By default, all the models will be saved at /home/username/ray_results/experiment_name/checkpoint_xxxx
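A complete inference sketch (restore_path is a placeholder; point it to one of the saved checkpoint directories mentioned above):

from marllib import marl

# rebuild the same env, algorithm, and model, then restore the checkpoint and render
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
mappo = marl.algos.mappo(hyperparam_source="mpe")
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-256"})
mappo.render(env, model, local_mode=True, restore_path='path_to_model')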

Logging & Saving#

MARLlib uses the default logger provided by Ray, ray.tune.CLIReporter. You can change where logs are saved in the ray configuration file.