Anti-Poaching Environment (APE)
This is the repository for APE, the Anti-Poaching Environment: a mixed, zero-sum, multi-agent game between independent poachers and cooperative rangers on a grid. The main implementation lives in anti_poaching.py as a PettingZoo environment. Examples that use this environment are found in the [examples](examples/) directory, notably the RLlib interface (currently supported at v2.8.0) in the rllib folder.
Installation
To get a ready-to-go environment, use virtualenv (or a similar tool of your choice) to create a Python virtual environment. We currently test with Python 3.8, but later versions should also work.
$ virtualenv -p python3.8 ape;
$ source ape/bin/activate;
To install the environment with a GPU-enabled version of PyTorch, run the following command from the root directory of this project. This installs the environment as an editable package using pip.
$ pip install -e .[code,gpu] # For GPU-enabled torch
Alternatively, to install only the CPU version of PyTorch, use
$ pip install -e .[code,cpu] # For CPU-only torch
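Either way, a quick sanity check of which PyTorch build was installed (using only standard PyTorch calls) is:
$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"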
We also provide a simple script, init.sh, that does this automatically for you. You can simply source it as follows:
$ source init.sh # For CPU-only torch
$ source init.sh full # For GPU-enabled torch
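After either installation route, you can confirm the package is importable (this uses the same import path as the example in the next section):
$ python -c "from anti_poaching.anti_poaching_v0 import anti_poaching; print('APE import OK')"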
Using APE
The main environment is implemented in anti_poaching.py, following the PettingZoo API. Once the package is installed (see previous section), the following code should run:
from anti_poaching.anti_poaching_v0 import anti_poaching

cg = anti_poaching.parallel_env(render_mode="rgb")
done, observations, terminations, truncations = False, None, None, None

# Initial action masks come from the grid itself
action_mask = {
    agent: cg.grid.permitted_movements(agent) for agent in cg.agents
}

while not done:
    # sample the actions for each agent randomly
    actions = {
        agent: cg.action_space(agent).sample(mask=action_mask[agent])
        for agent in cg.agents
    }
    observations, _, terminations, truncations, _ = cg.step(actions)

    # Subsequent action masks are read from each agent's observation
    action_mask = {
        agent: observations[agent]["action_mask"] for agent in cg.agents
    }

    # The episode ends once every agent is terminated or truncated
    done = all(
        x or y for x, y in zip(terminations.values(), truncations.values())
    )
    cg.render()
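Since the environment follows the PettingZoo parallel API, you can also inspect the per-agent spaces before writing a policy. The observation_space accessor below is assumed from that API; agents and action_space appear in the snippet above.

for agent in cg.agents:
    # Per-agent spaces; the mask passed to sample() above restricts the
    # action space to moves permitted from the agent's current cell.
    print(agent, cg.action_space(agent), cg.observation_space(agent))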
Alternatively, try running the examples from manual_policies, or run the test suite with pytest as follows:
$ pytest [tests/]
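Standard pytest flags work as usual; for instance, to stop at the first failure with verbose output:
$ pytest -x -v tests/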
Examples and the RLlib Interface
A few examples are found in the examples folder.
Manual policies
The fixed_policy.py and random_policy.py scripts show how to drive the game with hand-coded policies, and double as a demonstration of the basic RL loop.
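As a rough illustration only (not the contents of fixed_policy.py itself), a hand-coded policy can map each agent's action mask to a deterministic choice, e.g. the first permitted action. This sketch reuses the cg and observations variables from the loop in the previous section:

import numpy as np

def first_permitted_action(mask):
    """Pick the lowest-indexed action allowed by the 0/1 action mask."""
    return int(np.flatnonzero(mask)[0])

actions = {
    agent: first_permitted_action(observations[agent]["action_mask"])
    for agent in cg.agents
}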
RLlib examples
These examples run MARL algorithms (Policy Gradients, PPO, QMIX) on the developed model using RLlib. All experiments can be launched from the central script main.py, which by default runs an RLlib algorithm (PPO) in Multi-Agent Independent Learning mode on an AntiPoachingGame instance. All examples have parameters that can be specified via the command line (use --help to see all options); everything is wrapped to provide compatibility with RLlib.
# from the repository root
$ cd examples/rllib
$ python main.py
To see all the configuration options possible, run
$ python main.py --help
For example, to run a 2 Rangers vs. 4 Poachers scenario where
- the game is played on a 15x15 grid,
- only the Rangers learn, while the Poachers use the Random heuristic,
- the learning runs for 30k steps and is evaluated every 10k steps,
- and 20 CPU cores are available,

we can run the following command:
$ python main.py --grid 15 --rangers 2 --poachers 4 \
    --policies-train r --ppol random \
    --timesteps 30000 --eval-every 10000 \
    --num-cpus 20
For further details, refer to the README for the RLlib interface.