[rllib] Best workflow to train, save, and test agent

What is your question?

This is a great framework, but after reading the documentation and playing around for weeks, I'm still struggling to get the simple workflow working: train a PPO agent, save a checkpoint at the end, save stats, and then use the trained agent for evaluation or visualization.

It starts with my confusion about the two ways of training an RL agent. Either I use the Trainer API directly:

from ray.rllib.agents.ppo import PPOTrainer

trainer = PPOTrainer(env="CartPole-v0", config={"train_batch_size": 4000})
while True:
    print(trainer.train())

This makes saving my agent simple with trainer.save(path), and I can use the trained agent afterwards for testing with trainer.compute_action(observation). But: AFAIK, I cannot change the log directory, which always defaults to ~/ray_results.
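
For reference, here is a minimal sketch of this first workflow, saving a checkpoint with the Trainer API and restoring it later (as far as I can tell, save() without arguments returns the path of the new checkpoint, written under the default ~/ray_results):

import ray
import gym
from ray.rllib.agents.ppo import PPOTrainer

ray.init(ignore_reinit_error=True)

# train for a few iterations, then save a checkpoint
trainer = PPOTrainer(env="CartPole-v0", config={"train_batch_size": 4000})
for _ in range(5):
    trainer.train()
checkpoint_path = trainer.save()  # path of the saved checkpoint

# later: rebuild a trainer with the same config and restore the weights
trained_agent = PPOTrainer(env="CartPole-v0", config={"train_batch_size": 4000})
trained_agent.restore(checkpoint_path)

# use the restored agent, e.g. to act in the environment
env = gym.make("CartPole-v0")
obs = env.reset()
action = trained_agent.compute_action(obs)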

Or I use ray.tune.run():

from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

tune.run(PPOTrainer, config={"env": "CartPole-v0", "train_batch_size": 4000}, local_dir=my_path, checkpoint_at_end=True)

This allows me to configure a custom local_dir to put my logs in and to create a checkpoint at the end. But: AFAIK, I don't have access to my trained agent. ray.tune.run() just returns an ExperimentAnalysis object, not my trained agent or the exact path of the checkpoints (which includes some random hash) that I would need to load the agent. The experiment_id in the results does not correspond to the hash used in the directory name, so I cannot reconstruct the directory name.

My only resort at the moment is to split the workflow into two separate steps: training with ray.tune.run, and then loading and testing the agent, where I have to find and copy & paste the path of the last checkpoint manually in between. Very inconvenient.
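
Concretely, the two-step workaround looks roughly like this; local_dir, the stop criterion, and the pasted checkpoint path are placeholders I made up for illustration:

# Step 1: train with tune, which writes logs and checkpoints under local_dir.
import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
tune.run(PPOTrainer,
         config={"env": "CartPole-v0", "train_batch_size": 4000},
         local_dir="my_results",            # placeholder log directory
         stop={"training_iteration": 50},   # some stop criterion so the run terminates
         checkpoint_at_end=True)

# Step 2 (separate run): find the checkpoint under my_results and paste its path by hand.
checkpoint_path = "<checkpoint path copied by hand from my_results>"  # placeholder
agent = PPOTrainer(env="CartPole-v0", config={"train_batch_size": 4000})
agent.restore(checkpoint_path)
# ...then evaluate with agent.compute_action(obs) as above.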

There must be a more convenient way to do what I want, right?

Ray version and other system information (Python version, TensorFlow version, OS):

  • Ray 0.8.5
  • TensorFlow 2.2.0
  • Python 3.8.3
  • OS: Ubuntu 20.04 on WSL (Win 10)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 8
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

33 reactions
stefanbschneider commented, Jun 25, 2020

I finally got a workflow that does everything I want: train with a configurable log dir, return the path of the saved agent (checkpoint), load the trained agent, and use it for testing.

Here’s the basic code (within a custom class; it assumes import ray and from ray.rllib.agents import ppo at the module level):

def train(self, stop_criteria):
    """
    Train an RLlib PPO agent using tune until any of the configured stopping criteria is met.
    :param stop_criteria: Dict with stopping criteria.
        See https://docs.ray.io/en/latest/tune/api_docs/execution.html#tune-run
    :return: Return the path to the saved agent (checkpoint) and tune's ExperimentAnalysis object
        See https://docs.ray.io/en/latest/tune/api_docs/analysis.html#experimentanalysis-tune-experimentanalysis
    """
    analysis = ray.tune.run(ppo.PPOTrainer, config=self.config, local_dir=self.save_dir, stop=stop_criteria,
                            checkpoint_at_end=True)
    # list of lists: one list per checkpoint; each checkpoint list contains 1st the path, 2nd the metric value
    checkpoints = analysis.get_trial_checkpoints_paths(trial=analysis.get_best_trial('episode_reward_mean'),
                                                       metric='episode_reward_mean')
    # retrieve the checkpoint path; we only have a single checkpoint, so take the first one
    checkpoint_path = checkpoints[0][0]
    return checkpoint_path, analysis

def load(self, path):
    """
    Load a trained RLlib agent from the specified path. Call this before testing a trained agent.
    :param path: Path pointing to the agent's saved checkpoint (only used for RLlib agents)
    """
    self.agent = ppo.PPOTrainer(config=self.config, env=self.env_class)
    self.agent.restore(path)

def test(self):
    """Test trained agent for a single episode. Return the episode reward"""
    # instantiate env class
    env = self.env_class(self.env_config)

    # run until episode ends
    episode_reward = 0
    done = False
    obs = env.reset()
    while not done:
        action = self.agent.compute_action(obs)
        obs, reward, done, info = env.step(action)
        episode_reward += reward

    return episode_reward

With that, you can just call train, load, and test, and it should work. I hope this helps.
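
For example, assuming the three methods above live on a small wrapper class (the class name MyAgent, the environment MyEnv, and the config dicts are placeholders of mine, not part of RLlib), usage looks like this:

# Hypothetical usage of a wrapper class that contains the train/load/test methods above.
agent = MyAgent(config=ppo_config, env_class=MyEnv, env_config=env_config, save_dir="results")
checkpoint_path, analysis = agent.train(stop_criteria={"training_iteration": 100})
agent.load(checkpoint_path)
episode_reward = agent.test()
print("Episode reward of the trained agent:", episode_reward)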

Not sure if there’s any other/better way to do it. But it solves my issue.

20 reactions
mcrowson commented, Jul 4, 2020

I know you closed this issue, but having this simple workflow in the official documentation would be a huge boon.
