TRL - Transformer Reinforcement Learning

A comprehensive library to post-train foundation models using techniques like Supervised Fine-Tuning (SFT), Reinforcement Learning (RL).

Quick Setup

1. Clone the Repository

git clone https://github.com/AmirTuring/trl.git
cd trl

2. Run Setup Script

sh setup.sh

This will:

Install tmux (if not already installed)
Set up a Python virtual environment
Install all required dependencies
Install flash-attention if CUDA is available

3. Configure Accelerate

Choose the appropriate configuration based on your hardware:

# For MacBook / Local Development (CPU/MPS)
cp examples/accelerate_configs/macbook_single.yaml accelerate_config.yaml

# For single GPU training
cp examples/accelerate_configs/single_gpu.yaml accelerate_config.yaml

# For multi-GPU training (8 GPUs)
cp examples/accelerate_configs/multi_gpu.yaml accelerate_config.yaml

# For DeepSpeed ZeRO Stage 2 (recommended for larger models)
cp examples/accelerate_configs/deepspeed_zero2.yaml accelerate_config.yaml

4. Auto-Shutdown (Optional - For RunPod Pods)

To save costs on cloud platforms like RunPod, you can enable automatic shutdown after GPU inactivity:

# Start the auto-shutdown monitor in background
nohup sh auto_shutdown.sh &

# To stop the auto-shutdown monitor
pkill -f auto_shutdown.sh

The script monitors GPU utilization and automatically stops the pod after 10 minutes of idle time (below 5% GPU usage). You can configure these thresholds by editing auto_shutdown.sh.

Requirements: nvidia-smi, runpodctl, and RUNPOD_POD_ID environment variable must be set.

5. Create a tmux Session (Recommended)

For long-running training jobs, it's recommended to use tmux to keep your training running even if your connection drops:

# Create a new tmux session named "train"
tmux new -s train

# Your training commands will run inside this session
# To detach from the session (keep it running in background): Ctrl+b then d

# To reattach to the session later
tmux attach -t train

# To list all tmux sessions
tmux ls

# To kill the session when done
tmux kill-session -t train

Usage

GRPO Math Training

accelerate launch --config_file accelerate_config.yaml examples/scripts/grpo_math.py --config examples/cli_configs/grpo_math_config.yaml

Reward Model Training

accelerate launch --config_file accelerate_config.yaml examples/scripts/reward_trainer.py --config examples/cli_configs/reward_lora_config.yaml

Router Training

accelerate launch --config_file accelerate_config.yaml examples/scripts/router_trainer.py --config examples/cli_configs/router_config.yaml

License

This repository's source code is available under the Apache-2.0 License.

Name		Name	Last commit message	Last commit date
Latest commit History 1,891 Commits
.github		.github
commands		commands
docker		docker
docs/source		docs/source
examples		examples
prompts		prompts
reward_funcs		reward_funcs
scripts		scripts
tests		tests
trl		trl
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CITATION.cff		CITATION.cff
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
RELEASE.md		RELEASE.md
VERSION		VERSION
auto_shutdown.sh		auto_shutdown.sh
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py
setup.sh		setup.sh
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TRL - Transformer Reinforcement Learning

Quick Setup

1. Clone the Repository

2. Run Setup Script

3. Configure Accelerate

4. Auto-Shutdown (Optional - For RunPod Pods)

5. Create a tmux Session (Recommended)

Usage

GRPO Math Training

Reward Model Training

Router Training

License

About

Uh oh!

Releases

Packages

Languages

License

AmirTuring/trl

Folders and files

Latest commit

History

Repository files navigation

TRL - Transformer Reinforcement Learning

Quick Setup

1. Clone the Repository

2. Run Setup Script

3. Configure Accelerate

4. Auto-Shutdown (Optional - For RunPod Pods)

5. Create a tmux Session (Recommended)

Usage

GRPO Math Training

Reward Model Training

Router Training

License

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages