
Generating CUDA Kernels with Slime RL

slime is an LLM post-training framework for RL scaling, providing two core capabilities:

  1. High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
  2. Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.

To-Do List:

  • Generator
    • Dataset Prep
    • Convert to slime Format
  • GRPO
  • SFT

Env Setup

Docker/Apptainer is used for setup on NCSA Delta.

1. Download the image from Docker Hub and convert it to Apptainer format

apptainer pull slime.sif docker://slimerl/slime:latest
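The pull writes slime.sif to the current directory and can take a while. As an optional sanity check (not part of the original setup), you can inspect the resulting image:

apptainer inspect slime.sif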

2. Request an interactive GPU allocation, ssh into the assigned GPU node, and run the container

salloc --mem=220g --nodes=1 --ntasks-per-node=4 --cpus-per-task=4 --partition=gpuA100x4-interactive --account=bekz-delta-gpu --time=00:30:00 --gpus-per-node=4

ssh gpuaxxx

# bind the huggingface cache path and the datasets/model path
apptainer run --nv --bind /work/nvme/bekz/yzhao25/huggingface:/mnt/huggingface \
                   --bind /work/nvme/bcrc/yzhao25/rl_datasets:/mnt/datasets \
                   /u/yzhao25/slime/slime.sif \
                   /bin/bash --login
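Once inside the container, a quick sanity check (optional, and assuming the image ships PyTorch, which slime requires) confirms that the GPUs and the bound paths are visible:

nvidia-smi
ls /mnt/huggingface /mnt/datasets
python -c "import torch; print(torch.cuda.device_count())"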

Example 1: Run Qwen3-4B with GRPO on DAPO-MATH

1. Download models and datasets

huggingface-cli download Qwen/Qwen3-4B --local-dir /work/nvme/bcrc/yzhao25/rl_datasets/Qwen3-4B

huggingface-cli download --repo-type dataset zhuzilin/dapo-math-17k --local-dir /work/nvme/bcrc/yzhao25/rl_datasets/dapo-math-17k

huggingface-cli download --repo-type dataset zhuzilin/aime-2024 --local-dir /work/nvme/bcrc/yzhao25/rl_datasets/aime-2024
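The host directory used above is the one bound to /mnt/datasets inside the container, so you can verify the downloads there before continuing (an optional check):

ls /mnt/datasets
# expected entries, based on the downloads above: Qwen3-4B  aime-2024  dapo-math-17k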

2. Convert the model to Megatron format

source scripts/models/qwen3-4B.sh

# the tensor/pipeline parallel sizes here should be consistent with your PERF_ARGS
CUDA_DEVICE_MAX_CONNECTIONS=1 PYTHONPATH=/root/Megatron-LM torchrun --nproc_per_node=4 \
  tools/convert_hf_to_torch_dist.py \
  ${MODEL_ARGS[@]} \
  --tensor-model-parallel-size 2 \
  --pipeline-model-parallel-size 2 \
  --hf-checkpoint /mnt/datasets/Qwen3-4B \
  --make-vocab-size-divisible-by 1 \
  --save /mnt/datasets/qwen3_4b_torch_dist_tp2
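Note that the product of the tensor- and pipeline-parallel sizes (2 x 2 = 4) matches --nproc_per_node=4 on a single 4-GPU node. After the conversion finishes, you can optionally confirm that the torch_dist checkpoint was written:

ls /mnt/datasets/qwen3_4b_torch_dist_tp2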

3. Run GRPO on Qwen3-4B

bash scripts/run-qwen3-4B.sh 2>&1 | tee run.log

4. Alternatively, you can submit a Slurm job by running:

sbatch scripts/slurm_scripts/run_qwen3_4b_grpo.slurm
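After submitting, the job can be monitored with standard Slurm tooling. The output file name depends on the #SBATCH --output setting in the .slurm script; slurm-<jobid>.out below is just the Slurm default:

squeue -u $USER
tail -f slurm-<jobid>.out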

Example 2: Run Qwen3-4B with SFT on Open-Hermes

1. Follow qwen3-4b-base-openhermes.md to download the model and datasets

2. Run SFT on Qwen3-4B

bash scripts/run-qwen3-4B-base-sft.sh 2>&1 | tee run_sft.log

3. Alternatively, you can submit a Slurm job by running:

sbatch scripts/slurm_scripts/run_qwen3_4b_sft.slurm

Arguments Walkthrough

Arguments in slime are divided into three categories:

  1. Megatron arguments: slime reads all of Megatron's arguments, so you can configure Megatron by passing flags such as --tensor-model-parallel-size 2.
  2. SGLang arguments: all arguments of the installed SGLang are supported, but they must be prefixed with --sglang-. For example, --mem-fraction-static is passed as --sglang-mem-fraction-static.
  3. slime-specific arguments: please refer to slime/utils/arguments.py. A launch command typically mixes all three categories, as in the sketch below.
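For illustration only (train.py and the exact flag values are placeholders, and --hf-checkpoint is assumed here to be one of the slime-specific arguments), a command line combining the three categories might look like:

# --tensor-model-parallel-size : Megatron argument
# --sglang-mem-fraction-static : SGLang argument (note the --sglang- prefix)
# --hf-checkpoint              : slime-specific argument (see slime/utils/arguments.py)
python train.py \
  --tensor-model-parallel-size 2 \
  --sglang-mem-fraction-static 0.8 \
  --hf-checkpoint /mnt/datasets/Qwen3-4B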

For complete usage instructions, please refer to the Usage Documentation.

FAQ & Acknowledgements

  • For frequently asked questions, please see the Q&A.
  • Special thanks to the following projects & communities: SGLang, Megatron‑LM, mbridge, OpenRLHF, veRL, Pai-Megatron-Patch, and others.
  • To cite slime, please use:
@misc{slime_github,
  author       = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},
  title        = {slime: An LLM post-training framework for RL Scaling},
  year         = {2025},
  howpublished = {\url{https://github.com/THUDM/slime}},
  note         = {GitHub repository. Corresponding author: Xin Lv},
  urldate      = {2025-06-19}
}
