
Generating CUDA Kernels with Slime RL

slime is an LLM post-training framework for RL scaling, providing two core capabilities:

  1. High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
  2. Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.

To-Do List:

  • Generator
    • Dataset Prep
    • Convert to slime Format
  • GRPO
  • SFT

Env Setup

Docker/Apptainer is used for setup on NCSA Delta.

1. Download the image from Docker Hub and convert it to Apptainer format

apptainer pull slime.sif docker://slimerl/slime:latest
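The pull writes slime.sif to the current directory and can take a while. As an optional sanity check (not part of the original setup), you can inspect the resulting image:

apptainer inspect slime.sif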

2. Request an interactive GPU allocation, ssh into the assigned GPU node, and run the container

salloc --mem=220g --nodes=1 --ntasks-per-node=4 --cpus-per-task=4 --partition=gpuA100x4-interactive --account=bekz-delta-gpu --time=00:30:00 --gpus-per-node=4

ssh gpuaxxx

# bind the huggingface cache path and the datasets/model path
apptainer run --nv --bind /work/nvme/bekz/yzhao25/huggingface:/mnt/huggingface \
                   --bind /work/nvme/bcrc/yzhao25/rl_datasets:/mnt/datasets \
                   /u/yzhao25/slime/slime.sif \
                   /bin/bash --login
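Once inside the container, a quick sanity check (optional, and assuming the image ships PyTorch, which slime requires) confirms that the GPUs and the bound paths are visible:

nvidia-smi
ls /mnt/huggingface /mnt/datasets
python -c "import torch; print(torch.cuda.device_count())"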

Example 1: Run Qwen3-4B with GRPO on DAPO-MATH

1. Download models and datasets

huggingface-cli download Qwen/Qwen3-4B --local-dir /work/nvme/bcrc/yzhao25/rl_datasets/Qwen3-4B

huggingface-cli download --repo-type dataset zhuzilin/dapo-math-17k --local-dir /work/nvme/bcrc/yzhao25/rl_datasets/dapo-math-17k

huggingface-cli download --repo-type dataset zhuzilin/aime-2024 --local-dir /work/nvme/bcrc/yzhao25/rl_datasets/aime-2024
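The host directory used above is the one bound to /mnt/datasets inside the container, so you can verify the downloads there before continuing (an optional check):

ls /mnt/datasets
# expected entries, based on the downloads above: Qwen3-4B  aime-2024  dapo-math-17k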

2. Convert the model to Megatron format

source scripts/models/qwen3-4B.sh

# the tensor/pipeline parallel sizes here should be consistent with your PERF_ARGS
CUDA_DEVICE_MAX_CONNECTIONS=1 PYTHONPATH=/root/Megatron-LM torchrun --nproc_per_node=4 \
  tools/convert_hf_to_torch_dist.py \
  ${MODEL_ARGS[@]} \
  --tensor-model-parallel-size 2 \
  --pipeline-model-parallel-size 2 \
  --hf-checkpoint /mnt/datasets/Qwen3-4B \
  --make-vocab-size-divisible-by 1 \
  --save /mnt/datasets/qwen3_4b_torch_dist_tp2
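Note that the product of the tensor- and pipeline-parallel sizes (2 x 2 = 4) matches --nproc_per_node=4 on a single 4-GPU node. After the conversion finishes, you can optionally confirm that the torch_dist checkpoint was written:

ls /mnt/datasets/qwen3_4b_torch_dist_tp2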

3. Run GRPO on Qwen3-4B

bash scripts/run-qwen3-4B.sh 2>&1 | tee run.log

4. Alternatively, you can submit a Slurm job by running:

sbatch scripts/slurm_scripts/run_qwen3_4b_grpo.slurm
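After submitting, the job can be monitored with standard Slurm tooling. The output file name depends on the #SBATCH --output setting in the .slurm script; slurm-<jobid>.out below is just the Slurm default:

squeue -u $USER
tail -f slurm-<jobid>.out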

Example 2: Run Qwen3-4B with SFT on Open-Hermes

1. Follow qwen3-4b-base-openhermes.md to download the model and datasets

2. Run SFT on Qwen3-4B

bash scripts/run-qwen3-4B-base-sft.sh 2>&1 | tee run_sft.log

3. Alternatively, you can submit a Slurm job by running:

sbatch scripts/slurm_scripts/run_qwen3_4b_sft.slurm

Arguments Walkthrough

Arguments in slime are divided into three categories:

  1. Megatron arguments: slime reads all of Megatron's arguments, so you can configure Megatron by passing flags such as --tensor-model-parallel-size 2.
  2. SGLang arguments: all arguments of the installed SGLang are supported, but they must be prefixed with --sglang-. For example, --mem-fraction-static is passed as --sglang-mem-fraction-static.
  3. slime-specific arguments: please refer to slime/utils/arguments.py. A launch command typically mixes all three categories, as in the sketch below.
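For illustration only (train.py and the exact flag values are placeholders, and --hf-checkpoint is assumed here to be one of the slime-specific arguments), a command line combining the three categories might look like:

# --tensor-model-parallel-size : Megatron argument
# --sglang-mem-fraction-static : SGLang argument (note the --sglang- prefix)
# --hf-checkpoint              : slime-specific argument (see slime/utils/arguments.py)
python train.py \
  --tensor-model-parallel-size 2 \
  --sglang-mem-fraction-static 0.8 \
  --hf-checkpoint /mnt/datasets/Qwen3-4B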

For complete usage instructions, please refer to the Usage Documentation.

FAQ & Acknowledgements

  • For frequently asked questions, please see the Q&A.
  • Special thanks to the following projects & communities: SGLang, Megatron‑LM, mbridge, OpenRLHF, veRL, Pai-Megatron-Patch, and others.
  • To cite slime, please use:
@misc{slime_github,
  author       = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},
  title        = {slime: An LLM post-training framework for RL Scaling},
  year         = {2025},
  howpublished = {\url{https://github.com/THUDM/slime}},
  note         = {GitHub repository. Corresponding author: Xin Lv},
  urldate      = {2025-06-19}
}
