Add training doc signposting to TRL (#14439)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This commit is contained in: parent b8b0ccbd2d, commit be0b399d74
@@ -100,6 +100,14 @@ features/compatibility_matrix

% Details about running vLLM

:::{toctree}
:caption: Training
:maxdepth: 1

training/trl.md
:::

:::{toctree}
:caption: Inference and Serving
:maxdepth: 1
docs/source/training/trl.md (new file, 13 lines)
@@ -0,0 +1,13 @@
# Transformers Reinforcement Learning

Transformers Reinforcement Learning (TRL) is a full-stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.

Online methods such as GRPO or Online DPO require the model to generate completions. vLLM can be used to generate these completions!

See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.

:::{seealso}
For more information on the `use_vllm` flag you can provide to the configs of these online methods, see:

- [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
- [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
:::
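The `use_vllm` flag referenced in the new doc is set on the trainer config of the online method. A minimal sketch, assuming `trl` (and `vllm`) are installed; the model name and `output_dir` value are placeholders, not part of this commit:

```python
# Hedged sketch, not an official TRL example: enable vLLM-backed
# generation when configuring an online method such as GRPO.
from trl import GRPOConfig

config = GRPOConfig(
    output_dir="grpo-output",  # placeholder path
    use_vllm=True,             # generate completions with vLLM during training
)
```

The config is then passed to the corresponding trainer (e.g. `GRPOTrainer`) as usual; the equivalent flag exists on `OnlineDPOConfig` per the links above.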