
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Mar 18, 2025 · Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical …
[2502.16982] Muon is Scalable for LLM Training
Feb 24, 2025 · Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but the scalability to larger models has not …
This requirement emphasizes the need for researchers developing LLMs to possess significant engineering capabilities in addressing the challenges encountered during LLM development. …
Timely survey papers systematically summarize the progress of LLM-based agents, as seen in works [Xi et al., 2023; Wang et al., 2023b]. Based on the inspiring capabilities of the single LLM-based agent, …
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Feb 14, 2025 · Despite notable advancements in Multimodal Large Language Models (MLLMs), most state-of-the-art models have not undergone thorough alignment with human preferences. This gap …
[2412.04315] Densing Law of LLMs
Dec 5, 2024 · To calculate the capacity density of a given target LLM, we first introduce a set of reference models and develop a scaling law to predict the downstream performance of these …
[2501.01005] FlashInfer: Efficient and Customizable Attention …
Jan 2, 2025 · Transformers, driven by attention mechanisms, form the foundation of large language models (LLMs). As these models scale up, efficient GPU attention kernels become essential for high …