Posts In #qlora

Post-training a Mixture-of-Experts Language Model with Reinforcement Learning

Jun 1 2026 · 11 min read
#reinforcement-learning #large-language-models #mixture-of-experts #lora #qlora #gsm8k

Introduction

Large Language Models (LLMs) are everywhere and are substantially changing the way we perform many daily tasks. Recently, I noticed a growing number of positions related to the post-training of such models. Interestingly, two aspects …
