Train AI models with Unsloth and Hugging Face Jobs for FREE
Summary
Hugging Face Jobs, integrated with Unsloth, provides a managed cloud GPU environment for fine-tuning small language models (SLMs). This workflow can be automated using coding agents like Claude Code and Codex through specialized Hugging Face skills, streamlining the process from script generation to model deployment on the Hugging Face Hub.
Key Points
- Unsloth integration enables approximately 2x faster training and a 60% reduction in VRAM usage compared to standard fine-tuning methods.
- The workflow is optimized for small models such as LiquidAI/LFM2.5-1.2B-Instruct, which requires less than 1GB of memory and is suitable for on-device deployment.
- Training is executed via the `hf` CLI using the `hf jobs uv run` command, allowing for specific GPU flavor selection (e.g., `t4-small`, `a10g-small`); see the example invocation after this list.
- Estimated GPU costs range from ~$0.40/hr for `t4-small` (models <1B parameters) to ~$3.00/hr for `a10g-large` (models 7-13B parameters).
- Coding agents can automate the pipeline by installing the `hugging-face-model-trainer` skill via plugin marketplaces (Claude Code) or `$skill-installer` (Codex).
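As a concrete illustration of the CLI invocation above, a minimal job submission might look like the following. The script name is a placeholder, and the `--secrets` flag for forwarding a Hub token is an assumption based on the `hf jobs` documentation rather than something stated in this summary:

```bash
# Submit a training script to Hugging Face Jobs on a chosen GPU flavor.
# "train.py" is a placeholder; HF_TOKEN is forwarded as a secret so the
# job can push the resulting weights to the Hub.
hf jobs uv run \
    --flavor a10g-small \
    --secrets HF_TOKEN \
    train.py
```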
Technical Details
The training architecture relies on Hugging Face Jobs, a managed service that executes training scripts on cloud GPUs. The process utilizes uv to run scripts with inline dependencies, typically including unsloth, trl, datasets, and trackio. When using coding agents, the agent generates a training script, submits the job via the hf CLI, and pushes the resulting weights to a specified Hugging Face Hub repository.
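A minimal sketch of such a script is shown below. The inline metadata block lets uv resolve the dependencies at launch; the dataset, LoRA settings, and destination repository are illustrative assumptions, not values from the source, and the `processing_class`/`report_to="trackio"` arguments assume recent TRL and transformers releases:

```python
# /// script
# dependencies = ["unsloth", "trl", "datasets", "trackio"]
# ///
"""Minimal Unsloth SFT sketch for Hugging Face Jobs (illustrative only)."""
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Load the small model in 4-bit to keep VRAM usage low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LiquidAI/LFM2.5-1.2B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; the rank and target modules are typical
# defaults, not values prescribed by the source.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Placeholder dataset; swap in your own chat-formatted data.
dataset = load_dataset("HuggingFaceH4/no_robots", split="train")

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        report_to="trackio",  # stream loss curves to Trackio
    ),
)
trainer.train()

# Push the fine-tuned weights to a Hub repo (hypothetical name).
model.push_to_hub("your-username/lfm-finetune")
tokenizer.push_to_hub("your-username/lfm-finetune")
```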
For Claude Code users, the integration requires adding the huggingface/skills marketplace and installing the hugging-face-model-trainer@huggingface-skills plugin. For Codex, skills are managed via the .agents/skills/ directory or the $skill-installer utility. The system supports real-time monitoring of training progress, such as loss curves, through Trackio integration.
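For Claude Code, the setup described above maps to two slash commands, assuming the current plugin tooling:

```text
/plugin marketplace add huggingface/skills
/plugin install hugging-face-model-trainer@huggingface-skills
```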
Impact / Why It Matters
This integration significantly reduces the computational and operational overhead of fine-tuning by automating GPU orchestration and optimizing memory usage. It enables developers to efficiently iterate on high-performance, small-scale models designed for edge and mobile deployment.