AI · 2026-03-09

LeRobot v0.5.0: Scaling Every Dimension

Summary

LeRobot v0.5.0 expands the framework's capabilities from tabletop manipulation to full-body humanoid control with the addition of Unitree G1 support. The release also introduces advanced VLA policies, optimized data pipelines with streaming encoding, and remote environment loading via EnvHub.

Key Points

  • Full integration of the Unitree G1 humanoid, supporting locomotion, manipulation, teleoperation, and Whole-Body Control (WBC).
  • Introduction of new VLA policies: Pi0-FAST (Gemma 300M-based), Wall-X (Qwen2.5-VL), and X-VLA (Florence2-based).
  • Implementation of Real-Time Chunking (RTC) to improve inference responsiveness in flow-matching policies (Pi0, SmolVLA, and Diffusion).
  • Performance optimizations achieving 10x faster image training and 3x faster video encoding via parallel encoding.
  • Introduction of EnvHub for loading and registering Gymnasium environments directly from the Hugging Face Hub using HubEnvConfig.
  • Mandatory upgrade to Python 3.12+ and migration to Hugging Face Transformers v5.
  • Expanded hardware support for CAN bus motor controllers, including RobStride and Damiao.
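The Real-Time Chunking idea above amounts to smoothly handing control from the action chunk currently being executed to a freshly predicted one. The snippet below is an illustrative sketch of that blending step, not LeRobot's implementation: the function name blend_chunks and the linear weight schedule are assumptions for exposition.

```python
import numpy as np

def blend_chunks(in_progress: np.ndarray, new_chunk: np.ndarray,
                 overlap: int) -> np.ndarray:
    """Blend the tail of an in-progress action chunk with the head of a
    newly predicted chunk, linearly ramping trust toward the new chunk.

    in_progress, new_chunk: arrays of shape (chunk_len, action_dim).
    """
    # Weights ramp from 1.0 (fully trust the executing chunk) toward 0.0
    # (fully trust the new prediction) across the overlap region.
    w = np.linspace(1.0, 0.0, overlap, endpoint=False)[:, None]
    blended_head = w * in_progress[-overlap:] + (1.0 - w) * new_chunk[:overlap]
    return np.concatenate([blended_head, new_chunk[overlap:]], axis=0)
```

Because the robot never waits for a full new chunk before acting, inference latency is hidden inside the overlap window rather than stalling the control loop.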

Technical Details

The release introduces significant architectural shifts in model execution and data handling. The Pi0-FAST policy uses Frequency-space Action Sequence Tokenization (FAST) for autoregressive decoding, while the RTC enhancement, enabled via --policy.rtc_config.enabled=true, lets flow-matching models blend new predictions with in-progress actions, reducing perceived latency during deployment. For data collection, the LeRobotDataset.create API now accepts streaming_encoding=True, which uses hardware-accelerated encoders to encode frames in real time, effectively eliminating the wait between recorded episodes.
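The streaming-encoding pattern described above can be illustrated with a minimal producer/consumer sketch: frames are handed to a worker thread as they arrive, so encoding overlaps with recording instead of running between episodes. This is a toy model, not LeRobot's encoder; StreamingEncoder and its methods are hypothetical names.

```python
import queue
import threading

class StreamingEncoder:
    """Toy sketch: encode frames on a worker thread while recording continues."""

    def __init__(self, encode_fn):
        self._encode = encode_fn            # e.g. pushes a frame to a video encoder
        self._frames = queue.Queue()
        self._encoded = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            frame = self._frames.get()
            if frame is None:               # sentinel: recording finished
                break
            self._encoded.append(self._encode(frame))

    def submit(self, frame):
        """Called from the recording loop; returns immediately."""
        self._frames.put(frame)

    def finish(self):
        """Signal end of episode and wait for the backlog to drain."""
        self._frames.put(None)
        self._worker.join()
        return self._encoded
```

With a hardware-accelerated encode_fn, the queue rarely backs up, so by the time an episode ends there is almost nothing left to flush.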

On the hardware and simulation side, the codebase has been unified to support bi-manual configurations for the SO-100/SO-101 series and expanded to include mobile robotics via the Earth Rover integration. The integration of NVIDIA IsaacLab-Arena provides GPU-accelerated simulation for high-throughput reinforcement learning. Additionally, the framework now supports PEFT (Parameter-Efficient Fine-Tuning) methods, such as LoRA, allowing developers to adapt large VLAs to specific tasks via the --policy.peft_config.use_peft=true configuration.
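The LoRA-style adaptation mentioned above keeps the large base weights frozen and learns only a low-rank correction. The sketch below shows the core arithmetic under standard LoRA conventions; it is a conceptual illustration, not LeRobot's PEFT integration, and lora_forward is a hypothetical helper.

```python
import numpy as np

def lora_forward(x: np.ndarray, W: np.ndarray, A: np.ndarray,
                 B: np.ndarray, alpha: float = 16.0) -> np.ndarray:
    """Compute y = x @ (W + (alpha / r) * A @ B).

    W (d_in x d_out) is the frozen base weight; A (d_in x r) and
    B (r x d_out) are the small trainable low-rank factors.
    """
    r = A.shape[1]
    # Factored form avoids materializing the d_in x d_out update matrix.
    return x @ W + (alpha / r) * (x @ A) @ B
```

Because B is typically initialized to zero, training starts exactly at the pretrained model's behavior, and only the r * (d_in + d_out) adapter parameters receive gradients, which is what makes adapting a multi-billion-parameter VLA tractable on modest hardware.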

Impact / Why It Matters

This update enables developers to scale robot learning from simple single-arm setups to complex humanoid and mobile platforms using a modernized, high-performance codebase. The improvements in data throughput and inference responsiveness significantly reduce the computational and temporal overhead required to train and deploy large-scale vision-language-action models.

AI Robotics Machine Learning
