Safetensors is Joining the PyTorch Foundation
Summary
Safetensors, originally a Hugging Face project, is moving to the PyTorch Foundation under the Linux Foundation to establish vendor-neutral, community-driven governance. The transfer places the trademark and repository management under a neutral body while preserving the project's core mission: a secure alternative to pickle-based model serialization.
Key Points
- The project's trademark, repository, and governance are moving to the Linux Foundation.
- The file format consists of a JSON metadata header (limited to 100 MB) followed by the raw tensor data.
- No breaking changes are being introduced to the file format, existing APIs, or Hugging Face Hub integration.
- Future development includes device-aware loading and saving to enable direct loading onto CUDA and ROCm accelerators, bypassing CPU staging.
- The roadmap includes first-class APIs for Tensor Parallel and Pipeline Parallel loading, so that specific ranks or pipeline stages load only the weights they need (a sketch of today's slice-based equivalent follows this list).
- Planned support is being developed for FP8, sub-byte integer types, and block-quantized formats such as GPTQ and AWQ.
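As an illustration of the partial loading the roadmap targets, the existing `get_slice` API already lets a rank read only its shard of a tensor from disk. The file name, tensor name, world size, and column-sharding scheme below are illustrative assumptions, not the planned first-class TP/PP API:

```python
# Illustrative sketch: column-sharding one weight across tensor-parallel
# ranks using the existing get_slice API. Names and world size are assumed;
# the planned first-class TP/PP APIs may look different.
from safetensors import safe_open

rank, world_size = 0, 4  # in a real job these come from torch.distributed

with safe_open("model.safetensors", framework="pt") as f:
    weight_slice = f.get_slice("mlp.up_proj.weight")
    _, cols = weight_slice.get_shape()
    shard = cols // world_size
    # Each rank reads only its column shard; the rest of the file is untouched.
    local_weight = weight_slice[:, rank * shard : (rank + 1) * shard]
```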
Technical Details
Safetensors is designed to mitigate the security risks of the Python pickle format, specifically the execution of arbitrary code during deserialization. The architecture is deliberately simple: a JSON-encoded header containing tensor metadata, followed by the raw tensor data. This design enables zero-copy loading by memory-mapping tensors directly from disk, and it supports lazy loading, so individual weights can be read without deserializing the entire checkpoint.
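A minimal sketch of this lazy, selective loading with the current safetensors Python API; the file name and tensor names are placeholders:

```python
# Write and lazily read a small checkpoint with the safetensors Python API.
import torch
from safetensors.torch import save_file
from safetensors import safe_open

# Saving produces a JSON header (dtype, shape, byte offsets per tensor)
# followed by the raw tensor bytes.
weights = {
    "embed.weight": torch.randn(1024, 512),
    "lm_head.weight": torch.randn(512, 1024),
}
save_file(weights, "model.safetensors")

# Lazy loading: the file is memory-mapped, so only the requested tensor's
# bytes are materialized; the rest of the checkpoint is never deserialized.
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    print(f.keys())                       # tensor names from the JSON header
    embed = f.get_tensor("embed.weight")  # reads just this tensor
```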
The project is working toward integration into PyTorch core as a standard serialization path for torch models. Planned optimizations target throughput and memory efficiency through device-aware loading, which places tensors directly onto accelerators such as CUDA and ROCm devices without staging through host memory. Support for the evolving quantization landscape is also being formalized, covering FP8, sub-byte integer types, and block-quantized formats such as GPTQ and AWQ.
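A hedged sketch of device-targeted loading as it exists today: `safe_open` already accepts a `device` argument, so tensors can be requested directly on an accelerator. The roadmap's device-aware work concerns how the bytes reach the device (avoiding the CPU staging copy), not a new user-facing call; the file name here is a placeholder.

```python
# Request tensors directly on an accelerator with the existing device argument.
from safetensors import safe_open

device = "cuda:0"  # ROCm builds of PyTorch expose accelerators as "cuda:N" too

with safe_open("model.safetensors", framework="pt", device=device) as f:
    for name in f.keys():
        tensor = f.get_tensor(name)  # the returned tensor lives on `device`
```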
Impact / Why It Matters
For developers and self-hosters, the transition provides a stable, long-term, and vendor-neutral foundation for model distribution without any disruption to existing workflows or model compatibility. The move to the PyTorch Foundation ensures that the project's evolution is driven by the broader machine learning community rather than a single entity.