LLM 0.32a0 is a major backwards-compatible refactor
Summary
LLM 0.32a0 is an alpha release of the LLM Python library that introduces a major, backwards-compatible refactor of its core abstraction. The update shifts the library from a simple text-in/text-out model to a more complex system capable of handling multi-turn message sequences and multi-part, multi-modal streaming responses.
Key Points
- Introduces a sequence-based message input model using llm.user() and llm.assistant() builder functions.
- Implements a multi-part streaming interface via stream_events() and astream_events() to handle interleaved text, tool calls, and reasoning tokens.
- Adds a new serialization mechanism using response.to_dict() and Response.from_dict() for custom data persistence.
- Provides a response.reply() method to extend conversations without manual message array construction.
- Updates the CLI to support suppressing reasoning tokens via the -R or --no-reasoning flag.
- Maintains backward compatibility by upgrading single-string prompts to single-item message arrays internally.
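The message-sequence input style and the string-to-array upgrade described above can be sketched as follows. This is a hedged mock: the user() and assistant() helpers here are stand-ins returning plain dicts, not the library's actual builder functions, and the normalize() helper is hypothetical.

```python
def user(content: str) -> dict:
    """Stand-in for llm.user(): wrap text as a user message."""
    return {"role": "user", "content": content}

def assistant(content: str) -> dict:
    """Stand-in for llm.assistant(): wrap text as an assistant message."""
    return {"role": "assistant", "content": content}

# Reconstruct an OpenAI-style chat history as a plain list of messages.
messages = [
    user("What is the capital of France?"),
    assistant("Paris."),
    user("And its population?"),
]

# Backward compatibility: a bare string prompt is upgraded internally
# to a single-item message array, so old call sites keep working.
def normalize(prompt) -> list:
    return [user(prompt)] if isinstance(prompt, str) else list(prompt)

assert normalize("hello") == [{"role": "user", "content": "hello"}]
assert normalize(messages) == messages
```

Passing a list rather than a single string is what lets callers rebuild arbitrary histories without the library's own conversation storage.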
Technical Details
The refactor replaces the previous prompt-response abstraction with an architecture designed for modern frontier models. For inputs, the library now supports a list of messages, allowing developers to reconstruct complex conversational histories (e.g., OpenAI-style chat completions) without relying on the library's internal SQLite-based conversation management. For outputs, the streaming API has transitioned from a simple text chunk iterator to an event-based system. This allows the consumption of diverse event types, such as text, tool_call_name, and tool_call_args, which is essential for models that return interleaved reasoning, tool calls, or multi-modal content.
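The shift from a text-chunk iterator to an event-based stream can be illustrated with a mock generator. This is a sketch only: the Event dataclass and mock_stream_events() are illustrative stand-ins, not the library's real event objects or stream_events() output.

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class Event:
    """Illustrative event: real events are richer library objects."""
    type: str   # e.g. "text", "tool_call_name", "tool_call_args"
    data: str

def mock_stream_events() -> Iterator[Event]:
    """Stand-in for response.stream_events(): interleaved event types."""
    yield Event("text", "Let me check the weather. ")
    yield Event("tool_call_name", "get_weather")
    yield Event("tool_call_args", '{"city": "Paris"}')
    yield Event("text", "Done.")

# A consumer dispatches on event type instead of assuming plain text.
text_parts, tool_calls = [], []
for event in mock_stream_events():
    if event.type == "text":
        text_parts.append(event.data)
    elif event.type == "tool_call_name":
        tool_calls.append({"name": event.data, "args": None})
    elif event.type == "tool_call_args":
        tool_calls[-1]["args"] = event.data

print("".join(text_parts))  # → Let me check the weather. Done.
```

The key design point is that text, tool-call names, and tool-call arguments arrive as typed events in a single stream, so interleaved output no longer has to be disentangled from one undifferentiated text iterator.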
The release also introduces a structured way to persist responses. Using a TypedDict defined in llm/serialization.py, developers can call response.to_dict() to produce a JSON-serializable dictionary and later rehydrate it with Response.from_dict(). This enables alternative storage backends beyond the default SQLite implementation. In the CLI, reasoning tokens are now directed to stderr so they do not interfere with piped output, and they can be suppressed entirely with the -R or --no-reasoning flag.
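The persistence mechanism amounts to a round-trip through a JSON-serializable dict. The sketch below uses hypothetical field names and standalone functions in place of the library's actual TypedDict and methods, to show the pattern of storing a response in an arbitrary backend.

```python
import json
from typing import TypedDict

class ResponseDict(TypedDict):
    """Hypothetical shape; the real TypedDict lives in llm/serialization.py."""
    model: str
    prompt: str
    text: str

def to_dict(model: str, prompt: str, text: str) -> ResponseDict:
    """Mimics response.to_dict(): emit a JSON-serializable record."""
    return {"model": model, "prompt": prompt, "text": text}

def from_dict(data: ResponseDict) -> ResponseDict:
    """Mimics Response.from_dict(): rehydrate a stored record."""
    return {"model": data["model"], "prompt": data["prompt"], "text": data["text"]}

record = to_dict("example-model", "Say hi", "Hi!")
stored = json.dumps(record)            # write to any backend: file, Redis, Postgres...
restored = from_dict(json.loads(stored))
assert restored == record              # lossless round-trip
```

Because the intermediate form is plain JSON, any store that can hold a string can replace the default SQLite log.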
Impact / Why It Matters
This update allows developers to build more sophisticated agents and applications that leverage complex model capabilities like tool use and reasoning. It also provides the flexibility to implement custom, scalable storage solutions for conversational data.