How to build scalable web apps with OpenAI's Privacy Filter
Summary
This entry details the implementation of scalable web applications using OpenAI's Privacy Filter and gradio.Server. It demonstrates how to integrate a 1.5B-parameter PII detection model into custom HTML/JS frontends while leveraging Gradio's queueing, ZeroGPU allocation, and dual-SDK compatibility.
Key Points
- The Privacy Filter is a 1.5B-parameter model with 50M active parameters, released under the Apache 2.0 license.
- The model features a 128,000-token context window and achieves state-of-the-art performance on the PII-Masking-300k benchmark.
- Supported PII categories include private_person, private_address, private_email, private_phone, private_url, private_date, account_number, and secret.
- Using the @server.api decorator for endpoints enables request serialization, proper @spaces.GPU composition on ZeroGPU, and simultaneous accessibility via both the Gradio JavaScript client and the gradio_client Python SDK.
- The architecture allows for a hybrid routing approach: @server.api handles heavy, queued model computations, while standard FastAPI routes (@server.get and @server.post) manage static HTML, file lookups, and lightweight data retrieval.
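To make the category list above concrete, here is a minimal sketch of how detected spans might be replaced with category placeholders. The span format (start, end, label) over character offsets, the bracketed placeholder style, and the mask_text helper are assumptions for illustration, not the model's documented output format.

```python
from typing import List, Tuple

# Category names as listed in the key points above.
CATEGORIES = {
    "private_person", "private_address", "private_email", "private_phone",
    "private_url", "private_date", "account_number", "secret",
}

# Hypothetical span shape: (start_char, end_char_exclusive, label).
Span = Tuple[int, int, str]


def mask_text(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a bracketed category placeholder."""
    out: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        if label not in CATEGORIES or start < cursor:
            continue  # skip unknown labels and spans overlapping an earlier one
        out.append(text[cursor:start])  # keep the text before the span
        out.append(f"[{label}]")        # substitute the placeholder
        cursor = end
    out.append(text[cursor:])           # keep the remaining tail
    return "".join(out)
```

A caller would pass the original text plus whatever spans the detection step produced; spans with labels outside the eight categories are left untouched.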
Technical Details
The implementation strategy centers on using gradio.Server as a backend to bridge custom-authored frontends with Gradio's infrastructure. For document processing, the model runs a single 128k-context forward pass with BIOES decoding to maintain precise span boundaries across long, ambiguous text runs, which removes the need for text chunking and stitching. In image-based workflows, the backend integrates Tesseract OCR to generate character-to-box mappings, which are then transformed into pixel-based rectangles for client-side rendering on a <canvas> element.
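The two transformations described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the entry's actual implementation: decode_bioes assumes tags arrive as strings such as B-private_person or O, one per token, and spans_to_rects assumes one OCR box per character, already expressed as (left, top, right, bottom) in top-left pixel coordinates. Mapping token spans to character offsets (for example via tokenizer offsets) is left out.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]      # (start, end_exclusive, label) over token indices
Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def decode_bioes(tags: List[str]) -> List[Span]:
    """Turn per-token BIOES tags into (start, end, label) spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "S":                      # single-token entity
            spans.append((i, i + 1, name))
            start, label = None, None
        elif prefix == "B":                    # open a new entity
            start, label = i, name
        elif prefix == "I" and label == name:  # continue the open entity
            continue
        elif prefix == "E" and label == name:  # close the open entity
            spans.append((start, i + 1, name))
            start, label = None, None
        else:                                  # "O" or an inconsistent tag
            start, label = None, None
    return spans


def spans_to_rects(char_boxes: List[Box], start: int, end: int) -> List[Box]:
    """Merge per-character boxes in [start, end) into one rectangle per line."""
    rects: List[Box] = []
    for left, top, right, bottom in char_boxes[start:end]:
        if rects and top < rects[-1][3] and bottom > rects[-1][1]:
            # Vertically overlaps the previous rectangle: same line, extend it.
            l, t, r, b = rects[-1]
            rects[-1] = (min(l, left), min(t, top), max(r, right), max(b, bottom))
        else:
            rects.append((left, top, right, bottom))
    return rects
```

The explicit B and E tags are what let the decoder keep span boundaries exact: an entity is emitted only when a matching end tag closes it, and a span that wraps onto a second line yields a second rectangle for the canvas to draw.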
The backend logic separates compute-intensive tasks from static delivery. For example, in a "SmartRedact" implementation, the @server.api endpoint handles the model-driven redaction and ID generation, while standard FastAPI GET routes serve the resulting public and token-gated views. Bespoke URL structures and client-side logic (such as CSS-based filtering or canvas-based image editing) can therefore coexist in a single process, which reduces the application code needed to run the service.
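The ID generation and the public versus token-gated lookups can be sketched without any web framework. Everything named here (the in-memory _DOCS store, save_redaction, public_view, gated_view) is hypothetical, and the sketch assumes the gated view reveals the unredacted text; in the pattern described above, the first function would sit behind the queued endpoint and the two lookups behind plain GET routes.

```python
import hmac
import secrets
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store; a real service would likely persist this.
_DOCS: Dict[str, Dict[str, str]] = {}


def save_redaction(original: str, redacted: str) -> Tuple[str, str]:
    """Store both versions and return (doc_id, access_token)."""
    doc_id = secrets.token_urlsafe(8)   # short, URL-safe public identifier
    token = secrets.token_urlsafe(16)   # secret needed for the gated view
    _DOCS[doc_id] = {"original": original, "redacted": redacted, "token": token}
    return doc_id, token


def public_view(doc_id: str) -> Optional[str]:
    """Return the redacted text, or None if the ID is unknown."""
    doc = _DOCS.get(doc_id)
    return doc["redacted"] if doc else None


def gated_view(doc_id: str, token: str) -> Optional[str]:
    """Return the original text only when the token matches."""
    doc = _DOCS.get(doc_id)
    # Constant-time comparison avoids leaking token contents through timing.
    if doc and hmac.compare_digest(doc["token"].encode(), token.encode()):
        return doc["original"]
    return None
```

Keeping the lookups as cheap dictionary reads is what makes them suitable for unqueued routes, while the expensive redaction step stays behind the queue.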
Impact / Why It Matters
Developers can build highly customized, high-performance user interfaces that retain the scalability and ease of deployment provided by Gradio's backend. This pattern enables the creation of complex, production-ready PII redaction tools that are accessible via both web browsers and automated Python workflows.