Inference Engines
Two ways to run AI models locally
A universal model server for your OS, and a bare-metal boot environment that turns any machine into an AI endpoint. Both built on Foundry, both zero-dependency.
Pantheon
Universal inference engine
A single binary that serves language models, audio models, and vision models. Runs on your existing OS. Built entirely in C with no external dependencies.
- Text, audio, and vision model support
- Multiple quantization formats
- GPU acceleration via Barswap
- OpenAI-compatible API
- Runs on Linux, macOS, Windows
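Because the API is OpenAI-compatible, any generic client should work. Here is a minimal sketch of what a chat-completion request might look like; the endpoint URL, port, and model name are assumptions for illustration, not documented Pantheon defaults.

```python
import json
from urllib import request

# Hypothetical endpoint — Pantheon's actual host, port, and model
# identifiers are assumptions here, not documented values.
ENDPOINT = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "llama-3-8b",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Hello from a local client!"}
    ],
}

body = json.dumps(payload).encode("utf-8")
req = request.Request(
    ENDPOINT,
    data=body,
    headers={"Content-Type": "application/json"},
)

# Sending the request requires a running server; this sketch only
# shows the request shape expected by an OpenAI-compatible API.
print(body.decode("utf-8"))
```

Swapping `ENDPOINT` for a hosted provider's URL (plus an auth header) would leave the rest of the client unchanged, which is the point of API compatibility.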
BootAI
Boot directly into AI
A UEFI application that boots any x86_64 machine directly into an AI assistant. No OS installation. Plug in a USB drive, power on, start chatting.
- Zero setup — boots from USB
- Tool use: file I/O, web search, system info
- Network accessible via mDNS
- Manage from any device on the LAN
- Minimal attack surface — no OS, no kernel
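mDNS discovery works by multicasting a DNS query to the local network and listening for replies. As a rough sketch of the mechanism, the following builds a one-question PTR query packet by hand; the `_http._tcp.local` service type is an assumption, since the exact name BootAI advertises is not specified here.

```python
import struct

def mdns_ptr_query(service: str) -> bytes:
    """Build a minimal one-question mDNS PTR query packet.

    The service name (e.g. "_http._tcp.local") is an assumption —
    the actual service type BootAI advertises is not documented here.
    """
    # Header: id=0, flags=0 (standard query), 1 question,
    # 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)
    # Question name: length-prefixed DNS labels, zero-terminated.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in service.split(".")
    ) + b"\x00"
    # QTYPE 12 = PTR, QCLASS 1 = IN.
    question = qname + struct.pack("!HH", 12, 1)
    return header + question

packet = mdns_ptr_query("_http._tcp.local")
# To discover services for real, send this over UDP to the mDNS
# multicast group 224.0.0.251 on port 5353 and parse the replies.
print(len(packet), packet[:12].hex())
```

In practice a client would use the OS resolver or a zeroconf library rather than raw packets; the sketch just shows why no central server or configuration is needed on the LAN.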
At a Glance
Same foundation, different deployment targets.
| | Pantheon | BootAI |
|---|---|---|
| Runs on | Any OS | Bare metal (UEFI) |
| Setup | Download and run | Flash USB and boot |
| Modalities | Text, audio, vision | Text + tool use |
| GPU support | CUDA via Barswap | CPU only |
| Multi-model | Yes | Single model |
| API | OpenAI-compatible | Web UI + mDNS |
| Dependencies | None | None |
The Vision
Most inference solutions require you to install Python, set up CUDA, manage virtual environments, and configure model paths. We think running a language model should be as simple as running any other program.
Pantheon gives you a single binary that serves models on your workstation. BootAI goes further — it makes any spare machine an AI endpoint without touching its hard drive. Together, they cover the full spectrum from development workstation to dedicated inference node.
Built on Foundry
Both engines are powered by Foundry, our pure C ML framework. No Python runtime, no package manager, no cloud account required.