Inference Engines
Two ways to run AI models locally
A universal model server for your OS, and a bare-metal boot environment that turns any machine into an AI endpoint. Both built on Foundry, both zero-dependency.
Pantheon
Universal inference engine
A single binary that serves language models, audio models, and vision models. Runs on your existing OS. Built entirely in C with no external dependencies.
- Text, audio, and vision model support
- Multiple quantization formats
- GPU acceleration via Barswap
- OpenAI-compatible API
- Runs on Linux, macOS, Windows
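Because the API is OpenAI-compatible, any generic client should work. Here is a minimal sketch of what a chat-completion request might look like; the endpoint URL, port, and model name are assumptions for illustration, not documented Pantheon defaults.

```python
import json
from urllib import request

# Hypothetical endpoint — Pantheon's actual host, port, and model
# identifiers are assumptions here, not documented values.
ENDPOINT = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "llama-3-8b",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Hello from a local client!"}
    ],
}

body = json.dumps(payload).encode("utf-8")
req = request.Request(
    ENDPOINT,
    data=body,
    headers={"Content-Type": "application/json"},
)

# Sending the request requires a running server; this sketch only
# shows the request shape expected by an OpenAI-compatible API.
print(body.decode("utf-8"))
```

Swapping `ENDPOINT` for a hosted provider's URL (plus an auth header) would leave the rest of the client unchanged, which is the point of API compatibility.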
BootAI
Boot directly into AI
A UEFI application that boots any x86_64 machine directly into an AI assistant. No OS installation. Plug in a USB drive, power on, start chatting.
- Zero setup — boots from USB
- Tool use: file I/O, web search, system info
- Network accessible via mDNS
- Manage from any device on the LAN
- Minimal attack surface — no OS, no kernel
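mDNS discovery works by multicasting a DNS query to the local network and listening for replies. As a rough sketch of the mechanism, the following builds a one-question PTR query packet by hand; the `_http._tcp.local` service type is an assumption, since the exact name BootAI advertises is not specified here.

```python
import struct

def mdns_ptr_query(service: str) -> bytes:
    """Build a minimal one-question mDNS PTR query packet.

    The service name (e.g. "_http._tcp.local") is an assumption —
    the actual service type BootAI advertises is not documented here.
    """
    # Header: id=0, flags=0 (standard query), 1 question,
    # 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)
    # Question name: length-prefixed DNS labels, zero-terminated.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in service.split(".")
    ) + b"\x00"
    # QTYPE 12 = PTR, QCLASS 1 = IN.
    question = qname + struct.pack("!HH", 12, 1)
    return header + question

packet = mdns_ptr_query("_http._tcp.local")
# To discover services for real, send this over UDP to the mDNS
# multicast group 224.0.0.251 on port 5353 and parse the replies.
print(len(packet), packet[:12].hex())
```

In practice a client would use the OS resolver or a zeroconf library rather than raw packets; the sketch just shows why no central server or configuration is needed on the LAN.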
At a Glance
Same foundation, different deployment targets.
| | Pantheon | BootAI |
|---|---|---|
| Runs on | Any OS | Bare metal (UEFI) |
| Setup | Download and run | Flash USB and boot |
| Modalities | Text, audio, vision | Text + tool use |
| GPU support | CUDA via Barswap | CPU only |
| Multi-model | Yes | Single model |
| API | OpenAI-compatible | Web UI + mDNS |
| Dependencies | None | None |
The Vision
Most inference solutions require you to install Python, set up CUDA, manage virtual environments, and configure model paths. We think running a language model should be as simple as running any other program.
Pantheon gives you a single binary that serves models on your workstation. BootAI goes further — it makes any spare machine an AI endpoint without touching its hard drive. Together, they cover the full spectrum from development workstation to dedicated inference node.
Built on Foundry
Both engines are powered by Foundry, our pure C ML framework. No Python runtime, no package manager, no cloud account required.