Inference Engines

Two ways to run AI models locally

A universal model server for your OS, and a bare-metal boot environment that turns any machine into an AI endpoint. Both built on Foundry, both zero-dependency.

Model Server

Pantheon

Universal inference engine

A single binary that serves language models, audio models, and vision models. Runs on your existing OS. Built entirely in C with no external dependencies.

  • Text, audio, and vision model support
  • Multiple quantization formats
  • GPU acceleration via Barswap
  • OpenAI-compatible API
  • Runs on Linux, macOS, Windows
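Because the API is OpenAI-compatible, any plain HTTP client can talk to a running Pantheon instance. A minimal sketch, assuming a default port of 8080 and a placeholder model name (both are assumptions, not documented defaults — substitute whatever your instance reports):

```python
# Hypothetical client sketch for Pantheon's OpenAI-compatible endpoint.
# The port (8080) and model name below are assumptions.
PANTHEON_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt, model="my-local-model"):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# To send it against a running Pantheon instance:
# import json, urllib.request
# req = urllib.request.Request(
#     PANTHEON_URL,
#     data=json.dumps(build_chat_request("Hello")).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because the wire format matches OpenAI's, existing OpenAI client libraries should also work by pointing their base URL at the local server.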

Bare Metal

BootAI

Boot directly into AI

A UEFI application that boots any x86_64 machine directly into an AI assistant. No OS installation. Plug in a USB drive, power on, start chatting.

  • Zero setup — boots from USB
  • Tool use: file I/O, web search, system info
  • Network accessible via mDNS
  • Manage from any device on the LAN
  • Minimal attack surface — no OS, no kernel
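Since BootAI announces itself over mDNS, other devices on the LAN can find it by multicast DNS query rather than a fixed IP. A sketch of the underlying mechanism, assuming a hostname like "bootai.local" (the actual name BootAI announces is an assumption here):

```python
import struct

# Hypothetical sketch of resolving a BootAI node via mDNS (RFC 6762).
# The hostname "bootai.local" is an assumption; check your device's
# actual announcement. mDNS queries go to multicast 224.0.0.251:5353.
MDNS_ADDR = ("224.0.0.251", 5353)

def build_mdns_query(hostname):
    """Build a minimal mDNS A-record query packet."""
    # Header: id=0, flags=0, QDCOUNT=1, no answers/authority/additional.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, null-terminated.
    qname = b"".join(
        bytes([len(label)]) + label.encode() for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1).
    question = qname + struct.pack("!2H", 1, 1)
    return header + question

# To actually query (requires a BootAI node on the LAN):
# import socket
# s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# s.sendto(build_mdns_query("bootai.local"), MDNS_ADDR)
```

In practice you would simply open http://bootai.local (or the equivalent announced name) in a browser; OS resolvers handle the multicast query transparently.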

At a Glance

Same foundation, different deployment targets.

                Pantheon               BootAI
Runs on         Any OS                 Bare metal (UEFI)
Setup           Download and run       Flash USB and boot
Modalities      Text, audio, vision    Text + tool use
GPU support     CUDA via Barswap       CPU only
Multi-model     Yes                    Single model
API             OpenAI-compatible      Web UI + mDNS
Dependencies    None                   None

The Vision

Most inference solutions require you to install Python, set up CUDA, manage virtual environments, and configure model paths. We think running a language model should be as simple as running any other program.

Pantheon gives you a single binary that serves models on your workstation. BootAI goes further — it makes any spare machine an AI endpoint without touching its hard drive. Together, they cover the full spectrum from development workstation to dedicated inference node.

Built on Foundry

Both engines are powered by Foundry, our pure C ML framework. No Python runtime, no package manager, no cloud account required.