June 23, 2026

MDST Engine 2
Fastest GGUF on the web

Built from scratch in Rust and optimized for each quantization, Engine 2 runs GGUF models in the browser at full speed.
TLDR: MDST Engine 2 runs full GGUF models in your browser at full speed. It is a from-scratch Rust inference engine on WebGPU, private and local, on your own GPU. Runs on Chrome, Safari, and Edge.

An engine designed for the browser

MDST Engine 2 runs full-size GGUF models on WebGPU, right inside the browser. It is written from scratch in Rust, with no runtime overhead and nothing between your model and the GPU. The result is desktop-class inference in a tab, today, on Chrome, Safari, and Edge.
It runs what you already use: Gemma, Qwen, Llama, and any single-file GGUF from Hugging Face, across quantizations from Q4 to Q8. Choose a small model for speed or a larger one for depth, and it streams on laptops up to five years old.
Every model you load runs and streams on the same engine that already powers local AI inside MDST. What you measure is what ships.
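For a rough sense of what "Q4 to Q8" means for download size and GPU memory, here is a back-of-the-envelope sketch. It assumes the common GGUF block layouts Q4_0 (18 bytes per 32 weights) and Q8_0 (34 bytes per 32 weights), and it ignores metadata, mixed-precision tensors, and the KV cache, so real files will differ. The type and function names are illustrative and are not part of Engine 2.

```rust
/// Two common GGUF quantization layouts (assumed here for illustration;
/// real model files often mix several tensor types).
#[allow(non_camel_case_types)]
#[derive(Clone, Copy)]
enum Quant {
    /// 32 weights per block: 2-byte f16 scale + 16 bytes of packed 4-bit values.
    Q4_0,
    /// 32 weights per block: 2-byte f16 scale + 32 bytes of 8-bit values.
    Q8_0,
}

impl Quant {
    fn bytes_per_block(self) -> u64 {
        match self {
            Quant::Q4_0 => 18,
            Quant::Q8_0 => 34,
        }
    }
}

const WEIGHTS_PER_BLOCK: u64 = 32;

/// Approximate size in bytes of the quantized weights alone.
fn estimate_weight_bytes(param_count: u64, quant: Quant) -> u64 {
    // Round up so a partially filled final block still counts as a whole block.
    let blocks = (param_count + WEIGHTS_PER_BLOCK - 1) / WEIGHTS_PER_BLOCK;
    blocks * quant.bytes_per_block()
}

fn main() {
    // A hypothetical 2B-parameter model; the count is an assumption, not a measurement.
    let params: u64 = 2_000_000_000;
    for (name, quant) in [("Q4_0", Quant::Q4_0), ("Q8_0", Quant::Q8_0)] {
        let gib = estimate_weight_bytes(params, quant) as f64 / (1024.0 * 1024.0 * 1024.0);
        println!("{name}: ~{gib:.2} GiB of weights");
    }
}
```

Under these assumptions, Q4_0 works out to 4.5 bits per weight and Q8_0 to 8.5 bits per weight. That is why a Q4 file is roughly half the size of the Q8 file for the same model.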

Sustained tokens per second as models get smaller, measured on the same hardware

Inference built to stay ahead

Local inference gets faster every month, and Engine 2 is built to ride that curve. Today's speed is only the start.
New open-weight models land in the engine the day they ship, so the latest is always one click away. Every kernel keeps getting tuned. And each browser that improves WebGPU hands you more tokens per second, for free. As GPUs, quantizations, and browsers mature, the engine matures with them, and the lead compounds in your favor.
Interactive benchmark: MDST Engine 2. Top picks: Bonsai 1.7B, Gemma 3 270M IT, Qwen3.5 0.8B, Qwen3.5 2B. Reported metrics: tok/s, prefill tok/s, time to token, and total time.
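The benchmark reports tok/s, prefill tok/s, time to token, and total time. One common way to derive such figures from raw timings is sketched below. It reflects typical practice rather than how MDST's benchmark is implemented, and the struct, field, and method names are all hypothetical.

```rust
use std::time::Duration;

/// Raw timings for one generation run (hypothetical struct for illustration).
struct RunTimings {
    prompt_tokens: u32,
    generated_tokens: u32,
    /// Time from request start until the first generated token appears.
    time_to_first_token: Duration,
    /// Time from request start until the last generated token appears.
    total_time: Duration,
}

impl RunTimings {
    /// Prefill throughput: prompt tokens processed per second before the first output token.
    fn prefill_tok_per_s(&self) -> f64 {
        let secs = self.time_to_first_token.as_secs_f64();
        if secs == 0.0 { 0.0 } else { self.prompt_tokens as f64 / secs }
    }

    /// Decode throughput: tokens generated per second after the first one.
    fn decode_tok_per_s(&self) -> f64 {
        // saturating_sub avoids a panic if the timings are inconsistent.
        let secs = self
            .total_time
            .saturating_sub(self.time_to_first_token)
            .as_secs_f64();
        let tokens = self.generated_tokens.saturating_sub(1) as f64;
        if secs == 0.0 { 0.0 } else { tokens / secs }
    }
}

fn main() {
    // Made-up numbers, chosen only to keep the arithmetic easy to follow.
    let run = RunTimings {
        prompt_tokens: 512,
        generated_tokens: 129,
        time_to_first_token: Duration::from_millis(250),
        total_time: Duration::from_millis(2250),
    };
    println!("prefill: {:.1} tok/s", run.prefill_tok_per_s());
    println!("decode:  {:.1} tok/s", run.decode_tok_per_s());
}
```

Keeping the two rates separate matters because prefill processes the whole prompt in parallel while decode produces one token at a time. A single blended number would hide which phase dominates on a given device.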

Run Engine 2 on your own site

Engine 2 already powers every local model inside MDST, the agentic, collaborative IDE. MDST Platform brings the same engine to your own site, so your users run full GGUF models on their own GPU, fully private, with nothing sent to a server.
It scales the way the web does. Every model, quantization, and browser gets its own optimized bundle, generated ahead of time and served from a CDN at around 100 KB each. There are no servers to provision and no inference bill to grow, whether you have one user or a million.

1 Performance and size claims measured by MDST in July 2026 against the fastest WebGPU GGUF runtimes available at the time, on Qwen3.5 2B. Prefill and decode throughput were measured separately. The 20x figure compares the per-model Engine 2 bundle against general-purpose browser runtimes. Results vary by device, browser, GPU, and model.