June 23, 2026

MDST Engine 2
Fastest GGUF on the web

Built from scratch in Rust and optimized for each quantization,
Engine 2 runs GGUF models in the browser at full speed.

TL;DR: MDST Engine 2 is a from-scratch Rust inference engine that runs full GGUF models on WebGPU in your browser. Inference stays private and local, on your own GPU. It runs on Chrome, Safari, and Edge.

An engine designed for the browser

MDST Engine 2 runs full-size GGUF models on WebGPU, right inside the browser. It is written from scratch in Rust, with no runtime overhead and nothing between your model and the GPU. The result is desktop-class inference in a tab, today, on Chrome, Safari, and Edge.

It runs what you already use: Gemma, Qwen, Llama, and any single-file GGUF from Hugging Face, across quantizations from Q4 to Q8. Choose a small model for speed or a larger one for depth, and it streams on any laptop made in the last five years.
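To make "single-file GGUF" concrete, here is a minimal sketch of reading the fixed header at the start of a GGUF file (version 2 and later): the four magic bytes `GGUF`, a little-endian `u32` version, then `u64` tensor and metadata counts. The type and function names are illustrative assumptions, not Engine 2's actual loader.

```rust
/// Fixed-size fields at the start of a GGUF file (version 2+ layout).
/// Illustrative type; not taken from Engine 2.
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Reads a little-endian u32 at `offset`. Caller guarantees the bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[offset..offset + 4]);
    u32::from_le_bytes(buf)
}

/// Reads a little-endian u64 at `offset`. Caller guarantees the bounds.
fn read_u64_le(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]);
    u64::from_le_bytes(buf)
}

/// Parses the first 24 bytes of a GGUF file.
/// Returns None if the buffer is too short or the magic is not "GGUF".
pub fn parse_gguf_header(bytes: &[u8]) -> Option<GgufHeader> {
    if bytes.len() < 24 || !bytes.starts_with(b"GGUF") {
        return None;
    }
    Some(GgufHeader {
        version: read_u32_le(bytes, 4),
        tensor_count: read_u64_le(bytes, 8),
        metadata_kv_count: read_u64_le(bytes, 16),
    })
}
```

A check like this lets a page reject a file that is not GGUF before downloading the whole thing, since only the first 24 bytes are needed.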

Every model you load runs and streams on the same engine that already powers local AI inside MDST. What you measure is what ships.

Inference built to stay ahead

Local inference gets faster every month, and Engine 2 is built to ride that curve. Today's speed is only the start.

New open-weight models land in the engine the day they ship, so the latest is always one click away. Every kernel keeps getting tuned. And each browser that improves WebGPU hands you more tokens per second, for free. As GPUs, quantizations, and browsers mature, the engine matures with them, and the lead compounds in your favor.

MDST Engine 2

Top picks

Bonsai 1.7B · Q1_0 · 237 MB

Gemma 3 270M IT · Q4_K_M · 248 MB

Qwen3.5 0.8B · Q5_K_M · 563 MB

Qwen3.5 2B · Q4_K_M · 1.3 GB

[Interactive benchmark: reports tok/s, prefill tok/s, time to token, and total time for the selected model.]
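The benchmark reports four numbers: tok/s, prefill tok/s, time to token, and total time. As a rough sketch of how they relate, the code below derives the two throughput figures from the two timings. It assumes that "time to token" means time to the first generated token, that prefill ends when that token appears, and that every later token counts as decode. The struct and field names are hypothetical, and this is not necessarily how MDST measures.

```rust
use std::time::Duration;

/// Raw measurements from one generation run (hypothetical field names).
pub struct RunTimings {
    pub prompt_tokens: u32,
    pub generated_tokens: u32,
    /// Time from request start until the first generated token appears.
    pub time_to_first_token: Duration,
    /// Time from request start until the last generated token appears.
    pub total_time: Duration,
}

/// Prefill and decode throughput, reported separately.
pub struct Throughput {
    pub prefill_tok_per_s: f64,
    pub decode_tok_per_s: f64,
}

/// Splits a run into a prefill phase and a decode phase.
/// Returns None when either phase has no measurable duration or when
/// fewer than two tokens were generated (no decode interval exists).
pub fn throughput(t: &RunTimings) -> Option<Throughput> {
    let prefill_s = t.time_to_first_token.as_secs_f64();
    let decode_s = t
        .total_time
        .checked_sub(t.time_to_first_token)?
        .as_secs_f64();
    if prefill_s <= 0.0 || decode_s <= 0.0 || t.generated_tokens < 2 {
        return None;
    }
    Some(Throughput {
        prefill_tok_per_s: f64::from(t.prompt_tokens) / prefill_s,
        // The first token is attributed to prefill, so decode covers the rest.
        decode_tok_per_s: f64::from(t.generated_tokens - 1) / decode_s,
    })
}
```

For example, a 512-token prompt with a first token at 250 ms gives 2048 prefill tok/s, and 101 generated tokens finishing at 2250 ms gives 50 decode tok/s. Keeping the two apart matters because prefill processes the prompt in parallel while decode produces one token at a time.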

Run Engine 2 on your own site

Engine 2 already powers every local model inside MDST, the agentic, collaborative IDE. MDST Platform brings the same engine to your own site, so your users run full GGUF models on their own GPU, fully private, with nothing sent to a server.

It scales the way the web does. Every model, quantization, and browser gets its own optimized bundle, generated ahead of time and served from a CDN at around 100 KB each. No servers to provision, no inference bill to grow, ready for one user or a million.
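To picture what "one bundle per model, quantization, and browser" implies, here is a small sketch that enumerates the combinations ahead of time. Everything in it is an assumption made for illustration: the `BundleKey` type, the path layout, and the `.wasm` extension are hypothetical and do not describe MDST Platform's real build or URLs.

```rust
/// One precompiled bundle per (model, quantization, browser) combination.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct BundleKey {
    pub model: String,
    pub quant: String,
    pub browser: String,
}

impl BundleKey {
    /// Hypothetical CDN path layout; the real layout is not documented here.
    pub fn cdn_path(&self) -> String {
        format!("/bundles/{}/{}/{}.wasm", self.model, self.quant, self.browser)
    }
}

/// Builds the full list of bundles to generate ahead of time:
/// every (model, quantization) variant crossed with every browser.
pub fn enumerate_bundles(variants: &[(&str, &str)], browsers: &[&str]) -> Vec<BundleKey> {
    let mut keys = Vec::with_capacity(variants.len() * browsers.len());
    for (model, quant) in variants {
        for browser in browsers {
            keys.push(BundleKey {
                model: model.to_string(),
                quant: quant.to_string(),
                browser: browser.to_string(),
            });
        }
    }
    keys
}
```

With the four top picks and three browsers, this yields 12 bundles. At around 100 KB each, that is roughly 1.2 MB of static files, which is why the approach needs a CDN rather than inference servers.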

¹ Performance and size claims measured by MDST in July 2026 against the fastest WebGPU GGUF runtimes available at the time, on Qwen3.5 2B. Prefill and decode throughput were measured separately. The 20x figure compares the per-model Engine 2 bundle against general-purpose browser runtimes. Results vary by device, browser, GPU, and model.
