Local model evaluation

Automated Ollama testing with readable receipts.

DriftLoom tests local Ollama models against real prompts, shows you exactly what each model produced, and tells you which ones are actually usable. No benchmark scores — just the raw output and a verdict.

One model at a time. Fixed prompts. Public results.

Each run discovers the Ollama models installed on the host, runs the same prompt suite across all of them, and publishes the responses with quality verdicts. You see the actual output, not a summary score.
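A minimal sketch of one such run, assuming Ollama's local HTTP API on its default port: `GET /api/tags` lists installed models, and `POST /api/generate` with `"stream": false` returns one complete response. The helper names (`list_models`, `fetch_tags`, `generate`, `run_suite`) are hypothetical, and DriftLoom's own runner may work differently.

```python
import json
import urllib.request
from typing import Callable

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def list_models(tags_payload: dict) -> list[str]:
    """Extract installed model names from a /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return json.load(resp)


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Request one complete (non-streamed) response from a model."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


def run_suite(
    models: list[str],
    prompts: list[str],
    gen: Callable[[str, str], str] = generate,
) -> list[dict]:
    """Run the same prompt suite against every model, one model at a time."""
    results = []
    for model in models:  # outer loop: finish one model before moving on
        for prompt in prompts:
            results.append({"model": model, "prompt": prompt, "output": gen(model, prompt)})
    return results
```

Against a running Ollama server, `run_suite(list_models(fetch_tags()), prompts)` returns one record per model and prompt pair, with the full output kept rather than a score. Passing a stub as `gen` lets the loop be exercised without a server.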

  • Auto-discovered models
  • Verdicted receipts
  • Repeatable prompts
Real outputs, not just scores

Every model response is shown in full, with a verdict that tells you if it passed, drifted, or failed outright.
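One way to represent such a receipt is a record that keeps the full output next to its verdict. The `Receipt` type, the `judge` function, and its rules (an empty response fails, a missing expected keyword counts as drift) are hypothetical stand-ins; the site does not say how its verdicts are actually decided.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASSED = "passed"
    DRIFTED = "drifted"
    FAILED = "failed"


@dataclass
class Receipt:
    model: str
    prompt: str
    output: str  # kept in full, never reduced to a score
    verdict: Verdict


def judge(output: str, expected_keywords: list[str]) -> Verdict:
    """Hypothetical rule: empty output fails; any missing keyword counts as drift."""
    text = output.strip().lower()
    if not text:
        return Verdict.FAILED
    if all(k.lower() in text for k in expected_keywords):
        return Verdict.PASSED
    return Verdict.DRIFTED


def make_receipt(model: str, prompt: str, output: str, expected_keywords: list[str]) -> Receipt:
    """Bundle a raw response with the verdict computed from it."""
    return Receipt(model, prompt, output, judge(output, expected_keywords))
```

Keeping `output` untouched on the record is the point of the design: the verdict is a label attached to the evidence, not a replacement for it.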


Start with the things that matter

The flagship lab, supporting tools, and experiments — each on its own page with room to breathe.

All experiments


Sludge X-Ray: cut through inflated copy.

Paste any text and Sludge X-Ray flags hype, vagueness, jargon, and urgency manipulation. Useful for cleaning up your own writing before you ship it.
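A toy version of that kind of flagging can be built from phrase lists and word-boundary matching. The four categories mirror the ones named above, but the phrase lists and the `flag_sludge` function are illustrative assumptions, not Sludge X-Ray's actual rules.

```python
import re

# Illustrative phrase lists only; a real tool would need far richer rules.
SLUDGE_PATTERNS = {
    "hype": ["revolutionary", "game-changing", "world-class", "cutting-edge"],
    "vagueness": ["various", "a number of", "some kind of", "in some way"],
    "jargon": ["synergy", "paradigm", "best-of-breed", "holistic"],
    "urgency": ["act now", "limited time", "don't miss out", "before it's too late"],
}


def flag_sludge(text: str) -> dict[str, list[str]]:
    """Map each category to the phrases found in text (case-insensitive)."""
    lowered = text.lower()
    found: dict[str, list[str]] = {}
    for category, phrases in SLUDGE_PATTERNS.items():
        hits = [
            p for p in phrases
            # Word boundaries stop "synergy" from matching inside a longer token.
            if re.search(r"\b" + re.escape(p) + r"\b", lowered)
        ]
        if hits:
            found[category] = hits
    return found
```

For example, `flag_sludge("Act now! Our revolutionary, holistic platform.")` flags one phrase each under hype, jargon, and urgency, while plain technical prose returns an empty dict.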

Bounded experiments and tools

Prompt Stress Tester, Signal Garden, Receipt Engine, and Agent Weather — each does one thing, no bloat.

live host
> models: auto-discovered from Ollama
> prompts: fixed suite, repeatable runs
> public receipts, not benchmark theater.

Projects & Labs

Full inventory

Recent updates

A short log of functional changes and improvements.
