Research lane // live experiment

Local Model Lab

DriftLoom needed a place to compare local models without turning one internal GPU box into an open buffet for the internet. So this lab publishes bounded benchmark snapshots, model metadata, and operator notes instead of exposing a public prompt cannon.

live, Research, ollama, local-models, benchmarks, history-aware

Why this exists

Compare: see what a model costs before pretending it is a free upgrade
Document: turn one-off tests into reusable receipts
Protect: keep the public site read-only so the GPU does not become community property

v2 stance

The lab now keeps bounded run history instead of overwriting reality every time the runner wakes up. Same guarded public surface, better receipts, less goldfish-memory infrastructure.
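
For the curious, "bounded run history" can be sketched in a few lines of Python. This is a guess at the shape, not the lab's actual runner: `MAX_RUNS`, `append_run`, and the `run_id` / `tokens_per_sec` fields are all hypothetical names, and the real retention limit is not stated anywhere on this page.

```python
import json
from collections import deque

MAX_RUNS = 20  # hypothetical retention limit; the lab's real bound is not stated


def append_run(history: list[dict], run: dict, max_runs: int = MAX_RUNS) -> list[dict]:
    """Return a new history with `run` appended, keeping only the newest `max_runs`."""
    bounded = deque(history, maxlen=max_runs)  # a full deque drops its oldest entry on append
    bounded.append(run)
    return list(bounded)


# Simulate 25 runner wake-ups; only the newest 20 receipts survive.
history: list[dict] = []
for i in range(25):
    history = append_run(history, {"run_id": i, "tokens_per_sec": 30.0 + i})

print(len(history), history[0]["run_id"], history[-1]["run_id"])  # 20 5 24
print(json.dumps(history[-1]))
```

The point of the bound is that the published file cannot grow without limit, while the newest receipts are never overwritten by the next run.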

Latest snapshot

Loading lab snapshot…

If this stalls, either JavaScript is sulking or the lab data is missing its receipts.

waiting on data

Host

Status: loading…

History

Status: loading…

Operator verdict

Loading verdict…

Model set

Loading: the model roster is still being assembled.

Immediate takeaways

Finding signal

Deriving rankings from the latest receipts.

Compact scoreboard

Balanced ranking

Working models, sorted for actual operator usefulness

Speed still matters, but so do footprint and whether the thing keeps tripping over reality.
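
One way to picture a "balanced" ranking is a weighted blend of speed, footprint, and reliability. The sketch below is an illustration only: the `ModelResult` fields, the model names, and the 0.4 / 0.3 / 0.3 weights are assumptions made for the example, not the formula this scoreboard actually uses.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    tokens_per_sec: float  # higher is better
    memory_gb: float       # lower is better; assumed to be > 0
    failures: int          # prompts that errored or timed out
    prompts: int           # prompts attempted


def balanced_score(r: ModelResult, fastest: float, smallest: float) -> float:
    speed = r.tokens_per_sec / fastest          # 1.0 for the fastest working model
    footprint = smallest / r.memory_gb          # 1.0 for the smallest working model
    reliability = 1.0 - r.failures / r.prompts  # 1.0 when nothing failed
    # Hypothetical weights: speed matters most, but not alone.
    return 0.4 * speed + 0.3 * footprint + 0.3 * reliability


def rank(results: list[ModelResult]) -> list[ModelResult]:
    # Only "working" models are ranked: at least one prompt must have succeeded.
    working = [r for r in results if r.failures < r.prompts]
    if not working:
        return []
    fastest = max(r.tokens_per_sec for r in working)
    smallest = min(r.memory_gb for r in working)
    return sorted(working, key=lambda r: balanced_score(r, fastest, smallest), reverse=True)


results = [
    ModelResult("fast-but-flaky", tokens_per_sec=60.0, memory_gb=8.0, failures=3, prompts=6),
    ModelResult("small-steady", tokens_per_sec=40.0, memory_gb=4.0, failures=0, prompts=6),
    ModelResult("broken", tokens_per_sec=0.0, memory_gb=4.0, failures=6, prompts=6),
]

for r in rank(results):
    print(r.name)
```

With these made-up numbers the slower but smaller and more reliable model ranks first, and the model that failed every prompt is left off the board entirely.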

Loading scoreboard…

Reading guide

How to read this without lying to yourself

Loading…

Models under test

Prompt suite

Latest run results

Trend summary

Recent run history

Operator notes

Loading: the notes are still being dragged out of JSON.

Why the history matters

Local model performance is noisy. A single snapshot is a mood. A retained sequence is evidence.
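
To make that concrete, here is a minimal sketch of reading a retained sequence instead of a single number. It compares the latest run against the median of the history and uses the median absolute deviation (MAD) as a rough noise scale. The `summarize` helper, the sample values, and the 3x MAD threshold are all assumptions for illustration, not how the lab derives its trend summary.

```python
from statistics import median


def summarize(samples: list[float]) -> dict:
    """Summarize a sequence of tokens-per-second readings, oldest first."""
    mid = median(samples)
    # Median absolute deviation: a spread measure that one bad run cannot dominate.
    mad = median(abs(s - mid) for s in samples)
    latest = samples[-1]
    return {
        "runs": len(samples),
        "latest": latest,
        "median": mid,
        "mad": mad,
        # Hypothetical rule of thumb: flag the latest run if it sits far outside the usual noise.
        "latest_is_outlier": mad > 0 and abs(latest - mid) > 3 * mad,
    }


steady = summarize([40.0, 40.5, 39.5, 40.2, 40.1])
moody = summarize([41.0, 39.5, 40.2, 40.8, 22.0])

print(steady["median"], steady["latest_is_outlier"])
print(moody["median"], moody["latest_is_outlier"])
```

Taken alone, the last reading in the second series would suggest the model got much slower. Against the retained history it looks like one bad run, which is exactly the distinction a single snapshot cannot make.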

Still deliberately bounded

No public prompts, no open benchmark API, no community GPU free-for-all. Read-only receipts are still the right shape for this box.