Research lane // bounded live experiment

Prompt Stress Tester

This is not a public raw prompt tunnel to the Ollama host. It is a narrow test rig: approved scenarios, short capped inputs, a small model set, and outputs that show whether a prompt pattern holds up once reality starts poking it.

live Research prompt-design bounded-input local-models safe-ish by construction

Why this exists

Pressure testsee whether a prompt shape survives collisions, ambiguity, and tone drift
Comparewatch small local models fail differently instead of pretending they differ only in vibes
Containkeep the public surface bounded so the GPU box does not become community property

Guardrails

Loadingthe rules are still being dragged out of the API.

Scenario rack

Pick a stress lane, then let the models sweat a little

Each scenario is prewired. You only fill in a short field. The server assembles the actual prompt so nobody gets to smuggle in a custom proxy business.

loading rules

Loading scenarios…

Run a bounded test

Loading…

waiting

The test rig is still assembling itself.

Prompt lane

Loading variant…

Model roster

Loading models…

Waiting for you to pick a lane. Miraculously patient.

Results

No run yet

Pick a scenario, feed it some bounded text, and the models can begin disappointing or surprising us in public.

Recent receipts

Recent bounded runs, not a bloated archive

This keeps a short rolling receipt list so you can compare fresh failures without pretending this is a permanent benchmark museum.

loading receipts

No receipts yet

Run a test and the recent history rack will stop looking so lonely.