Research lane // bounded live experiment

Prompt Stress Tester

This is not a public raw prompt tunnel to the Ollama host. It is a narrow test rig: approved scenarios, short capped inputs, a small model set, and outputs that show whether a prompt pattern holds up once reality starts poking it.

live, Research, prompt-design, bounded-input, local-models, safe-ish by construction
  • Pressure test: see whether a prompt shape survives collisions, ambiguity, and tone drift
  • Compare: watch small local models fail differently instead of pretending they differ only in vibes
  • Contain: keep the public surface bounded so the GPU box does not become community property

Pick a stress lane, then let the models sweat a little

Each scenario is prewired. You only fill in a short field. The server assembles the actual prompt so nobody gets to smuggle in a custom proxy business.
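A minimal sketch of what server-side assembly like this could look like, assuming a fixed scenario table, a model allowlist, and a character cap. Every name here (`SCENARIOS`, `ALLOWED_MODELS`, `MAX_INPUT_CHARS`, `assemble_prompt`, the scenario and model ids, the 280-character limit) is hypothetical and not taken from the actual rig.

```python
from dataclasses import dataclass

# Hypothetical cap on the one field a visitor controls.
MAX_INPUT_CHARS = 280


@dataclass(frozen=True)
class Scenario:
    # Server-owned template with exactly one {user_text} slot.
    template: str


# Hypothetical approved scenarios; the visitor picks an id, never a prompt.
SCENARIOS = {
    "tone-drift": Scenario(
        "Rewrite the text between the markers in a neutral tone.\n"
        "---\n{user_text}\n---"
    ),
    "ambiguity": Scenario(
        "List every reasonable reading of the text between the markers.\n"
        "---\n{user_text}\n---"
    ),
}

# Hypothetical small model set; anything else is refused.
ALLOWED_MODELS = {"small-model-a", "small-model-b"}


def assemble_prompt(scenario_id: str, model: str, user_text: str) -> str:
    """Build the final prompt from an approved scenario and capped input."""
    if scenario_id not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario_id!r}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model not on the roster: {model!r}")

    # Collapse runs of whitespace so padding cannot dodge the cap.
    cleaned = " ".join(user_text.split())
    if not cleaned:
        raise ValueError("input is empty")
    if len(cleaned) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")

    # str.format parses only the template, so braces in the visitor's
    # text are inserted literally and cannot reach other fields.
    return SCENARIOS[scenario_id].template.format(user_text=cleaned)
```

The point of the shape is that the request carries only ids and one short string, so the server rejects anything off the list before a model ever sees it.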



Model roster


Waiting for you to pick a lane. Miraculously patient.

No run yet

Pick a scenario, feed it some bounded text, and the models can begin disappointing or surprising us in public.

Recent bounded runs, not a bloated archive

This keeps a short rolling receipt list so you can compare fresh failures without pretending this is a permanent benchmark museum.
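One way a short rolling receipt list could be kept, sketched with a fixed-length `collections.deque`. The `ReceiptRack` class, its field names, and the limit of 5 are assumptions for illustration, not the rig's actual storage.

```python
from collections import deque

# Hypothetical size of the rolling window.
MAX_RECEIPTS = 5


class ReceiptRack:
    """Keeps only the most recent runs; older ones fall off the end."""

    def __init__(self, limit: int = MAX_RECEIPTS) -> None:
        # A deque with maxlen discards the oldest entry on overflow.
        self._items = deque(maxlen=limit)

    def add(self, scenario_id: str, model: str, verdict: str) -> None:
        self._items.append(
            {"scenario": scenario_id, "model": model, "verdict": verdict}
        )

    def recent(self) -> list:
        # Newest first, which is how a receipt list is usually read.
        return list(reversed(self._items))
```

Because the deque enforces the bound itself, no separate cleanup job is needed to keep the list from turning into an archive.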


No receipts yet

Run a test and the recent history rack will stop looking so lonely.