- Pressure test: see whether a prompt shape survives collisions, ambiguity, and tone drift
- Compare: watch small local models fail differently instead of pretending they differ only in vibes
- Contain: keep the public surface bounded so the GPU box does not become community property
Research lane // bounded live experiment
Prompt Stress Tester
This is not a public raw prompt tunnel to the Ollama host. It is a narrow test rig: approved scenarios, short capped inputs, a small model set, and outputs that show whether a prompt pattern holds up once reality starts poking it.
Pick a stress lane, then let the models sweat a little
Each scenario is prewired. You only fill in a short field. The server assembles the actual prompt so nobody gets to smuggle in a custom proxy business.
Waiting for you to pick a lane. Miraculously patient.
No run yet
Pick a scenario, feed it some bounded text, and the models can begin disappointing or surprising us in public.
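For the curious, here is a rough sketch of what "the server assembles the actual prompt" could look like. It is an illustration, not the rig's real code: the scenario id, the template text, the model names, the 280-character cap, and the `build_prompt` helper are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical values: the page does not publish the real cap, models, or scenarios.
MAX_INPUT_CHARS = 280
ALLOWED_MODELS = frozenset({"model-a", "model-b"})


@dataclass(frozen=True)
class Scenario:
    # Server-owned prompt text with exactly one {user_text} slot.
    template: str


SCENARIOS = {
    "tone-drift": Scenario(
        template="Rewrite the note below in a neutral tone.\n\nNote: {user_text}"
    ),
}


def build_prompt(scenario_id: str, model: str, user_text: str) -> str:
    """Return the full prompt for an approved scenario, or raise ValueError."""
    if scenario_id not in SCENARIOS:
        raise ValueError("unknown scenario")
    if model not in ALLOWED_MODELS:
        raise ValueError("model not in the approved set")
    text = user_text.strip()
    if not text:
        raise ValueError("input is empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds the cap")
    # The caller only ever fills the slot; the surrounding prompt stays server-side.
    return SCENARIOS[scenario_id].template.format(user_text=text)
```

The point of the shape: the client sends a scenario id, a model name, and a short string. Anything outside the approved sets is refused before a model is ever contacted, which is what keeps this from turning into a raw prompt tunnel.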
Recent bounded runs, not a bloated archive
This keeps a short rolling receipt list so you can compare fresh failures without pretending this is a permanent benchmark museum.
No receipts yet
Run a test and the recent history rack will stop looking so lonely.
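A short rolling receipt list can be as small as a fixed-length queue. The sketch below is one way to do it, assuming a cap of five entries and a hypothetical `ReceiptRack` class; the page does not say how many receipts the real rack keeps or how it stores them.

```python
from collections import deque


class ReceiptRack:
    """Keeps only the most recent runs; older ones fall off the end."""

    def __init__(self, max_receipts: int = 5):  # hypothetical cap
        # A deque with maxlen discards from the opposite end when full.
        self._receipts = deque(maxlen=max_receipts)

    def record(self, scenario_id: str, model: str, summary: str) -> None:
        # Newest run goes first, so the oldest is the one that gets dropped.
        self._receipts.appendleft(
            {"scenario": scenario_id, "model": model, "summary": summary}
        )

    def recent(self) -> list:
        # Return a copy, newest first.
        return list(self._receipts)
```

Because the queue is bounded, there is no cleanup job and no archive to maintain: recording a sixth run simply pushes the oldest one out.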