Automatic skill testing, live.
This is the control surface for DriftLoom’s testing engine: coverage, freshness, backlog, regressions, and the latest runtime evidence across the baseline safety lane and the deeper follow-on functionality lane.
Coverage and queue health
Coverage by depth
Why these buckets matter
The goal is not just “some tests happened.” It is depth you can reason about. “Baseline checks only” means the skill cleared the generic safety lane. “Follow-on functionality checks” means it also survived repo-aware checks. “Fixture / example-backed proof” means the repo shipped richer evidence, such as test corpora, eval assets, or bundled examples, that the worker could validate directly.
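As a rough illustration of how the three buckets relate, here is a minimal Python sketch. It is not DriftLoom's implementation: the `SkillEvidence` record, its field names, and the `depth_bucket` function are hypothetical, and the sketch assumes the buckets are nested (a skill only counts as fixture-backed if it also cleared the two lanes before it), which the description above implies but does not state outright.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Depth(Enum):
    BASELINE_ONLY = "Baseline checks only"
    FOLLOW_ON = "Follow-on functionality checks"
    FIXTURE_BACKED = "Fixture / example-backed proof"


@dataclass
class SkillEvidence:
    # Hypothetical fields; the real record shape is an assumption.
    passed_baseline: bool     # cleared the generic safety lane
    passed_follow_on: bool    # survived repo-aware checks
    validated_fixtures: bool  # worker validated shipped corpora, eval assets, or examples


def depth_bucket(evidence: SkillEvidence) -> Optional[Depth]:
    """Return the deepest bucket the evidence supports, or None if baseline did not pass."""
    if not evidence.passed_baseline:
        return None
    if not evidence.passed_follow_on:
        return Depth.BASELINE_ONLY
    # Assumption: fixture-backed proof builds on the follow-on lane.
    if evidence.validated_fixtures:
        return Depth.FIXTURE_BACKED
    return Depth.FOLLOW_ON
```

Under these assumptions, a skill that passed both lanes but shipped nothing the worker could validate lands in “Follow-on functionality checks”, not in the fixture-backed bucket.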
Freshness
Recent regressions
These are the latest failed rows that had at least one earlier pass for the same suite. The confidence chips below distinguish fresh failures from reproduced ones, so the site does not pretend every red row carries the same weight.
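The regression rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the site's actual logic: the `Run` record and `find_regressions` function are hypothetical, and the fresh/reproduced split is assumed to mean “failed once so far” versus “failed on two or more consecutive latest runs”, since the page does not define it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    # Hypothetical row shape: one test run of one suite.
    suite: str
    passed: bool


def find_regressions(history: List[Run]) -> Dict[str, str]:
    """Map each regressed suite to "fresh" or "reproduced".

    `history` must be ordered oldest to newest.
    """
    by_suite: Dict[str, List[bool]] = {}
    for run in history:
        by_suite.setdefault(run.suite, []).append(run.passed)

    regressions: Dict[str, str] = {}
    for suite, results in by_suite.items():
        if results[-1]:
            continue  # latest run passed: nothing to report
        if not any(results[:-1]):
            continue  # never passed before: a failure, but not a regression
        # Count consecutive failures at the end of the history.
        trailing_failures = 0
        for passed in reversed(results):
            if passed:
                break
            trailing_failures += 1
        # Assumption: two or more consecutive failures count as "reproduced".
        regressions[suite] = "reproduced" if trailing_failures >= 2 else "fresh"
    return regressions
```

A suite that has never passed is left out on purpose: with no earlier pass there is nothing to regress from, which matches the definition above.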