Trusted
This skill provides production-grade techniques for evaluating LLM outputs with LLMs acting as judges. It synthesizes research into actionable patterns for building reliable evaluation systems.
version fc836bcc2c4b
2 findings
static analysis only
requires secrets
no human review yet
Safety: 94
Quality: 94
Transparency: 100
Operational: 92