write-judge-prompt
Skill from hamelsmu/evals-skills
A skill from the Eval Skills collection that guides AI coding agents in building LLM evaluations, covering eval auditing, error analysis, judge prompt writing, RAG evaluation, and more. Guards against common mistakes observed across 50+ companies.
Same repository
hamelsmu/evals-skills (7 items)
Installation
npx vibeindex add hamelsmu/evals-skills --skill write-judge-prompt
npx skills add hamelsmu/evals-skills --skill write-judge-prompt
~/.claude/skills/write-judge-prompt/SKILL.md
More from this repository (6)
Audits an LLM evaluation pipeline and surfaces problems with prioritized severity, guarding against common mistakes seen across 50+ companies, and recommends other skills to fix the issues found.
A skill from the Eval Skills plugin that helps AI coding agents build LLM evaluations, including eval auditing, error analysis, RAG evaluation, and judge prompt writing. Based on lessons learned from helping 50+ companies with their eval pipelines.
Guides AI coding agents through reading LLM traces and systematically categorizing failures, part of the Eval Skills collection that helps build robust LLM evaluations based on lessons from 50+ companies.
Creates diverse synthetic test inputs using dimension-based tuple generation for LLM evaluations, part of the Eval Skills collection that guards against common mistakes in eval pipeline construction.
A collection of skills that guide AI coding agents to build LLM evaluations, including an eval audit skill that catches common mistakes, error analysis for reading traces and categorizing failures, and tools for validating evaluator quality.
A skill for building custom browser-based annotation interfaces to review LLM traces and collect structured human feedback, with pass/fail labeling, keyboard shortcuts, domain-appropriate data formatting, and Playwright verification testing.
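The synthetic-input skill listed above refers to dimension-based tuple generation. Below is a minimal sketch of the general idea. It assumes that each dimension is a plain list of values. The dimension names and values are hypothetical and are not taken from the skill. The sketch takes the Cartesian product of the dimensions, so each resulting tuple describes one test case. A later step, not shown here, would turn each tuple into a natural-language input.

```python
from itertools import product

# Hypothetical dimensions for a support assistant; names and values are
# illustrative assumptions, not taken from the evals-skills repository.
dimensions = {
    "persona": ["new user", "power user"],
    "intent": ["refund", "bug report", "feature question"],
    "tone": ["polite", "frustrated"],
}

def generate_tuples(dims):
    """Return every combination of dimension values as a dict (one 'tuple' per test case)."""
    keys = list(dims)
    # product() yields one value per dimension for every combination.
    return [dict(zip(keys, values)) for values in product(*(dims[k] for k in keys))]

tuples = generate_tuples(dimensions)
print(len(tuples))  # 2 * 3 * 2 = 12
print(tuples[0])    # {'persona': 'new user', 'intent': 'refund', 'tone': 'polite'}
```

The full product grows multiplicatively with each added dimension. In practice, you might sample a subset of the tuples or hand-pick the most informative ones rather than use every combination.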