Prompt A/B testing and evaluation
Day 22 of 30 · 30 Days of AI
Compare prompts and pick the best performer
Learning goals
- Create two prompt variants (A and B).
- Define 3–4 evaluation criteria.
- Select the better prompt, backed by evidence.
Why it matters
- Small wording changes can produce large differences in output quality.
- Criteria-driven evaluation replaces gut feel with evidence.
Explanation
- Create variants by changing one element at a time: output format, constraints, or the assigned role.
- Score against fixed criteria: accuracy, clarity, brevity, actionability.
- Ask the model to self-score each output, then verify the scores yourself.
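The self-score-then-verify step needs machine-readable scores. A minimal sketch, assuming the judge model replies with one "criterion: N" line per criterion (the reply text below is a made-up example):

```python
import re

# Criteria from the lesson; extend or reorder as needed.
CRITERIA = ["accuracy", "clarity", "brevity", "actionability"]

def parse_scores(judge_reply: str) -> dict:
    """Extract 'criterion: N' scores (1-5) from a judge's free-text reply."""
    scores = {}
    for criterion in CRITERIA:
        match = re.search(rf"{criterion}\s*[:=]\s*([1-5])", judge_reply, re.IGNORECASE)
        if match:
            scores[criterion] = int(match.group(1))
    return scores

# Made-up judge reply for illustration:
reply = "Accuracy: 4\nClarity: 5\nBrevity: 3\nActionability: 4"
print(parse_scores(reply))
# {'accuracy': 4, 'clarity': 5, 'brevity': 3, 'actionability': 4}
```

If a criterion is missing from the reply, it is simply absent from the result, which is itself a useful signal that the judge ignored part of the rubric.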
Examples
- Strong: “Evaluate A vs B on accuracy, clarity, brevity, and actionability; score each 1–5; explain each score.”
- Weak: “Which is better?”
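The strong example above can be expanded into a reusable template. This is an illustrative sketch; the template wording and function name are assumptions, not a fixed API:

```python
# Hypothetical evaluation-prompt template based on the strong example.
EVAL_TEMPLATE = """You are a strict evaluator.
Task: {task}

Output A:
{output_a}

Output B:
{output_b}

Score each output 1-5 on accuracy, clarity, brevity, and actionability.
Give a one-sentence reason per score, then name the overall winner."""

def build_eval_prompt(task: str, output_a: str, output_b: str) -> str:
    """Fill the template with the task and the two candidate outputs."""
    return EVAL_TEMPLATE.format(task=task, output_a=output_a, output_b=output_b)

prompt = build_eval_prompt(
    task="Summarize this bug report in two sentences.",
    output_a="(summary produced by prompt A)",
    output_b="(summary produced by prompt B)",
)
print(prompt)
```

Keeping the rubric inside the template means both variants are always judged on identical criteria.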
Guided exercise (10–15 min)
- Pick a task; write prompts A and B.
- Generate outputs; score them with the criteria; pick a winner.
Deliverables
- A/B outputs generated.
- Scores per criterion.
- Winner chosen with a stated reason.
Resources
- Evaluation prompting: prompt guide
Independent exercise (5–10 min)
- Tweak the weaker prompt and retest.
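The scoring-and-winner step from the guided exercise can be sketched as a small aggregation, assuming one verified score per criterion has already been collected (the numbers below are made up):

```python
from statistics import mean

def pick_winner(scores: dict):
    """scores maps variant -> {criterion: 1-5 score}.
    Returns (winning variant, per-variant average)."""
    averages = {variant: mean(by_criterion.values())
                for variant, by_criterion in scores.items()}
    return max(averages, key=averages.get), averages

# Made-up scores for illustration:
scores = {
    "A": {"accuracy": 4, "clarity": 3, "brevity": 5, "actionability": 3},
    "B": {"accuracy": 4, "clarity": 5, "brevity": 3, "actionability": 4},
}
winner, averages = pick_winner(scores)
print(winner, averages)  # B wins with the higher average
```

A plain average weights every criterion equally; if accuracy matters more for your task, swap in a weighted mean before retesting the tweaked prompt.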