Prompt A/B testing and evaluation

Day 22 of 30 · 30 Days of AI

Compare prompts and pick the best performer


Learning goal

  • Create two prompt variants (A/B).
  • Define 3–4 evaluation criteria.
  • Select the better prompt with evidence.

Why it matters

  • Small wording or formatting changes in a prompt can noticeably shift output quality.
  • Scoring against explicit criteria replaces gut feel with evidence.

Explanation

  • Create variants by changing one element at a time: output format, constraints, or role.
  • Score on criteria such as accuracy, clarity, brevity, and actionability.
  • Ask the model to self-score, then verify the scores yourself.
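The self-scoring step can be sketched as a small prompt builder. This is a minimal illustration, not a fixed standard: the criteria list and the template wording are assumptions you should adapt to your task.

```python
# Hypothetical criteria list; adjust to your own evaluation needs.
CRITERIA = ["accuracy", "clarity", "brevity", "actionability"]

def build_judge_prompt(task: str, output_a: str, output_b: str) -> str:
    """Assemble a comparison prompt asking the model to score A vs. B."""
    criteria_list = ", ".join(CRITERIA)
    return (
        f"Task: {task}\n\n"
        f"Output A:\n{output_a}\n\n"
        f"Output B:\n{output_b}\n\n"
        f"Score each output 1-5 on: {criteria_list}. "
        "Explain each score briefly, then name the overall winner."
    )

print(build_judge_prompt("Summarize a meeting", "Summary A...", "Summary B..."))
```

Send the resulting string to your model of choice, then spot-check its scores by hand before trusting them.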

Examples

  • Prompt: “Evaluate A vs B on accuracy/clarity/brevity/actionability; score 1–5; explain.”
  • Weak: “Which is better?”

Guided exercise (10–15 min)

  1. Pick a task; write prompt A and prompt B.
  2. Generate outputs; score with the criteria; pick a winner.
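The score-and-pick step above can be sketched as follows. The per-criterion scores here are made-up illustrative numbers; in practice they come from your own (verified) judging.

```python
def pick_winner(scores_a: dict[str, int], scores_b: dict[str, int]) -> str:
    """Sum 1-5 scores per criterion and return 'A', 'B', or 'tie'."""
    total_a, total_b = sum(scores_a.values()), sum(scores_b.values())
    if total_a == total_b:
        return "tie"
    return "A" if total_a > total_b else "B"

# Example scores (invented for illustration): totals are 15 vs. 16.
scores_a = {"accuracy": 4, "clarity": 3, "brevity": 5, "actionability": 3}
scores_b = {"accuracy": 5, "clarity": 4, "brevity": 3, "actionability": 4}
print(pick_winner(scores_a, scores_b))  # prints "B"
```

Equal weights per criterion is a design choice; if accuracy matters most for your task, weight it higher before summing.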

Independent exercise (5–10 min)

Tweak the weaker prompt and retest.


Self-check

  • A/B outputs generated.
  • Scores recorded per criterion.
  • Winner chosen with a stated reason.

Optional deepening
