
Writing Effective Line Item Prompts

Written by Alex Richards

How do I write effective prompts for GenAI line items?

Required Feature Flags

The following feature flags and permissions are required to use this feature:

  • SmartScore v2 (technical name: feature_smartscore_v2): enables building, testing and managing custom automated line items

Required Permissions:

  • Manage Smartscore (evaluagent-cx.smartscore-v2.manage) — to build, test and manage custom Automated Line Items

Overview

A GenAI line item asks the AI a single yes-or-no style question about a conversation and returns a verdict (for example Pass, Fail or N/A). The quality of that verdict depends almost entirely on how clearly you write the prompt and the outcome explanations that go with it.

This guide gives you a reliable shape to follow. It's built from real tuning work on live line items, and it applies whatever the conversation is about. Get the structure right and your scores stay consistent across thousands of contacts.

For where to enter each of these, see How do I build a GenAI-based Line Item? You write the prompt in the Prompt field, and the Pass, Fail and N/A explanations on the scoring template.

The one rule that matters most: keep everything consistent

Your Prompt, your Pass explanation, your Fail explanation and your N/A explanation must all describe the same thing.

If the prompt asks "did the agent confirm the customer's identity?" but the Pass explanation talks about "the agent greeting the customer politely", the AI has two different criteria to choose from and can pick a different one from run to run. That's the single biggest cause of scores that flip between runs.

Before you save a line item, read all four boxes back to back and check they describe one criterion in plain words. If they don't agree, fix them before anything else.
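Reading the four boxes back to back is a manual check, but a rough automated pass can catch the most obvious drift. The Python sketch below is our own heuristic, not a product feature: you supply the two or three words that name your criterion (the hypothetical `key_terms` argument), and it lists any box that mentions none of them. A box that passes can still describe the wrong thing, so treat a clean result as a reason to read the boxes, not to skip reading them.

```python
from dataclasses import dataclass


@dataclass
class LineItemText:
    """The four text boxes of a GenAI line item (field names are illustrative)."""
    prompt: str
    pass_explanation: str
    fail_explanation: str
    na_explanation: str


def boxes_missing_terms(item: LineItemText, key_terms: list[str]) -> list[str]:
    """Return the names of boxes that mention none of the key criterion terms.

    This is a case-insensitive substring match, so it is only a rough signal
    that a box has drifted onto a different criterion.
    """
    boxes = {
        "Prompt": item.prompt,
        "Pass": item.pass_explanation,
        "Fail": item.fail_explanation,
        "N/A": item.na_explanation,
    }
    terms = [term.lower() for term in key_terms]
    return [
        name
        for name, text in boxes.items()
        if not any(term in text.lower() for term in terms)
    ]


# The mismatch described above: the Pass box talks about a polite greeting.
item = LineItemText(
    prompt="Did the agent confirm the customer's identity?",
    pass_explanation="The agent greeted the customer politely.",
    fail_explanation="The agent did not confirm the customer's identity.",
    na_explanation="The call's purpose means identity checks do not apply.",
)
print(boxes_missing_terms(item, ["identity"]))  # ['Pass']
```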

Seven habits of a reliable prompt

  • Lead with the question. Put the thing you're scoring in the very first line, so it can't get lost. For example: "Did the agent ask the customer to confirm their quote details are correct?"

  • State the allowed verdicts up front. Even though your scoring template defines them, listing them as a closed set near the top keeps the AI inside the lines. For example: ALLOWED VERDICTS: N/A, Fail, Pass.

  • Use strong section dividers. Break the prompt into a few clear sections with headings like === PHASE 1: N/A CHECK ===. Bold, obvious dividers work far better than a dash or a plain heading, which the AI can read straight past.

  • Tell the AI what to do, not what to avoid. Write "Mark Pass when..." and "Classify the call as..." rather than "Don't..." or "Never...". The AI handles positive instructions far more reliably. If you genuinely need a "this does not count" line, keep it to one line.

  • Keep it to two or three short steps. A simple flow such as "first check for N/A, then check whether the agent did the thing, then decide" helps. Avoid lookup tables of rules — the AI reads them but doesn't follow them reliably.

  • Add one to three short examples. Concrete "this input gives this verdict" examples teach the AI the shape of a good answer. Write your own representative examples — don't paste the exact wording from real calls you tested against, or the AI just memorises those calls and struggles with new ones.

  • Keep it short. Aim for a prompt you can read on a single screen — a few short sections, not pages. Every time you add more text to fix one awkward call, you tend to introduce a new problem on a different call. Stay close to the smallest version that works.
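To see the habits working together, here is a minimal Python sketch that assembles a prompt in that shape: the question on the first line, a closed set of verdicts, === dividers, two short phases written as positive instructions, and one to three examples. The function name, its parameters and the exact phase wording are our own illustration, not a required format; you would paste the resulting text into the Prompt field and adjust it.

```python
def build_prompt(
    question: str,
    na_purposes: list[str],
    pass_when: str,
    examples: list[tuple[str, str]],
) -> str:
    """Assemble a short GenAI line item prompt following the habits above.

    examples is a list of (situation, verdict) pairs you have written yourself,
    not wording copied from real calls.
    """
    verdicts = ("N/A", "Fail", "Pass")
    if not 1 <= len(examples) <= 3:
        raise ValueError("use one to three short examples")
    for _, verdict in examples:
        if verdict not in verdicts:
            raise ValueError(f"example verdict {verdict!r} is not an allowed verdict")

    lines = [
        question,  # lead with the question
        "",
        "ALLOWED VERDICTS: " + ", ".join(verdicts),  # closed set near the top
        "",
        "=== PHASE 1: N/A CHECK ===",  # strong dividers
        "Mark N/A when the call's purpose is: " + "; ".join(na_purposes) + ".",
        "",
        "=== PHASE 2: CRITERION CHECK ===",
        # Positive instructions: say what to mark, not what to avoid.
        f"Mark Pass when {pass_when}. One clear instance anywhere on the call is enough.",
        "Mark Fail when the call is in scope and that did not happen.",
        "",
        "=== EXAMPLES ===",
    ]
    lines += [f"- {situation} -> {verdict}" for situation, verdict in examples]
    return "\n".join(lines)


prompt = build_prompt(
    question="Did the agent ask the customer to confirm their quote details are correct?",
    na_purposes=["a wrong number", "a complaint where no quote is discussed"],
    pass_when="the agent asks the customer to confirm the quote details are correct",
    examples=[
        ("Agent reads the quote back and asks 'Is that all correct?'", "Pass"),
        ("Agent reads the quote back and moves straight to payment", "Fail"),
    ],
)
print(prompt)
```

The checks at the top keep the examples inside the allowed verdicts and within the one-to-three range, so a prompt built this way stays short by construction.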

A clean prompt to start from

Use this as a skeleton and fill in the brackets:

Writing the outcome explanations

Each explanation should restate the same criterion briefly, in its own words:

  • Pass — The call is in scope and the agent met the criterion. List the two or three forms that qualify if it helps. One clear instance anywhere on the call is enough.

  • Fail — The call is in scope and the agent did not meet the criterion. List the common mix-ups that look similar but don't count, so a call with only those is a Fail.

  • N/A — The call's purpose means this question doesn't apply. List those purposes.
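Because drift between the boxes is the main risk, one option is to write the criterion once and generate all three explanations from it. This is a hypothetical helper, not something the scoring template provides: `criterion` is a phrase describing the behaviour you want, and the other lists map onto the Pass, Fail and N/A guidance above. You would still copy the text into the scoring template by hand and smooth the wording.

```python
def outcome_explanations(
    criterion: str,
    qualifying_forms: list[str],
    mix_ups: list[str],
    na_purposes: list[str],
) -> dict[str, str]:
    """Render Pass, Fail and N/A explanations that all restate one criterion.

    criterion is written as the behaviour you want to see, for example
    "the agent asks the customer to confirm their quote details are correct".
    """
    return {
        "Pass": (
            f"The call is in scope and {criterion}. "
            f"This includes: {'; '.join(qualifying_forms)}. "
            "One clear instance anywhere on the call is enough."
        ),
        "Fail": (
            f"The call is in scope but there is no point where {criterion}. "
            f"These look similar but do not count on their own: {'; '.join(mix_ups)}."
        ),
        "N/A": (
            "This question does not apply because the call's purpose is one of: "
            f"{'; '.join(na_purposes)}."
        ),
    }


explanations = outcome_explanations(
    criterion="the agent asks the customer to confirm their quote details are correct",
    qualifying_forms=["'Is that all correct?'", "'Can you check those details for me?'"],
    mix_ups=["reading the quote back without asking", "confirming the customer's address"],
    na_purposes=["a wrong number", "a complaint where no quote is discussed"],
)
for verdict, text in explanations.items():
    print(f"{verdict}: {text}")
```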

Before you save: a quick checklist

  • The prompt, Pass, Fail and N/A explanations all describe the same criterion.

  • The question you're scoring is in the first line of the prompt.

  • The allowed verdicts are listed as a closed set near the top.

  • Sections use strong dividers like === HEADING ===.

  • Your examples are ones you've written, not copied word for word from real calls.

  • Instructions are positive ("Mark Pass when...") wherever possible.

  • The whole prompt fits on one screen.
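Several checklist items can be checked mechanically. This Python sketch is a rough linter of our own design: it checks that the first line is a question, looks for an ALLOWED VERDICTS line in the first five lines, counts === dividers, flags more than one line starting with "Don't", "Do not" or "Never", and treats 40 lines as "one screen". All of those thresholds are assumptions to adjust. It cannot check the two judgement items: that all four boxes agree, and that the examples are your own.

```python
def lint_prompt(prompt: str, screen_lines: int = 40) -> list[str]:
    """Return checklist problems found in a prompt (an empty list means none).

    Thresholds are assumptions: the verdict line must be in the first five
    lines, and 40 lines stands in for "fits on one screen".
    """
    lines = [line.strip() for line in prompt.strip().splitlines()]
    problems = []

    # The question you're scoring should be the first line.
    if not lines or not lines[0].endswith("?"):
        problems.append("First line should be the question you're scoring.")

    # Allowed verdicts listed as a closed set near the top.
    if not any(line.upper().startswith("ALLOWED VERDICTS") for line in lines[:5]):
        problems.append("List the allowed verdicts as a closed set near the top.")

    # Strong dividers such as === HEADING ===.
    dividers = [line for line in lines if line.startswith("===") and line.endswith("===")]
    if len(dividers) < 2:
        problems.append("Use strong section dividers like === HEADING ===.")

    # One "this does not count" line is fine; more suggests rephrasing positively.
    negatives = [line for line in lines if line.lower().startswith(("don't", "do not", "never"))]
    if len(negatives) > 1:
        problems.append(f"{len(negatives)} negative instructions; rephrase as 'Mark Pass when...'.")

    if len(lines) > screen_lines:
        problems.append(f"Prompt is {len(lines)} lines; aim to fit it on one screen.")

    return problems


draft = "Never mark Pass for a greeting.\nDon't count small talk.\nDid the agent confirm identity?"
for problem in lint_prompt(draft):
    print(problem)
```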

Test before you go live

Always run your line item through the Testing Console on a spread of real conversations before adding it to a live scorecard. If a verdict looks wrong, first check that the four boxes agree on one criterion; that fixes most problems. For how testing works, see How do I test the accuracy of an Insight Topic?
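When you test, it can help to run each conversation more than once and look at both accuracy and stability, since verdicts that flip between runs usually point back to boxes that disagree. The sketch below assumes you have copied your results into a simple structure yourself: a conversation ID (hypothetical, such as "call-001") mapped to the verdict you expected and the verdicts from each run. It does not use any Testing Console export format or API.

```python
from collections import Counter


def summarise_runs(results: dict[str, tuple[str, list[str]]]) -> dict[str, float]:
    """Summarise repeated test runs of one line item.

    results maps a conversation ID to (expected verdict, verdicts from each run).
    Returns:
      flip_rate: share of conversations whose verdict changed between runs.
      majority_accuracy: share whose most common verdict matched the expected
        one (on a tie, Counter.most_common returns the verdict seen first).
    """
    if not results:
        raise ValueError("no results to summarise")
    flipped = correct = 0
    for conversation_id, (expected, runs) in results.items():
        if not runs:
            raise ValueError(f"{conversation_id} has no runs")
        if len(set(runs)) > 1:
            flipped += 1
        majority, _ = Counter(runs).most_common(1)[0]
        if majority == expected:
            correct += 1
    total = len(results)
    return {"flip_rate": flipped / total, "majority_accuracy": correct / total}


# Hypothetical results from running each conversation three times.
results = {
    "call-001": ("Pass", ["Pass", "Pass", "Pass"]),
    "call-002": ("Fail", ["Fail", "Pass", "Fail"]),
    "call-003": ("N/A", ["Fail", "Fail", "Fail"]),
    "call-004": ("Pass", ["Pass", "Pass", "Pass"]),
}
print(summarise_runs(results))  # {'flip_rate': 0.25, 'majority_accuracy': 0.75}
```

In line with the guidance above, a high flip rate is the first sign to re-read the four boxes together before changing anything else.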

