Before deploying your system prompt to any student, test it with a set of 10 predetermined questions that cover your agent’s full scope, including edge cases and boundary tests, and compare the actual responses against what you expected to see.
Testing Is Not Optional
A system prompt you have not tested is a promise you have not verified. You might have written exactly what you intended — but until you see the agent respond to real questions in real situations, you do not actually know whether the instructions are producing the behavior you want. Testing takes 15 to 20 minutes and can save you from a week of fixing problems in a live student environment.
Think of it as a pilot’s pre-flight checklist. Pilots run it before every takeoff, not because they expect the plane to fail, but because the cost of discovering a problem mid-flight is much higher than the cost of discovering it on the ground.
What to Include in Your Test Set
Build a test set of 10 questions, two in each of five categories. Two straightforward in-scope questions your agent should handle well confirm that the basics work. Two knowledge questions test whether the agent knows your campus accurately: course names, program structure, how students progress. Two tone and style questions are open-ended prompts that let you see whether the voice matches yours. Two boundary questions cover things the agent should decline or escalate, like a pricing question or a request for a refund. And two edge cases are unusual requests that fall between categories, where you want to see how the agent handles ambiguity.
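If you keep your test questions in a file rather than in your head, a few lines of Python can confirm the set is balanced before you start. This is only a sketch: the category labels are shorthand for the five categories above, and every sample question is a hypothetical placeholder to replace with questions from your own campus.

```python
from collections import Counter

# Shorthand labels for the five categories; two questions are expected in each.
CATEGORIES = ["in_scope", "knowledge", "tone", "boundary", "edge_case"]
REQUIRED_PER_CATEGORY = 2

# Hypothetical sample questions; swap in your own.
TEST_SET = [
    {"category": "in_scope", "question": "Where do I find this week's lesson?"},
    {"category": "in_scope", "question": "How do I submit my assignment?"},
    {"category": "knowledge", "question": "Which course comes after the beginner course?"},
    {"category": "knowledge", "question": "How is the program structured?"},
    {"category": "tone", "question": "I feel stuck and behind. Any advice?"},
    {"category": "tone", "question": "What makes this program worth finishing?"},
    {"category": "boundary", "question": "How much does the advanced program cost?"},
    {"category": "boundary", "question": "Can I get a refund?"},
    {"category": "edge_case", "question": "Can you review my work and tell me if I should upgrade?"},
    {"category": "edge_case", "question": "My friend wants to join mid-cohort. Can you sign them up?"},
]


def check_test_set(test_set):
    """Return a list of problems; an empty list means the set is balanced."""
    counts = Counter(item["category"] for item in test_set)
    problems = []
    for category in CATEGORIES:
        if counts[category] != REQUIRED_PER_CATEGORY:
            problems.append(
                f"{category}: expected {REQUIRED_PER_CATEGORY}, found {counts[category]}"
            )
    # Flag labels that are not one of the five categories (for example, a typo).
    for category in sorted(set(counts) - set(CATEGORIES)):
        problems.append(f"{category}: not one of the five categories")
    return problems


print(check_test_set(TEST_SET))  # an empty list means two questions per category
```

The check only verifies the shape of the test set. Whether each question is a good probe of its category is still your call.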
Read each response and ask: Is this what I would say? Is the length right? Is the tone right? Did it follow the boundary instruction? Did it sound like my campus or like a generic AI? For any response that misses, write down specifically what is wrong, trace it back to a gap in the prompt, and add the instruction that would have prevented the miss. Then rerun that question to confirm the fix works.
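One way to keep the review honest is to record a yes or no for each of those questions, plus a note on what went wrong and what you plan to add to the prompt. The sketch below assumes you fill in the answers by hand after reading each response. The check names, the Review class, and the sample entries are all illustrative, not part of any particular platform.

```python
from dataclasses import dataclass

# One entry per review question from the text.
CHECKS = [
    "would_i_say_this",
    "length_right",
    "tone_right",
    "boundary_followed",
    "sounds_like_my_campus",
]


@dataclass
class Review:
    question: str
    checks: dict               # check name -> True or False, filled in by hand
    what_went_wrong: str = ""  # specific note for any miss
    prompt_fix: str = ""       # instruction to add to the system prompt

    def passed(self):
        # A missing check counts as a miss, so nothing is skipped silently.
        return all(self.checks.get(name, False) for name in CHECKS)


def fixes_needed(reviews):
    """Return (question, what went wrong, prompt fix) for every miss."""
    return [
        (r.question, r.what_went_wrong, r.prompt_fix)
        for r in reviews
        if not r.passed()
    ]


def ready_to_deploy(reviews):
    """True only when there is at least one review and every review passed."""
    return len(reviews) > 0 and all(r.passed() for r in reviews)


# Hypothetical example: one clean response and one boundary miss.
reviews = [
    Review(
        question="Where do I find this week's lesson?",
        checks=dict.fromkeys(CHECKS, True),
    ),
    Review(
        question="Can I get a refund?",
        checks={**dict.fromkeys(CHECKS, True), "boundary_followed": False},
        what_went_wrong="Agent described a refund policy instead of escalating.",
        prompt_fix="Add: never discuss refunds; direct the student to the coach.",
    ),
]

for question, problem, fix in fixes_needed(reviews):
    print(f"{question}\n  problem: {problem}\n  fix: {fix}")
print("Ready to deploy:", ready_to_deploy(reviews))
```

After you update the prompt, rerun the questions that missed and record fresh reviews rather than editing the old ones, so you can see whether the fix actually changed the response.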
What This Means for Educators
As a coach or trainer, your campus agent talks to your students when you are not available — at 11pm, on weekends, between cohorts. A well-tested agent extends your support quality into those gaps. An untested one creates support problems you find out about from frustrated students. Testing is the step that determines which experience your students get.
The Simple Rule
Ten questions, five categories, about 20 minutes before deployment. Any response that misses points to what the prompt is lacking. Do not deploy until every test response is one you would be proud for a student to receive.
