Add an explicit “admit uncertainty” instruction to your system prompt that tells the agent to say it doesn’t know when it lacks confident information, and give it the exact phrase to use so the behavior is consistent rather than improvised.
Why Agents Make Things Up by Default
AI models are trained to complete responses helpfully. When they don’t have reliable information, the default behavior — without specific instructions — is to generate something plausible-sounding rather than leave the question unanswered. This is called hallucination, and it’s not dishonesty, it’s the model doing what it was designed to do: complete the task. The problem is that “plausible” and “accurate” are not the same thing, and your students will assume they are.
Think of it like asking a new team member a question they’re not sure about. Without coaching, many people will guess confidently rather than admit uncertainty. You have to explicitly teach them that saying “I don’t know, let me find out” is the right answer — and the same is true for your AI agent.
How to Write the Instruction
In your system prompt, include a clear rule like: “If you are not confident about an answer, do not guess. Say: ‘I’m not sure about that — I’d recommend checking with [your name] or visiting [resource URL] for accurate information.'” The key is specificity. Telling the agent to “be honest” is not enough — it will still generate a response it thinks sounds reasonable. Telling the agent the exact phrase to use when uncertain gives it a reliable exit ramp that it will actually take.
You can also use the phrase “only answer questions you are confident about” combined with “for anything else, say you’re not sure and direct the student to a specific resource.” Claude and ChatGPT both respond well to this pattern when the instruction is clear.
What This Means for Educators
A student who gets a wrong answer from your AI agent and acts on it — submitting an assignment wrong, missing a deadline, following bad advice — will blame you, not the AI. The “I don’t know” instruction protects your students and your reputation at the same time. It’s one of the most important guardrails to put in any campus-facing agent, especially one that handles course content, schedules, or technical questions.
The Simple Rule
Don’t just tell the agent to be honest. Give it the exact sentence to say when it doesn’t know, and a specific place to point the student. Concrete language produces consistent behavior. Once this is in your prompt, your agent becomes trustworthy rather than confidently wrong.
