Why I Never Ship AI Output Without a Second Opinion

Last March, my content agent wrote a market report that cited a Gartner study. Specific numbers. Publication year. Percentage growth. It read like something a senior analyst would produce.

The study didn’t exist.

Not “slightly wrong” or “misremembered.” Completely fabricated. The AI had invented a source, assigned it plausible-sounding numbers, and presented it with the confidence of someone reading from a footnote.

I almost shipped it. The only reason I caught it was that I Googled the study to pull the original — typed in the exact title, hit enter, and stared at a page of zero results. That cold feeling in your stomach when you realize you nearly sent a fabricated source to a client.

That was the last time I let AI output leave my system unchecked.

The newsroom

The fix was simple enough that I felt stupid for not thinking of it earlier.

One AI writes the content. A completely different AI plays editor. The editor’s only job: take every claim in the output, compare it against the original source material, and ask one question — “Can I find this in the source?”

If yes, the claim stays. If not, it gets flagged and sent back.

The newsroom pattern: Writer AI creates content, Editor AI checks facts, only clean output ships. Rejects loop back.

That’s it. Writer creates. Editor checks. Nothing ships until the editor says it’s clean.

Why “be careful” doesn’t work

I tried the obvious approach first. I added instructions to the writer: “Only include verified facts.” “Do not make claims unsupported by the source.” “If you’re unsure, say so.”

It helped a little. It didn’t solve the problem.

The AI doesn’t know it’s fabricating. It’s not lying. It’s generating the most probable next word based on patterns. Sometimes the most probable next word is wrong.

The model can’t tell the difference between a real memory and a convincing hallucination — the same way you can’t always tell the difference between something you read and something you dreamed you read.

Telling an AI “don’t hallucinate” is like telling yourself “don’t forget things.” Your brain agrees it’s a great idea and then forgets anyway.

Why a second AI catches what the first one misses

Same reason a writer shouldn’t proofread their own book.

When the same AI generates content and then reviews it, it still has all the same assumptions. It reads what it meant to write. Of course it confirms its own claims — it just made them up five seconds ago.

A separate AI starts fresh. No attachment to the claims. No memory of generating them. It sees the content for the first time and its only job is to be a skeptic.

Same AI reviewing itself confirms its own assumptions. A separate AI starts fresh with no attachment to the claims.

It’s the same reason code reviews exist. Not because programmers are careless. Because everyone is blind to their own assumptions.

The numbers

Over 200+ verified outputs in the past six months, the verifier catches roughly 90-95% of factual errors. It’s not perfect — some hallucinations are sophisticated enough to fool a second check. The remaining 5-10% tend to be subtle enough to fool a human reviewer too.

But 90% is dramatically better than 0%. And the cost is roughly double the API cost of the content generation itself. For anything that someone else will read — which should be everything you ship — that’s the cheapest insurance you’ll find.

The verifier prompt is almost embarrassingly simple:

Compare each claim against the source material.
Flag any claim not supported by the source.
If a statistic is cited, verify the exact number.

Error catch rate: 0% without a verifier vs 90-95% with one.

The Gartner study that never was

I keep thinking about that fake study.

If I’d shipped that report, someone would have searched for the source. Found nothing. And then they wouldn’t just distrust that one fact — they’d distrust everything I’d ever sent them. One fabricated citation erases months of credibility.

The fabricated source: a professional paragraph citing 'Gartner, 2025' — crossed out with DOESN'T EXIST.

The verification step takes a few extra seconds per piece of content and costs an extra cent or two. The alternative is explaining to a client why your report cited a study that doesn’t exist.

Cost of skipping: $0.02 verification cost vs months of lost trust from one fabricated citation.

I know which one is cheaper.

Why we skip the check

We skip it because the output looks right. Polished paragraphs. Confident tone. Specific numbers. Our brains trust specific numbers — “grew 23% year-over-year” sounds more real than “grew significantly,” even when both are equally made up.

AI has learned what believable looks like. It writes things that feel true in every way except the one that matters: actually being true.

The second opinion isn’t about the AI’s capability. It’s about ours. We’re not good enough at catching lies that sound confident to skip the check. Neither am I. That’s why the machine does it for me.

Using AI to create content? I’d love to hear how you handle accuracy — mo@fadaly.net.