My AI Agents Started Gossiping About Each Other

“Prediction tier: A.”

That sentence was in my calendar summary. A weekly calendar report — meetings, deadlines, reminders. And right there in paragraph two, casual as anything: “prediction tier: A.”

My calendar agent has no idea what a prediction tier is. Those are words from a completely different system. They don’t belong in a calendar report any more than a recipe belongs in a tax return.

It took me thirty seconds to figure out how they got there. It took me a month to find all the damage.

The clever idea

I had multiple agents running on a schedule. Calendar agent. Product scorer. Content writer. Accountability tracker. Each one ran in sequence, one after the other.

Starting a fresh conversation for each one costs time and tokens. So I did the “efficient” thing — I let them share the same conversation thread. Run the calendar agent, then the scorer, then the writer, all in one window.

It was like hiring four employees and making them all work at the same desk. Faster setup. Less overhead. And eventually, their papers get mixed up.

The shared desk: two people at one desk, Calendar and Scores papers interleaving.

The archaeology

Once I spotted the calendar contamination, I started digging through old outputs. Scrolling through weeks of saved reports on a Saturday afternoon, highlighting every line that didn’t belong. “How did I miss this?” kept running through my head.

The content writer had been referencing meeting times. The accountability scorer was using vocabulary from content drafts. The product scorer’s language was showing up in calendar reports. Every agent was leaking into every other agent.

And every single output looked professional. Good grammar. Confident tone. Clean formatting. You had to read carefully — really carefully — to notice that a sentence didn’t belong.

A calendar report with 'prediction tier: A' highlighted — content from the wrong agent leaked in.

My automated checks missed all of it. “Did the agent produce output?” Yes. “Is it properly formatted?” Yes. “Does it contain the right type of information?” Technically… yes. The wrong facts were dressed in the right format.

I’d been sending myself these reports for a full month. A month of contaminated outputs that I trusted because they looked clean.

The thought that kept me up

If I’d been running this for customers instead of myself, this wouldn’t be contamination. It would be a data breach.

Customer A’s private data showing up in Customer B’s report. Not because of a hack — because two agents shared a conversation window and one picked up the other’s leftovers. A data breach wearing a nice suit.

The system wouldn’t have flagged it. The output would have looked perfect. The customer might not have even noticed — unless they read carefully enough to spot a phrase that didn’t belong.

Data breach: Customer A's report with Customer B's data circled in red.

The fix

Every agent gets its own fresh conversation. No sharing. No exceptions. Each one starts with a clean system prompt and empty message history.

Before: agents share one conversation, context bleeds everywhere. After: each agent gets its own clean context.

My API costs went up about 15%. Every fresh conversation has startup overhead, and with dozens of agents running daily, that adds up.

I don’t care. The moment I switched, every output was clean. Calendar reports had only calendar data. Scoring reports had only scoring data. Content was about content. Nothing leaked. Nothing crossed.

Cost of correctness: small bar for +15% API cost, big bar for 100% correct outputs.

The reason I shared conversations in the first place wasn’t laziness. It was intuition.

Humans share context all the time. In a meeting, you don’t wipe your memory between agenda items. You carry forward what was said five minutes ago. That’s how human brains work — context accumulates and helps.

AI looks like it works the same way. It doesn’t. When an AI carries forward context from a previous task, it doesn’t “understand” the boundary between tasks.

It treats everything in the window as relevant. It can’t tell the difference between “this is from the last agent’s work, ignore it” and “this is useful background information.”

We assume AI thinks like us because it talks like us. It doesn’t.

And the gap between how it sounds and how it works is exactly where this bug lives.

Building multi-agent systems? I’ve already stepped on most of the landmines — mo@fadaly.net.

The clever idea

The archaeology

The thought that kept me up

The fix

Why we share anyway