The 2019 Org Chart Strikes Back: A Copilot Hallucination Story
An anonymised account of what happens when Copilot grounds in a stale tenant — and what the post-mortem revealed about the cleanup we should have done first.
When you roll out Microsoft 365 Copilot, you’re handing every employee a large language model that grounds its responses in the documents, emails, chats, and meeting notes already sitting in your tenant.
That’s the value proposition: Copilot doesn’t make things up out of nothing; it answers from your data.
The trouble is, your data isn’t a clean source of truth. It’s an archaeological site.
Most tenants we scan have content from three different eras of the business: the founding-team docs from 2017 that nobody has touched in five years, the “transformation initiative” content from 2021 that everyone still references even though it was deprecated, and the current quarter’s working files.
Copilot doesn’t know which era to trust. It looks at all of them, weighs by recency and relevance scores, and gives you an answer that’s plausibly drawn from any of them — sometimes blending facts that contradict each other.
That’s a hallucination. Not in the LLM sense of inventing facts from thin air, but in the more dangerous corporate sense: confidently surfacing content that’s wrong because it’s old, and presenting it with the same authority as content that’s right because it’s current.
Stale policies. The 2019 expense policy capped lunch reimbursements at $25; the 2024 update raised it to $40. If both PDFs are in the same SharePoint site and the 2019 one has a more “official-sounding” filename, Copilot may quote $25 to a new hire who then submits a $30 lunch and gets denied.
Duplicate-with-drift. Sales has version 1.0 of the price sheet in their site. Marketing has version 1.0 in theirs. Six months later, Sales updates theirs to 1.1 but Marketing doesn’t. A prospect asks their account manager, who sits in Marketing, for current pricing. The account manager asks Copilot and gets the 1.0 prices. The deal closes at the wrong number.
Orphaned answers. The 2018 acquisition’s product wiki is still in the tenant; nobody owns it and nobody maintains it. A support agent asks Copilot about a feature, and Copilot finds the answer in the orphaned wiki, even though the feature was deprecated in 2020. The customer is told it still exists.
In every case, the LLM is doing its job. The data is doing the lying.
Before Copilot, stale content was passively bad — it sat there until someone happened to read it.
Most of the time nobody did. Copilot changes the consumption model: every question pulls in candidate content, ranks it, and serves it.
Stale content goes from passively harmful to actively retrieved. The category of “files nobody looks at anymore” shrinks to zero, because Copilot can retrieve anything the person asking has permission to see.
Microsoft’s response to this problem is “use sensitivity labels and information protection.”
That’s correct but slow: it’s a multi-year program and most organisations don’t finish it.
There’s a faster intervention available: identify the content that’s most likely to mislead Copilot (old, duplicate, unowned) and then archive, delete, or update it.
A scan that produces this inventory takes under an hour for most tenants. The cleanup itself can be staged over a quarter — archive stale sites, dedupe price sheets, assign owners to orphans.
The result isn’t perfect Copilot accuracy, but it removes the most common categories of corporate hallucination.
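To make the three categories concrete, here is a minimal sketch of how such an inventory could be classified. It assumes you already have a flat export of file metadata; the FileRecord fields, the classify function, the two-year staleness threshold, and the "same filename, different hash" test for drift are all illustrative assumptions, not a description of any particular scanner or Microsoft API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FileRecord:
    """One row of a hypothetical tenant inventory export."""
    path: str
    name: str
    content_hash: str        # any stable hash of the file bytes
    last_modified: date
    owner: Optional[str]     # None when no active account owns the file


def classify(records, today, stale_after_days=730):
    """Bucket files into the three risk categories: stale, orphaned, drifted.

    The 730-day threshold is an assumption; pick one that fits your tenant.
    """
    findings = {"stale": [], "orphaned": [], "drifted": {}}
    by_name = defaultdict(list)

    for rec in records:
        # Stale: untouched for longer than the threshold.
        if (today - rec.last_modified).days > stale_after_days:
            findings["stale"].append(rec.path)
        # Orphaned: nobody is accountable for keeping it current.
        if rec.owner is None:
            findings["orphaned"].append(rec.path)
        by_name[rec.name.lower()].append(rec)

    # Drifted: the same filename lives in several places with different content.
    for name, copies in by_name.items():
        if len({c.content_hash for c in copies}) > 1:
            findings["drifted"][name] = sorted(c.path for c in copies)

    return findings
```

Matching on filename is deliberately crude: it would catch the two price sheets from the example above, but it misses renamed copies and flags unrelated files that happen to share a name. Treat the output as a review queue for a human, not a delete list.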
A storage scan and a Copilot-readiness scan are essentially the same scan. Storage savings are the financial argument; Copilot accuracy is the operational one. Most tenants we work with start the project for the financial reason and discover the operational benefit was worth more.