The AI pilot failed. Or it ran for a few weeks, then started producing outputs that were embarrassingly wrong. Or leadership got a demo, asked a few real questions, and the answers made your data problems visible in a way no explanation could fix. If any of that sounds familiar, you are not alone. The question worth sitting with before the next initiative gets funded: what needs to be true about your data before AI can do anything useful with it?
Here is what needs to be in place.
What data governance for AI deployment actually means
Data governance for AI deployment is the set of policies, ownership decisions, and structural practices that determine whether your organization's data is reliable, bounded, and accountable enough for AI tools to produce results worth acting on.
That definition is narrower than data governance in the enterprise sense. You do not need a data lake, a formal data stewardship program, or a catalog platform with dedicated staff. What you do need is a clear answer to three questions about every data source an AI tool will touch: Who owns it? Is it current? And who is allowed to see it?
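One way to make those three questions concrete is a small inventory that records an answer per data source and reports what is still missing. The sketch below is illustrative only: the `DataSource` fields, the `readiness_gaps` helper, and the one-year review threshold are assumptions made for this example, not part of any product or standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataSource:
    name: str
    owner: Optional[str]           # named person accountable, or None if nobody is
    last_reviewed: Optional[date]  # when someone last confirmed the data is current
    access_reviewed: bool          # has anyone checked who is allowed to see it?


def readiness_gaps(source: DataSource, today: date, max_age_days: int = 365) -> list[str]:
    """Return the questions that are still unanswered for one data source."""
    gaps = []
    if not source.owner:
        gaps.append("no named owner")
    if source.last_reviewed is None or (today - source.last_reviewed).days > max_age_days:
        gaps.append("currency not confirmed")
    if not source.access_reviewed:
        gaps.append("access not reviewed")
    return gaps


# Hypothetical inventory entries for illustration
sources = [
    DataSource("CRM", "VP of Sales", date(2024, 11, 1), True),
    DataSource("Shared drive", None, None, False),
]
today = date(2025, 1, 15)
report = {s.name: readiness_gaps(s, today) for s in sources}
```

A source with an empty list has an answer to all three questions. Anything else names the specific conversation that still needs to happen.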
Those questions are more specific than they sound. For a company without a dedicated data team, answering them requires looking at your actual systems, not an idealized picture of them. Your CRM data probably has missing fields that no one ever enforced. Your shared drives have files that have not been opened in three years. Your HR records in your payroll system may not match what is in your directory. None of this is unusual. It becomes a problem the moment an AI tool starts drawing on these sources to generate answers that people trust.
Seven Roots' AI readiness assessment is a practical starting point for mapping where your data actually stands before a deployment.
Bad data gets worse when AI starts using it
The assumption that derails most mid-market AI deployments is that the tool will compensate for data quality gaps. It will not. An AI assistant that surfaces outdated records surfaces them faster, at higher volume, and with a confidence that makes them harder to question than if a person had made the same mistake.
AI tools like Microsoft 365 Copilot work by reasoning over your existing data to generate answers, summaries, and recommendations. The quality of that reasoning is bounded by the quality of the data it can reach. If your SharePoint contains a pricing document from three years ago that was never deleted, Copilot can cite it in a customer proposal. If your CRM has duplicate contacts with conflicting phone numbers, an AI-generated outreach list will include both. If your email archive contains a message from a former employee promising a customer something that was never delivered, an AI summary can surface that promise in a renewal conversation.
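The duplicate-contact case is easy to check for in a CRM export. The following is a minimal sketch, assuming contacts are exported as records with `email` and `phone` fields (hypothetical field names) and that a normalized email address is a reasonable key for identifying the same person.

```python
from collections import defaultdict


def conflicting_contacts(contacts):
    """Group contacts by normalized email and return those with more than one distinct phone."""
    phones_by_email = defaultdict(set)
    for contact in contacts:
        key = contact["email"].strip().lower()
        # Compare digits only so formatting differences are not treated as conflicts
        digits = "".join(ch for ch in contact["phone"] if ch.isdigit())
        phones_by_email[key].add(digits)
    return {
        email: sorted(phones)
        for email, phones in phones_by_email.items()
        if len(phones) > 1
    }


# Hypothetical export rows for illustration
contacts = [
    {"email": "Pat@example.com", "phone": "555-0100"},
    {"email": "pat@example.com ", "phone": "555-0199"},
    {"email": "lee@example.com", "phone": "(555) 0142"},
    {"email": "lee@example.com", "phone": "555-0142"},
]
conflicts = conflicting_contacts(contacts)
```

Here only the first pair is flagged: the two records for the same address disagree on the phone number, while the second pair differs only in formatting.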
These are not hypothetical failure modes. They are the predictable results of layering a capable AI tool onto a data environment that has not been maintained with AI access in mind.
The good news is that addressing this does not require perfection. It requires knowing which data sources the AI tool will actually query, having a realistic view of what is currently in them, and judging whether the people using the tool can trust what comes back.
Who actually owns data quality at your company
At a 200-person company, the question of data ownership tends to produce either a blank stare or a reflexive answer of "the technology team." Neither is accurate. Technology teams can maintain systems and enforce access controls, but they cannot be held accountable for the accuracy of data they did not create and do not use.
The right ownership model maps accountability to domain. Your VP of Sales owns CRM data. Your HR director owns people data. Your finance lead owns the general ledger and financial close records. Your operations lead owns the documentation and process data that governs how work actually gets done. The technology function sets standards, maintains platforms, and enforces the rules, but it does not own the underlying data.
This matters for AI deployment specifically because most AI tools surface data in contexts where accuracy carries real consequences. A meeting summary that misattributes a commitment gets sent to a customer. An AI-drafted proposal pulls the wrong discount tier from a CRM record that was not updated after a contract was renegotiated. An AI-generated status report summarizes a project based on SharePoint documents that reflect the original plan, not the current state.
Naming a data owner per domain, even informally, changes the dynamic. It creates a person accountable for keeping data current, not just for filing it away.
Metadata and naming conventions that AI depends on
Before an AI tool can reason over your organization's data, the data has to be structured well enough that the AI can tell what it is, when it was created, and who it belongs to. That requires consistent naming, complete metadata, and a folder or record structure that reflects how your organization actually works, not the structure someone designed in theory years ago.
Microsoft Purview describes the goal as being able to "confidently discover, access, and manage your data for analytics and AI." That confidence is not something a tool creates on its own. It reflects a state your organization establishes in the underlying data before the tool can deliver on it.
The table below describes, for four common data domains, what a good state, a partial state, and a broken state look like. The broken state is where the AI risk concentrates.
| Data domain | Good state | Needs work | Broken state and AI risk |
|---|---|---|---|
| Email and calendar | Retention policy active; terminated accounts purged | Retention exists but not enforced consistently | No retention; former employee mailboxes still accessible |
| Shared file storage | Folder structure by project or department; stale content archived | Partial naming conventions; mixed-use folders | No structure; outdated files alongside current ones with no distinction |
| CRM | Required fields enforced; deduplication run; accounts linked to contacts | Partial field completion; some duplicate records | No hygiene; conflicting records; stale contacts never removed |
| HR and directory | Directory matches payroll; roles and managers current; offboarding removes access | Mostly accurate; occasional stale roles or delayed offboarding | Directory out of sync with payroll; ghost accounts from former employees |
The most common structural gap is orphaned content: files and records that no one owns, that have not been touched in years, and that an AI tool has no way to disregard on its own. Addressing this before deployment is the work that most directly improves output quality.
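For file storage that is reachable as a local or mounted folder, a first pass at finding stale content can be as simple as listing files by last-modified time. This is a rough sketch under that assumption: the three-year threshold and the `stale_files` helper are illustrative, modification time is only a proxy for "no one has touched this", and cloud platforms such as SharePoint would need their own reporting rather than a file system walk.

```python
import os
import tempfile
import time
from pathlib import Path


def stale_files(root, max_age_days=3 * 365, now=None):
    """Return files under root whose last-modified time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


# Demonstration against a throwaway folder with one old and one current file
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "pricing_2019.docx"
    new = Path(tmp) / "pricing_current.docx"
    old.write_text("old")
    new.write_text("new")
    four_years_ago = time.time() - 4 * 365 * 86400
    os.utime(old, (four_years_ago, four_years_ago))  # backdate the old file
    flagged = [p.name for p in stale_files(tmp)]
```

The output is a review list, not a delete list. Each flagged file still needs a domain owner to decide whether it is archived, purged, or kept.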
Why retention and access controls come before AI
Most mid-market organizations have not addressed two governance prerequisites before an AI pilot: retention policies and access controls. Both are frequently treated as compliance items rather than AI readiness items, which is why they get deprioritized until something goes wrong.
Retention policies determine how long data is kept and what gets purged. For AI tools, the relevance is direct. An AI assistant that can reach your full email history, including messages from employees who left the company years ago, will surface that content when it is contextually relevant. The tool has no mechanism to know that a contract negotiation from 2019 should carry less weight than the current agreement. Retention policies are the mechanism that removes stale data from AI reach before it becomes an active problem.
Access controls determine which data any given user can reach. Microsoft 365 Copilot inherits your organization's existing Microsoft 365 security, privacy, and compliance policies, which means any access control gap that existed before deployment exists at AI-assisted speed after it. An employee who should not have access to a salary report but can reach a folder where one was saved will be able to ask Copilot questions about it.
Both warrant a structured review before any AI deployment begins. Seven Roots' AI readiness assessment is designed to surface these gaps before they become active problems, not after.
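The salary report example above can be turned into a simple check against a permissions export. The sketch below assumes the export is a list of (path, principal) pairs and that broad groups have names like "Everyone". Both the data shape and the group names are hypothetical, and a real audit would use whatever reporting your platform provides.

```python
# Assumed names for groups that effectively mean "the whole company"
BROAD_GROUPS = {"Everyone", "All Employees"}


def overshared(permissions, sensitive_paths):
    """Return (path, principal) pairs where a broad group can reach a sensitive location."""
    findings = []
    for path, principal in permissions:
        is_sensitive = any(path.startswith(prefix) for prefix in sensitive_paths)
        if principal in BROAD_GROUPS and is_sensitive:
            findings.append((path, principal))
    return findings


# Hypothetical permission export for illustration
permissions = [
    ("/HR/Compensation/salaries_2024.xlsx", "Everyone"),
    ("/HR/Compensation/salaries_2024.xlsx", "HR Team"),
    ("/Sales/Decks/overview.pptx", "All Employees"),
]
findings = overshared(permissions, sensitive_paths=["/HR/Compensation"])
```

Only the first row is flagged. The sales deck is broadly shared but not in a sensitive location, and the HR team's access to the salary file matches their role.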
What to fix first, in the right order
Most organizations approach AI readiness as a single gate to clear. The more useful frame is a sequence of four steps, each of which makes the next one more tractable.
Start with data ownership. Before you touch any system or file structure, name a person accountable for each major data domain. This does not require job title changes. It requires a documented list and a brief conversation with each domain owner about what current and accurate means for their data.
Second, address access controls. Run an audit of who has access to what in your primary collaboration platform. Oversharing is common in mid-market file storage environments, and it becomes a governance liability the moment an AI tool can query across that content. Microsoft's contractual privacy commitments for enterprise customers give organizations control over their data, but that control only functions if internal access permissions are set correctly in the first place.
Third, set or enforce a retention policy for email and file storage. Decide what the policy is and apply it consistently to the data sources your AI tool will reach.
Fourth, clean up naming and metadata in the data domains the tool will use most. Keep this targeted: the shared drives, the CRM records, the directory, not every system you operate.
Only after those four steps are in place does an AI deployment have a foundation worth building on. If you are not sure where your organization stands across these areas, Heartwood is a useful place to start that conversation.
Common questions about data governance before AI deployment
How clean does our data actually need to be?
Not perfect. The more useful question is whether your data is accurate enough for the specific use cases you are deploying. Meeting summaries and email drafts tolerate more noise than AI-generated customer proposals or financial analysis. The floor is this: any data source an AI tool will actively query should be reasonably current, owned by someone, and accessible only to the people who should see it. If you cannot confirm those three things for a given data domain, that domain is not ready for AI.
What's the minimum viable data governance before Copilot?
Three things: a clear answer to who owns which data, a working retention policy for email and SharePoint, and an access audit to confirm that permissions reflect current roles rather than historical ones. These do not require a formal data governance program or dedicated staff. They require structured conversations and a focused work session with your technology lead. The reason Microsoft highlights SharePoint Advanced Management as a Copilot readiness item is that oversharing is the most common and consequential gap in mid-market environments.
Who owns data governance at a 200-person company?
Ownership maps to domain, not function. Your HR director owns people data. Your VP of Sales owns CRM data. Your finance lead owns financial records. Your technology team maintains the platforms and sets the standards, but it cannot be accountable for data it did not create and does not use. The technology leader, whether fractional or full-time, sets the policy and holds domain owners accountable. The actual accountability for data accuracy lives with the people who generate and use that data every day.
How is this different from SharePoint cleanup?
SharePoint cleanup is one part of this, not the whole thing. Data governance for AI deployment covers every source an AI tool can reach: CRM records, HR systems, email archives, financial data, and whatever industry-specific platforms your team uses. It also covers the policies and ownership decisions that determine whether data stays reliable over time, not just whether it is clean right now. SharePoint organization is a tactical task. Data governance is the framework that keeps it from becoming a mess again six months later.
How long does this take?
The ownership and access control conversations can happen in a week with the right person facilitating them. The cleanup work, which covers fixing naming conventions, enforcing retention policies, and running an access audit, typically takes four to eight weeks for a company of 100 to 300 people, depending on how much historical debt exists and how far you scope the effort. The goal is not to solve everything before deployment. The goal is to have the most consequential gaps addressed before AI starts querying your data at scale.