The AI pilot failed. Or it ran for a few weeks, then started producing outputs that were embarrassingly wrong. Or leadership got a demo, asked a few real questions, and the answers made your data problems visible in a way no explanation could fix. If any of that sounds familiar, you are not alone. The question worth sitting with before the next initiative gets funded: what needs to be true about your data before AI can do anything useful with it?

Here is what needs to be in place.

What data governance for AI deployment actually means

Data governance for AI deployment is the set of policies, ownership decisions, and structural practices that determine whether your organization's data is reliable, bounded, and accountable enough for AI tools to produce results worth acting on.

That definition is narrower than data governance in the enterprise sense. You do not need a data lake, a formal data stewardship program, or a catalog platform with dedicated staff. What you do need is a clear answer to three questions about every data source an AI tool will touch: Who owns it? Is it current? And who is allowed to see it?

Those questions are more specific than they sound. For a company without a dedicated data team, answering them requires looking at your actual systems, not an idealized picture of them. Your CRM data probably has missing fields that no one ever enforced. Your shared drives have files that have not been opened in three years. Your HR records in your payroll system may not match what is in your directory. None of this is unusual. It becomes a problem the moment an AI tool starts drawing on these sources to generate answers that people trust.

Seven Roots' AI readiness assessment is a practical starting point for mapping where your data actually stands before a deployment.

Bad data gets worse when AI starts using it

The assumption that derails many mid-market AI deployments is that the tool will compensate for data quality gaps. It will not. An AI assistant that surfaces outdated records surfaces them faster, at higher volume, and with a confidence that makes them harder to question than if a person had made the same mistake.

AI tools like Microsoft 365 Copilot work by reasoning over your existing data to generate answers, summaries, and recommendations. The quality of that reasoning is bounded by the quality of the data it can reach. If your SharePoint contains a pricing document from three years ago that was never deleted, Copilot can cite it in a customer proposal. If your CRM has duplicate contacts with conflicting phone numbers, an AI-generated outreach list will include both. If your email archive contains a message from a former employee promising a customer something that was never delivered, an AI summary can surface that promise in a renewal conversation.
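The duplicate-contact problem is one of the easier gaps to find before an AI tool does. The sketch below is a minimal, hypothetical illustration that assumes a CRM export where each row has `name`, `email`, and `phone` fields. The field names and the "same email means same person" rule are assumptions, not any particular CRM's schema or matching logic. It groups contacts by a normalized email address and reports groups whose phone numbers disagree.

```python
from collections import defaultdict

# Hypothetical CRM export rows; field names are illustrative, not a real CRM schema.
contacts = [
    {"id": 1, "name": "Dana Lee", "email": "Dana.Lee@example.com", "phone": "555-0100"},
    {"id": 2, "name": "Dana Lee", "email": "dana.lee@example.com ", "phone": "555-0199"},
    {"id": 3, "name": "Sam Ortiz", "email": "sam@example.com", "phone": "555-0142"},
]


def normalize_email(email: str) -> str:
    """Lowercase and trim so trivially different spellings collapse to one key."""
    return email.strip().lower()


def find_conflicting_duplicates(rows):
    """Group contacts by normalized email; return groups whose phone numbers disagree."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_email(row["email"])].append(row)
    conflicts = {}
    for key, members in groups.items():
        phones = {m["phone"] for m in members}
        if len(members) > 1 and len(phones) > 1:
            conflicts[key] = sorted(phones)
    return conflicts


if __name__ == "__main__":
    for email, phones in find_conflicting_duplicates(contacts).items():
        print(f"{email}: conflicting phones {phones}")
```

Each reported group is a record that an AI-generated outreach list would otherwise include twice, with two different numbers. Real matching rules are usually looser (names, domains, phone formats), but the point is to find these records before the tool does.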

These are not hypothetical failure modes. They are the predictable results of layering a capable AI tool onto a data environment that has not been maintained with AI access in mind.

The good news is that addressing this does not require perfection. It requires knowing which data sources the AI tool will actually query, having a realistic view of what is currently in them, and judging whether the people using it will trust what comes back.

Who actually owns data quality at your company

At a 200-person company, the question of data ownership tends to produce either a blank stare or a reflexive answer of "the technology team." Neither is accurate. Technology teams can maintain systems and enforce access controls, but they cannot be held accountable for the accuracy of data they did not create and do not use.

The right ownership model maps accountability to domain. Your VP of Sales owns CRM data. Your HR director owns people data. Your finance lead owns the general ledger and financial close records. Your operations lead owns the documentation and process data that governs how work actually gets done. The technology function sets standards, maintains platforms, and enforces the rules, but it does not own the underlying data.

This matters for AI deployment specifically because most AI tools surface data in contexts where accuracy carries real consequences. A meeting summary that misattributes a commitment gets sent to a customer. An AI-drafted proposal pulls the wrong discount tier from a CRM record that was not updated after a contract was renegotiated. An AI-generated status report summarizes a project based on SharePoint documents that reflect the original plan, not the current state.

Naming a data owner per domain, even informally, changes the dynamic. It creates a person accountable for keeping data current, not just for filing it away.

Metadata and naming conventions that AI depends on

Before an AI tool can reason over your organization's data, the data has to be structured well enough that the AI can tell what it is, when it was created, and who it belongs to. That requires consistent naming, complete metadata, and a folder or record structure that reflects how your organization actually works, not the structure someone designed in theory years ago.

Microsoft Purview describes the goal as being able to "confidently discover, access, and manage your data for analytics and AI." That confidence is not something a tool creates on its own. It reflects a state your organization establishes in the underlying data before the tool can deliver on it.

The table below maps common data domains against their typical governance state and the downstream impact on AI readiness.

Data governance state vs. AI readiness impact by domain

| Data domain | Good state | Needs work | Broken state and AI risk |
|---|---|---|---|
| Email and calendar | Retention policy active; terminated accounts purged | Retention exists but not enforced consistently | No retention; former employee mailboxes still accessible |
| Shared file storage | Folder structure by project or department; stale content archived | Partial naming conventions; mixed-use folders | No structure; outdated files alongside current ones with no distinction |
| CRM | Required fields enforced; deduplication run; accounts linked to contacts | Partial field completion; some duplicate records | No hygiene; conflicting records; stale contacts never removed |
| HR and directory | Directory matches payroll; roles and managers current; offboarding removes access | Mostly accurate; occasional stale roles or delayed offboarding | Directory out of sync with payroll; ghost accounts from former employees |
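The HR and directory row lends itself to a simple reconciliation. The sketch below assumes two hypothetical exports: a set of active employee IDs from payroll and a mapping of employee ID to enabled directory account. The data and the assumption that both systems share an employee ID are illustrative. Accounts with no active payroll record are the "ghost accounts" the table describes.

```python
# Hypothetical exports: active employee IDs from payroll, enabled accounts from the directory.
payroll_active = {"E100", "E101", "E102"}
directory_accounts = {
    "E100": "amy@example.com",
    "E101": "raj@example.com",
    "E087": "former.employee@example.com",  # left the company, account never disabled
}


def ghost_accounts(payroll_ids, directory):
    """Enabled directory accounts with no matching active payroll record."""
    return sorted(acct for emp_id, acct in directory.items() if emp_id not in payroll_ids)


def missing_accounts(payroll_ids, directory):
    """Active payroll IDs that have no directory account at all."""
    return sorted(payroll_ids - set(directory))


print("ghost accounts:", ghost_accounts(payroll_active, directory_accounts))
print("missing accounts:", missing_accounts(payroll_active, directory_accounts))
```

Both lists go to the HR domain owner rather than the technology team, in keeping with the ownership model above: the fix is deciding which record is correct, not just disabling accounts.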

The most common structural gap is orphaned content: files and records that no one owns, that have not been touched in years, and that an AI tool has no way to disregard on its own. Addressing this before deployment is the work that most directly improves output quality.
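As a rough first pass at finding orphaned content, the sketch below flags files that have no recorded owner and have not been modified in about three years. The `FileRecord` structure, the sample paths, and the three-year threshold (chosen to echo the "not opened in three years" example earlier) are all assumptions. Last-modified date is used as a proxy because last-opened data is not always available in a file inventory.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FileRecord:
    path: str
    owner: Optional[str]  # None when no accountable owner is recorded
    last_modified: date


def orphaned_and_stale(files, today, max_age_days=3 * 365):
    """Paths of files with no owner that have not been modified within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [f.path for f in files if f.owner is None and f.last_modified < cutoff]


files = [
    FileRecord("/Sales/Pricing_2021.docx", None, date(2021, 3, 1)),
    FileRecord("/Sales/Pricing_current.docx", "vp.sales", date(2024, 5, 10)),
    FileRecord("/Ops/old_process.pdf", None, date(2024, 1, 15)),
]

print(orphaned_and_stale(files, today=date(2024, 6, 1)))
```

The output is a review list, not a deletion list: each flagged file still needs a domain owner to decide whether to archive it, delete it, or claim it.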

Why retention and access controls come before AI

Most mid-market organizations have not addressed two governance prerequisites before an AI pilot: retention policies and access controls. Both are frequently treated as compliance items rather than AI readiness items, which is why they get deprioritized until something goes wrong.

Retention policies determine how long data is kept and what gets purged. For AI tools, the relevance is direct. An AI assistant that can reach your full email history, including messages from employees who left the company years ago, will surface that content when it is contextually relevant. The tool has no mechanism to know that a contract negotiation from 2019 should carry less weight than the current agreement. Retention policies are the mechanism that removes stale data from AI reach before it becomes an active problem.
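The mechanism is a date cutoff. The sketch below splits hypothetical mailbox items into those inside a retention window and those that would become purge candidates. The three-year window and the sample items are assumptions; the actual period is a business and legal decision, and in practice it is applied through your platform's retention features rather than a script.

```python
from datetime import date


def retention_cutoff(today: date, years: int) -> date:
    """Date before which content falls outside a retention window of `years` years."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:  # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - years, day=28)


def split_by_retention(items, today, years):
    """Partition (item_id, created) pairs into (keep, purge_candidates)."""
    cutoff = retention_cutoff(today, years)
    keep = [item for item, created in items if created >= cutoff]
    purge = [item for item, created in items if created < cutoff]
    return keep, purge


mailbox = [
    ("2019 contract negotiation thread", date(2019, 4, 2)),
    ("current agreement summary", date(2024, 2, 20)),
]
keep, purge = split_by_retention(mailbox, today=date(2024, 6, 1), years=3)
print("keep:", keep)
print("purge candidates:", purge)
```

Once the 2019 thread is purged, the AI assistant can no longer weigh it against the current agreement, which is the whole point.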

Access controls determine which data any given user can reach. Microsoft 365 Copilot inherits your organization's existing Microsoft 365 security, privacy, and compliance policies, which means any access control gap that existed before deployment is still there after it, now reachable at AI-assisted speed. An employee who should not have access to a salary report but can reach a folder where one was saved will be able to ask Copilot questions about it.
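A deliberately simplified model shows why inherited permissions matter. In the sketch below, each document lists the groups allowed to read it, and an assistant can only draw on what the user could already open. The group names, documents, and the group-only permission model are hypothetical; Microsoft 365 evaluates permissions in far more detail. The point it illustrates is that permission trimming does nothing for a salary file saved into a folder shared with "All Staff".

```python
# Hypothetical permission model: each document lists the groups allowed to read it.
documents = {
    "Salary_Review_2024.xlsx": {"HR", "All Staff"},  # saved into a broadly shared folder
    "Benefits_Guide.pdf": {"All Staff"},
    "Board_Minutes.docx": {"Executives"},
}
memberships = {
    "jordan": {"All Staff", "Sales"},
    "priya": {"All Staff", "HR"},
}


def reachable_documents(user):
    """Documents the user can already open, i.e. what a permission-trimmed assistant can use."""
    groups = memberships.get(user, set())
    return sorted(doc for doc, allowed in documents.items() if groups & allowed)


print(reachable_documents("jordan"))
```

Jordan works in Sales, yet the salary review appears in his reachable set because of a single overly broad grant. The assistant is behaving correctly; the permission is what is wrong.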

Both warrant a structured review before any AI deployment begins. Seven Roots' AI readiness assessment is designed to surface these gaps before they become active problems, not after.

What to fix first, in the right order

Most organizations approach AI readiness as a single gate to clear. The more useful frame is a sequence of four steps, each of which makes the next one more tractable.

Start with data ownership. Before you touch any system or file structure, name a person accountable for each major data domain. This does not require job title changes. It requires a documented list and a brief conversation with each domain owner about what "current" and "accurate" mean for their data.

Second, address access controls. Run an audit of who has access to what in your primary collaboration platform. Oversharing is common in mid-market file storage environments, and it becomes a governance liability the moment an AI tool can query across that content. Microsoft's contractual privacy commitments for enterprise customers give organizations control over their data, but that control only functions if internal access permissions are set correctly in the first place.
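Whatever sharing report your platform produces, the audit reduces to one question: which sensitive items are granted to a broad audience? The sketch below assumes a hypothetical export with one row per (item, grantee, sensitivity label) entry. The grantee names and the "confidential" label are placeholders for whatever your environment actually uses.

```python
# Hypothetical audit export: one row per (item, grantee, sensitivity label) permission entry.
BROAD_GRANTEES = {"Everyone", "All Staff", "Anyone with the link"}

permissions = [
    ("Finance/Payroll_Q2.xlsx", "Everyone", "confidential"),
    ("Finance/Payroll_Q2.xlsx", "Finance", "confidential"),
    ("Marketing/Brand_Guide.pdf", "All Staff", "general"),
    ("HR/Offer_Letters/", "Anyone with the link", "confidential"),
]


def overshared(entries, sensitive_labels=("confidential",)):
    """Map each sensitive item granted to a broad audience to the broad grantees responsible."""
    findings = {}
    for item, grantee, label in entries:
        if label in sensitive_labels and grantee in BROAD_GRANTEES:
            findings.setdefault(item, []).append(grantee)
    return findings


for item, grantees in overshared(permissions).items():
    print(f"REVIEW {item}: shared with {', '.join(grantees)}")
```

Broadly shared general content, such as the brand guide, is deliberately left out so the review list stays short enough for domain owners to act on.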

Third, set or enforce a retention policy for email and file storage. Decide what the policy is and apply it consistently to the data sources your AI tool will reach.

Fourth, clean up naming and metadata in the data domains the tool will use most. Keep this targeted: the shared drives, the CRM records, the directory, not every system you operate.

Only after those four steps are in place does an AI deployment have a foundation worth building on. If you are not sure where your organization stands across these areas, Heartwood is a useful place to start that conversation.