AI Integration Due Diligence: What to Check Before Committing

Most AI vendor checklists are written by vendors. They are structured around criteria that polished vendors pass easily (documentation completeness, demo quality, certifications), and they skip the questions that actually matter: who owns your data after you cancel, what happens when accuracy degrades six months in, and whether what you’re buying is a real integration or a prompt wrapper with a branded UI. What follows is what we check before committing to a build, and what we tell clients to check before they sign anything.

Why Most AI Due Diligence Checklists Fail Buyers

The Vendor-Authored Checklist Problem

The top three search results for “AI vendor evaluation checklist” are published by AI vendors. BotsCrew, Lumenalta, and similar firms structure their checklists around criteria that polished vendors pass easily: documentation completeness, demo quality, certifications. None of them ask whether your input data will train their model. None address exit clauses or what happens when accuracy degrades six months post-launch.

A checklist designed to be passed isn’t a filter. It’s a sales tool in disguise.

What “AI Readiness” Deflection Looks Like in Practice

A common vendor tactic: when asked for concrete performance benchmarks, pivot to “your data isn’t AI-ready yet.” This extends the engagement, keeps the vendor on retainer, and shifts accountability for results onto the buyer.

A real integration partner should be able to work with your data as-is, tell you plainly what the tool will and won’t do on that data, and give you a measurable benchmark before the build starts. If they can’t do that at proposal stage, they can’t do it at delivery either.

Before You Evaluate Any Vendor, Define What You Actually Need

Integration vs. Automation vs. Advisory

These three are frequently conflated in vendor proposals and they are not the same thing.

An integration connects an AI model to your existing systems (your CRM, your ERP, your WooCommerce store) and acts on real data in real time. An automation applies a fixed AI workflow to a repeatable task (invoice processing, meeting summaries, product description generation). An advisory tool surfaces analysis and recommendations; a human still makes the decision and takes the action.

Know which one you’re buying. Vendors who blur the lines are either confused or billing for complexity they’re not delivering.

The Build-vs-Buy Decision You Should Make First

Off-the-shelf AI tools cost less upfront and fail in predictable ways: generic outputs, no memory of your workflows, poor fit for niche use cases. Custom builds cost more upfront and fail in different ways: scope creep, dependency on the agency that built it, maintenance gaps.

The decision matrix is simple: if the use case is standard (meeting notes, email triage, basic support routing), buy off-the-shelf and accept the limitations. If the use case is specific to your operations (your pricing logic, your product catalogue, your customer data), a custom build is the only path to reliable results. We scope custom AI builds before any commitment: one job, delivered with documentation, owned by you at handoff. Talk to us about what that looks like for your setup.

The Technical Due Diligence Questions

What Is the Model, and Do You Own the Output?

Ask the vendor directly: which model is this built on? GPT-4o, Claude 3.5, a fine-tuned open-source model, something proprietary? “Proprietary AI” often means a wrapped API call to OpenAI with a branded UI.

Also confirm: who owns the outputs? In most jurisdictions, AI-generated content isn’t automatically copyrightable. The vendor’s terms may claim a license to outputs for their own training or marketing. Read the ownership and licensing clauses of the terms of service before the demo, not after.

Where Does Your Data Go, and Does It Train Their Model?

This is the question most buyers forget to ask. When you feed customer records, sales data, or internal documents into an AI tool, that data has to go somewhere. The critical questions:

Is your data used to train or fine-tune the vendor’s model?
Is it stored after the session ends? For how long?
Is it shared with the underlying model provider (OpenAI, Anthropic, Google)?
What happens to stored data if you cancel?

Under OpenAI’s default API terms, API inputs are not used to train models unless you opt in. But many SaaS products built on top of GPT have their own data retention policies that differ significantly from OpenAI’s. The vendor’s answer to “does my data train your model?” should be specific and documented, not verbal reassurance.

How Is Accuracy Measured, and What Happens When It Drifts?

AI model performance degrades over time. A classification model trained on your Q1 2025 data will produce different results on Q1 2026 data if customer behavior, product mix, or language patterns have shifted. This is called model drift, and most vendor contracts say nothing about it.

Before signing, ask for a written accuracy baseline: the metric the vendor is committing to at launch. Then ask what triggers a review if accuracy drops below that baseline, and who bears the cost of retraining or recalibration. If the vendor doesn’t track accuracy post-launch, you’ll discover the degradation when a customer notices, not when the vendor does.
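A written baseline only helps if someone checks against it. As a minimal sketch, assuming you keep a periodic sample of outputs that a human has verified, the review trigger could look something like this. The names (DriftCheck, needs_review) and the 0.90 baseline and 0.05 tolerance are illustrative assumptions, not figures from any vendor contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftCheck:
    baseline_accuracy: float  # figure written into the contract at launch (assumed value)
    tolerance: float          # drop allowed before a review is triggered (assumed value)


def accuracy(predictions, labels):
    """Fraction of predictions that match human-verified labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def needs_review(check, predictions, labels):
    """True when measured accuracy falls below the baseline minus the tolerance."""
    return accuracy(predictions, labels) < check.baseline_accuracy - check.tolerance


# Hypothetical contract terms: 90% at launch, review if it drops more than 5 points.
check = DriftCheck(baseline_accuracy=0.90, tolerance=0.05)
```

Running needs_review on a fresh labeled sample each month or quarter gives both sides a shared, dated record of when the drop happened.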

The Contract and Commercial Due Diligence Questions

Pricing Structure: Seat Licenses, Usage Fees, and Hidden Scaling Costs

AI vendor pricing breaks in three common ways. First: token-based usage fees that scale unpredictably when volume spikes. A tool that costs $400/month at current volume may cost $4,000/month when your team actually adopts it. Second: seat licenses that charge per user regardless of usage, which is fine for daily tools and punishing for tools used by two people once a week. Third: platform fees that increase when you exceed an undocumented tier.

Get a pricing model that shows worst-case monthly cost at 3x your current projected usage. If the vendor can’t or won’t provide that number, build a hard monthly spend cap into the contract.
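You can run the 3x stress test yourself before the vendor does. The sketch below assumes purely token-based pricing plus an optional flat platform fee; the request volume, tokens per request, per-token price, and cap are hypothetical numbers chosen for illustration, and the function names are ours.

```python
def monthly_cost(requests, tokens_per_request, usd_per_1k_tokens, platform_fee_usd=0.0):
    """Estimated monthly bill in USD: token usage plus a flat platform fee."""
    token_cost = requests * tokens_per_request / 1000 * usd_per_1k_tokens
    return round(platform_fee_usd + token_cost, 2)


def worst_case_cost(requests, tokens_per_request, usd_per_1k_tokens,
                    platform_fee_usd=0.0, multiplier=3):
    """Same estimate with request volume scaled by the stress multiplier."""
    return monthly_cost(requests * multiplier, tokens_per_request,
                        usd_per_1k_tokens, platform_fee_usd)


def exceeds_cap(cost_usd, contract_cap_usd):
    """True when the projected bill is above the contractual spend cap."""
    return cost_usd > contract_cap_usd


# Hypothetical usage: 20,000 requests a month, 2,000 tokens each, $0.01 per 1k tokens.
current = monthly_cost(20_000, 2_000, 0.01)
stress = worst_case_cost(20_000, 2_000, 0.01)
over_cap = exceeds_cap(stress, contract_cap_usd=1_000)
```

This assumes cost scales linearly with volume. Tiered platform fees or longer prompts as adoption grows would push the real figure higher, which is why the vendor’s own worst-case number is still worth asking for.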

Data Ownership, Portability, and Exit Clauses

Two non-negotiable contract requirements:

Data portability: You must be able to export all your data (inputs, outputs, configurations, and training data if applicable) in a standard format (JSON, CSV, SQL dump) within 30 days of contract termination. No export functionality means no exit.

Exit clause: What happens to your data 30, 60, 90 days after you cancel? Is it deleted? Anonymized? Retained for the vendor’s use? The contract must specify a deletion timeline and confirmation method. “We delete it” as a verbal promise is not a contract term.

If the vendor built a custom integration on your infrastructure, confirm in writing that the code and configuration files are yours, not theirs. We’ve seen disputes where an agency claimed proprietary rights to automation workflows built on a client’s own systems. Our studio has handled enough post-integration rescues to know these disputes are avoidable if the contract is specific upfront.

SLAs That Actually Mean Something

Uptime SLAs (99.9% availability) are standard and largely meaningless for AI integrations. The SLA you actually need covers:

Accuracy floor: Minimum performance threshold on your use case, measured quarterly
Response latency: Maximum acceptable processing time under peak load
Incident response time: How long before a production failure gets a human response, not an automated ticket acknowledgment
Model update notification: Written notice before the vendor changes the underlying model, with a testing window before rollout to your production environment

Any vendor unwilling to commit to accuracy SLAs is telling you they don’t believe in the accuracy of their own product.
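Each of those SLA terms can be written down as a number and checked each quarter. Here is a minimal sketch of that check. The class and field names are our own, and the thresholds are hypothetical, apart from the 14-day notice window, which is the minimum we recommend in the FAQ below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaTerms:
    accuracy_floor: float
    max_latency_seconds: float
    max_incident_response_minutes: int
    model_update_notice_days: int


@dataclass(frozen=True)
class QuarterlyMeasurement:
    accuracy: float
    peak_latency_seconds: float
    slowest_incident_response_minutes: int
    shortest_model_update_notice_days: int


def breaches(terms, measured):
    """Return the names of the SLA terms that the measured quarter violated."""
    found = []
    if measured.accuracy < terms.accuracy_floor:
        found.append("accuracy floor")
    if measured.peak_latency_seconds > terms.max_latency_seconds:
        found.append("response latency")
    if measured.slowest_incident_response_minutes > terms.max_incident_response_minutes:
        found.append("incident response time")
    if measured.shortest_model_update_notice_days < terms.model_update_notice_days:
        found.append("model update notification")
    return found


# Hypothetical contract terms and one quarter of measurements.
terms = SlaTerms(accuracy_floor=0.92, max_latency_seconds=5.0,
                 max_incident_response_minutes=60, model_update_notice_days=14)
quarter = QuarterlyMeasurement(accuracy=0.89, peak_latency_seconds=4.2,
                               slowest_incident_response_minutes=45,
                               shortest_model_update_notice_days=7)
violations = breaches(terms, quarter)
```

If a term cannot be expressed as a field like these, it is probably too vague to enforce.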

Integration and Operational Risk

Who Owns the Integration Work, You or the Vendor?

This question is particularly important for custom builds. If the vendor writes code that connects their platform to your systems, that code may be:

Proprietary (you can’t access or modify it)
Dependent on their platform in ways that make migration prohibitively expensive
Undocumented, meaning a new developer can’t maintain it without the original vendor

The deliverable you need at project end is: complete source code, environment configuration, deployment documentation, and a working handoff session with your internal team or a third-party developer of your choice. If the vendor quotes ongoing maintenance as mandatory rather than optional, you don’t own the integration; you’re renting it.

What Happens When the Vendor’s Underlying Model Changes?

In 2024, several products built on GPT-3.5 broke when OpenAI deprecated that model. Vendors who hadn’t notified clients or built version-pinning into their integrations left customers with non-functional tools over a weekend.

Ask: what is your model versioning strategy? Are integrations pinned to a specific model version? How much notice will you give before a model update, and who owns the regression testing cost?
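To make those questions concrete, here is a rough sketch of what version pinning and regression testing can look like in code. It assumes the provider publishes dated snapshot identifiers (for example gpt-4o-2024-08-06 or claude-3-5-sonnet-20241022) alongside floating aliases, and it treats run_model as a hypothetical stand-in for whatever function calls the integration. The date-suffix check is a heuristic of ours, not a provider rule.

```python
import re

# Heuristic: a pinned identifier ends in a date such as 2024-08-06 or 20241022.
DATED_SNAPSHOT = re.compile(r"(\d{4}-\d{2}-\d{2}|\d{8})$")


def is_pinned(model_id):
    """True when the model identifier ends in a dated snapshot suffix."""
    return bool(DATED_SNAPSHOT.search(model_id))


def regression_failures(run_model, golden_cases):
    """Return the inputs whose output no longer matches the approved answer.

    run_model is a hypothetical callable wrapping the integration;
    golden_cases is a list of (input, expected_output) pairs approved at launch.
    """
    return [text for text, expected in golden_cases if run_model(text) != expected]


# Stand-in for the real integration, used only to show the shape of the check.
def demo_model(text):
    return "refund" if "refund" in text.lower() else "other"


golden = [("Please refund my order", "refund"), ("Where is my parcel?", "shipping")]
failed = regression_failures(demo_model, golden)
```

A vendor with a real versioning strategy can show you its equivalent of the golden set and tell you who runs it before each model change.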

Monitoring, Logging, and Incident Response

Production AI integrations need observability. You should have access to:

Input/output logs for every transaction (critical for audit trails in finance, legal, HR)
Error rates and failure modes: when the integration fails, what does it fall back to?
Latency metrics over time
An alert mechanism that notifies your team when something breaks

If the vendor controls all observability and you have no direct access to logs, you are dependent on them to tell you when something is wrong. That is not an acceptable operational posture.
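For a sense of how little code client-side logging needs, here is a minimal sketch that wraps every model call and records the input, output, error, and latency as one JSON line. The function name logged_call and the idea of passing in a sink list are our assumptions; in production the sink would be a log file or a logging service you control, and the payloads would need to be JSON-serializable.

```python
import json
import time
from datetime import datetime, timezone


def logged_call(model_fn, payload, sink):
    """Call model_fn(payload) and append one JSON log line to sink, success or failure."""
    started = time.perf_counter()
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": payload,
        "output": None,
        "error": None,
    }
    try:
        record["output"] = model_fn(payload)
        return record["output"]
    except Exception as exc:
        # Record the failure mode, then let the caller see the original error.
        record["error"] = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - started) * 1000, 2)
        sink.append(json.dumps(record))


# Stand-in for the real integration, used only to show the shape of a log line.
audit_log = []
answer = logged_call(str.upper, "refund request", audit_log)
```

Because the log line is written in the finally clause, failed calls leave the same audit trail as successful ones.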

The “AI Theater” Filter: Spotting Prompt Wrappers Sold as Products

Questions That Separate Real Engineering from Rebranded ChatGPT

“AI theater” is a prompt wrapper: a form that sends your input to ChatGPT, formats the response, and bills you $500/month for the interface. It’s not a real integration. It doesn’t connect to your data, doesn’t persist context, and breaks if the underlying API changes.

Ask these questions:

Show me the architecture diagram: where does the model sit, what does it connect to, what data flows in and out?
What happens if OpenAI has an outage? If the answer is “the tool goes down,” it’s a wrapper with no fallback.
How does the tool know about my specific business? If the answer is “you tell it in the prompt,” it has no memory and no real integration.
Can I see a production example from a current client? Not a demo environment, not a sandbox. A live client who will confirm performance.

A real integration has a data connection, a feedback loop, and observable performance. A prompt wrapper has a nice UI and a vague reference to “proprietary AI.”
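The outage question has a concrete engineering answer. As a minimal sketch, assuming the integration can degrade to a simpler rules-based path when the model provider is unreachable, a fallback looks roughly like this. All function names are hypothetical, and real client libraries raise their own exception types, so the built-in ConnectionError and TimeoutError stand in for those here.

```python
def call_with_fallback(primary, fallback, payload):
    """Try the primary provider; on a connection failure or timeout, use the fallback.

    Returns the result together with the name of the path that produced it,
    so the degradation is visible in logs rather than silent.
    """
    try:
        return primary(payload), "primary"
    except (ConnectionError, TimeoutError):
        return fallback(payload), "fallback"


def provider_down(payload):
    """Stand-in for a model provider that is currently unavailable."""
    raise ConnectionError("provider unavailable")


def rules_based_router(payload):
    """Stand-in for a simple non-AI fallback, such as keyword routing."""
    return "billing" if "invoice" in payload.lower() else "general"


routed = call_with_fallback(provider_down, rules_based_router, "Invoice overdue")
```

A vendor does not need this exact pattern, but they should be able to describe what theirs is.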

Reference Checks That Reveal Actual Production Performance

Demo environments are optimized for demos. Ask for two or three reference clients with similar use cases, not logos on a website, but actual humans you can call. The questions to ask those references:

What was the accuracy at launch vs. six months later?
What broke first, and how long did it take to fix?
Who owns the code and documentation?
Would you sign with this vendor again?

A vendor who hesitates to provide references, or provides only enterprise clients when you’re an SMB, is showing you something. Pay attention to it.

Frequently Asked Questions

What is AI integration due diligence and why does it matter for SMBs?

AI integration due diligence is the structured process of evaluating a vendor, contract, and technical approach before committing budget to an AI project. It matters for SMBs specifically because SMBs typically lack the procurement staff and legal resources that enterprise buyers use to catch problems before they’re expensive. The median annual AI budget waste for SMBs is $18,000, most of it preventable with upfront scrutiny.

What questions should I ask an AI vendor before signing a contract?

Five questions that filter out weak vendors quickly: (1) What model is this built on, and will it change? (2) Does my input data train your model? (3) What accuracy commitment will you put in the contract? (4) What is the exit process and how long until my data is deleted? (5) Can I speak with two current clients in a similar use case? A vendor who deflects any of these questions, particularly the accuracy and exit questions, is a vendor worth walking away from.

How do I know if a vendor is using my data to train their model?

The only reliable answer is the contract. Verbal assurances and privacy policy pages are overridden by terms of service clauses that may specify training data rights in sub-sections. Look specifically for language about “improving services,” “anonymized data use,” and “third-party model providers.” If the vendor uses an underlying model API (OpenAI, Anthropic, Google), also check that provider’s enterprise terms; some pass-through obligations are governed at the API level, not just the vendor level.

What should an AI vendor SLA actually include?

Standard uptime SLAs (99.9% availability) are table stakes and not the most important metric for AI integrations. The SLA that matters covers accuracy: a defined minimum performance threshold on your specific use case, reviewed quarterly. It should also include model update notification windows (minimum 14 days written notice before any model change affecting production), incident response time with an escalation path, and a data portability guarantee if you cancel.

What is “AI theater” and how do I avoid paying for it?

AI theater is a product or service that looks like a custom AI integration but is actually a thin interface over a general-purpose model API, with no persistent memory, no real data connection, and no proprietary engineering. You avoid it by asking for an architecture diagram, requesting a live client reference (not a demo), asking what happens when the underlying model provider has an outage, and confirming what specifically your data is connected to. A real integration can answer all of these questions with specifics. A wrapper cannot.

What happens to my data and workflows if the vendor shuts down or pivots?

Without a contractual exit clause specifying data portability and deletion timelines, you have limited recourse. Negotiate these terms before you sign: a 30-day export window in a standard format (JSON, CSV, SQL), a confirmed deletion schedule, and source code access for any custom integrations built on your infrastructure. Treat the exit terms as carefully as the onboarding terms: the circumstances under which you’ll need them are precisely the circumstances under which the vendor has the least incentive to cooperate.

The five checks that matter most: who owns your data and for how long, what the accuracy commitment is in writing, whether the code is yours at handoff, how you exit without losing your work, and whether what you’re buying is a real integration or a prompt wrapper in a branded UI. Miss one of these and you’ll likely be back to market within 18 months.

If you want to talk through what this looks like for your operation (a specific vendor proposal, a build you’re scoping, or a contract you’re not sure about), start a conversation. We’ll be direct about what we see. See how we scope and build this at designodin.com/ai.