AI Integration Security: What to Ask About Your Data Before Building

Every AI integration we scope involves a conversation about data before it involves a conversation about features. Not because it’s a legal formality, because the data questions determine whether the integration is actually safe to build. Most vendors don’t raise them. Most developers don’t either. The questions below are what you need answered, in writing, before any AI touches your business.

What “AI Integration” Actually Means for Your Data

The phrase “AI-powered” in a product description tells you nothing about where your data goes. It is a feature flag, not a security endorsement.

The Difference Between Processing and Training

Two things can happen to your data when it enters an AI system: it can be processed (used to generate a result, then discarded), or it can be used to train future models. These are fundamentally different from a privacy standpoint, and most vendors blur the distinction deliberately.

OpenAI’s API, by default policy, does not use API submissions to train models. Their ChatGPT consumer product, by default, does. If your developer connected your CRM to the wrong endpoint, your customer conversations may already be in a training dataset. Ask your vendor explicitly: does data submitted via this integration feed model training? Get the answer in writing.

Hidden Data Flows in Existing Tools

The riskiest integrations are often the ones you didn’t consciously choose. Your CRM’s “AI summary” feature. The “smart replies” in your email platform. The analytics tool that added an AI assistant last quarter.

Each one creates a data flow you may not have reviewed. Each flow likely involves a sub-processor, a third-party AI provider the software vendor contracted without notifying you. Under GDPR, you are responsible for those flows even if you didn’t initiate them. The first question for any existing tool: has this product added AI features in the last 18 months? If yes, read the updated ToS before doing anything else.

The Five Real Security Risks SMBs Face

Third-Party Data Exposure

Third-party systems are behind 30% of all breaches. Every AI vendor you connect to extends your attack surface. They have their own vendors. Those vendors have vendors. By the time you’re two hops out from your CRM, you have no visibility, and no contractual relationship, with the systems touching your data.

Prompt Injection and Data Leakage

Prompt injection is ranked #1 on the OWASP Top 10 for LLM Applications. It means an attacker can craft input that manipulates an AI system into revealing data it shouldn’t. If your AI integration processes customer-submitted text, support tickets, form submissions, order notes, and you haven’t scoped what data the AI can access, a crafted message could extract records it has no business touching.

Credential Theft via Conversation History

Over 300,000 ChatGPT credentials were found in infostealer malware in 2025. Those accounts contained full conversation histories. If your team uses shared AI tools and pastes customer data into chat sessions for convenience, which they do, at most companies, that data lives in conversation logs until someone deletes it. Most people never delete it.

Unreviewed Sub-Processors

Every AI vendor uses sub-processors: cloud infrastructure, model providers, logging services. Your DPA (Data Processing Agreement) with the primary vendor means little if their sub-processors are not contractually bound to the same standards. Ask for the sub-processor list. If they won’t provide one, that is your answer.

If you serve EU customers, GDPR applies to every tool that processes their personal data, regardless of where the tool is hosted. A US-based AI vendor without Standard Contractual Clauses in place is not a compliant option for handling EU PII. CCPA adds similar obligations for California customers. “The vendor is compliant” is not sufficient, compliant with what, by whose audit, and covering which data flows?

How to Evaluate an AI Vendor Before Integration

The Contract Questions That Actually Matter

Before signing anything, get written answers to these five questions:

Does data submitted to this integration train your models? If yes, is there an opt-out, and what does opting out require?
Who are your sub-processors, and are they contractually bound to the same data standards?
Where is my data stored, and in which legal jurisdictions?
What is your data retention policy, and can I request deletion?
Do you have a signed Data Processing Agreement available, and does it cover all EU/UK data subjects?

If a vendor stalls, hedges, or directs you to a generic privacy page, that is a red flag, not a legal grey area.

Certifications That Matter vs. Certifications That Don’t

SOC 2 Type II is frequently cited as proof of security. It is not. SOC 2 audits a vendor’s internal controls, physical security, access management, incident response. It says nothing about whether they train models on your data. A vendor can hold SOC 2 Type II certification and still legally use your submissions to improve their model, if their ToS permits it.

The certifications worth scrutinizing: ISO 27001 (information security management), ISO 27701 (privacy information management), and specific GDPR adequacy decisions for cross-border data transfers. Even then, read the scope. Certifications cover the scope they cover, nothing more.

Red Flags in AI Vendor Terms of Service

Specific language to watch for:

“We may use your content to improve our services”, this is training language
“You grant us a license to use your data” with no scope limitation
“Sub-processors may change without notice”, you lose the ability to review the chain
Dispute resolution clauses requiring US arbitration, creates a legal barrier for EU-based complaints
Data retention terms longer than your own legal obligations require

The fact that a vendor buries these clauses does not reduce your liability. You accepted the terms.

Data Minimization and Least Privilege in Practice

What Data Does This Integration Actually Need?

Most AI integrations are scoped lazily. A developer connects an API key with full account access because it’s faster. The AI tool ends up with read access to your entire customer database when it only needed order history for the last 90 days.

Before any integration goes live, map the minimum data required for the integration to function. Enforce that scope at the API level, not through a vendor promise. If the vendor’s integration requires broader access than the use case justifies, that is worth questioning.

Scoping Permissions and Auditing Access Points

Audit who has connected what to your business systems in the last 24 months. Most SMBs find tools they’ve forgotten about, old CRM integrations, deprecated app connectors, marketing tools that were trialled and never disconnected. Every live API connection is a potential data flow. Review, scope, and revoke anything that isn’t actively justified.

If your custom WordPress development or CRM is being fitted with AI features by a developer or plugin vendor, ask them to document every third-party API call the implementation makes. If they can’t, the implementation isn’t finished.

Frequently Asked Questions

Does my AI vendor’s SOC 2 certification mean my data is protected?

No. SOC 2 Type II certifies a vendor’s internal controls, how they manage access to their own systems, how they handle incidents, how they train their staff. It does not restrict how they use your data for model training or product improvement. Always check the vendor’s ToS and DPA separately from any certification claim.

“GDPR-compliant” needs to be verified, not accepted as a marketing claim. Check whether the vendor offers a signed DPA covering your data flows, whether Standard Contractual Clauses are in place for transfers outside the EEA, and whether their sub-processors are named and bound to the same standards. A tool can claim GDPR compliance and still expose you to regulatory risk if those mechanisms aren’t properly in place.

What is a Data Processing Agreement and do I need one with every AI vendor?

A Data Processing Agreement (DPA) is a contract that governs how a vendor handles personal data on your behalf. Under GDPR, if a vendor processes personal data of EU residents at your instruction, a DPA is legally required, not optional. For US-only operations with no EU data subjects, a DPA is still good practice. Without one, you have no contractual basis for how your data is used, retained, or deleted.

Can AI tools use my business data to train their models?

Yes, if the ToS permits it. Many consumer-facing AI products include training rights in their default terms. Enterprise API tiers typically offer stronger protections and explicit training opt-outs, but you have to check. Ask directly, get confirmation in writing, and make sure your DPA reflects whatever opt-out you’ve agreed to. Don’t assume API access automatically means no training.

What’s the minimum I should check before connecting an AI tool to my CRM or website?

Three things, minimum: (1) Confirm in writing whether data is used for training and how to opt out. (2) Request the sub-processor list and confirm they’re contractually bound. (3) Confirm a DPA is available and signed before data starts flowing. Beyond that, an independent review of your stack before you commit to any integration is worth the time, see how we approach this at designodin.com/ai.

What happens if a vendor I’ve already integrated changes their data policy?

Most vendors reserve the right to update ToS with 30-day notice, or less. That update can change how they handle your data. Set a calendar reminder to review the ToS of every AI-connected vendor annually, and subscribe to their changelog or legal update notifications if available. If a material change affects your data rights, you have grounds to exit the contract, but only if you catch it.

Most AI vendors write their data-use terms to protect themselves, not you. Most web agencies recommending AI integrations haven’t read those terms. If you want to talk through what this looks like for your operation before committing, start a conversation, we’ll tell you directly what questions to push back on.