AI Tool Data Privacy: Custom Build vs Third-Party SaaS

When you send data to a SaaS AI tool, it goes somewhere: a vendor’s servers, their sub-processors, retention logs you haven’t read. The terms that govern what happens next are in a document most businesses have never opened. This isn’t a future risk. It’s the current state of any operation that adopted an AI tool without reading the Data Processing Agreement (DPA) first.

What Actually Happens to Your Data in a Third-Party AI Tool

Every SaaS AI product has a data processing chain: your input travels to the vendor’s API, gets processed (usually by an LLM from OpenAI, Anthropic, or Google), and a response comes back. That’s the part users think about. What they don’t think about is what the vendor retains, for how long, and under what terms.

The Data Processing Agreement Most People Never Read

A Data Processing Agreement (DPA) is the document that governs what a vendor can do with the data you send them. It specifies retention periods, sub-processors (third parties the vendor shares your data with), data residency (where servers are physically located), and your rights as the data controller.

Most SaaS AI tools have a DPA. Very few users or business owners have read it. If you’re in the EU or UK, you’re legally required to have one in place before sending personal data to any processor, and that includes AI tools. If you’re in the US handling healthcare, financial, or legal data, sector-specific regulations apply, and a DPA is your first line of evidence that you did due diligence.

“We Don’t Train on Your Data”: What That Claim Actually Covers (and Doesn’t)

This is the single most misunderstood statement in AI vendor marketing. “We don’t train on your data” typically means the vendor won’t use your inputs to fine-tune their base model. It says nothing about whether your data is retained for abuse monitoring, whether it’s used to improve retrieval or ranking systems, whether sub-processors have access, or where it’s stored.

Under OpenAI’s API terms, for instance, your inputs are not used to train models by default, but that applies to the API. The free ChatGPT web product operates under different terms. An employee using their personal ChatGPT account at work is not covered by your company’s API agreement. These are two separate products with separate policies, and conflating them is where many SMBs get caught.

The Real Privacy Risks for SMBs Using SaaS AI Tools

Enterprise compliance teams know to worry about data governance. SMBs often don’t have a compliance team, and the risks look different at that scale anyway. The threat isn’t usually a targeted attack. It’s ordinary workflow behaviour.

Shadow AI: Employees Using Personal Accounts With No DPA

Shadow AI is the term for AI tool usage that happens outside your company’s sanctioned agreements. A staff member drafts a client proposal in their personal ChatGPT account because it’s faster than the company tool. A developer pastes a client’s database schema into an AI coding assistant on their home computer. 15% of employees have pasted sensitive information (PII, financial data, code) into public LLMs at some point. That data now sits in a third-party system under personal terms of service, with no DPA binding anyone.

20% of organisations globally have suffered a data breach directly attributed to shadow AI. For most SMBs, this is the actual threat vector: not a sophisticated attack, but an employee taking a reasonable-seeming shortcut with no idea it has compliance implications.

Vendor Data Residency vs. Your Compliance Obligations

Data residency means the physical location of the servers where your data is processed and stored. If you’re a UK or EU business, the GDPR (and the UK GDPR) requires that personal data transferred outside the EU/UK is subject to adequate protections. Many AI vendors process data on US servers by default. Some offer EU data residency, but often only on enterprise tiers rather than standard plans.

This isn’t theoretical. If a French professional services firm uses an AI tool to draft documents containing client personal data, and that tool processes inputs on US servers with no valid transfer mechanism in place (Standard Contractual Clauses, for example, or the vendor’s certification under the EU-US Data Privacy Framework), the firm is in breach of GDPR. As the data controller, the business carries the fine risk, not just the vendor.
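One way to catch the French-firm scenario before it happens is a simple vendor register that flags gaps for each tool. The sketch below is a minimal illustration, not legal advice: the record fields, the region labels and the short list of transfer mechanisms are all assumptions, and whether a given mechanism actually covers a given vendor is a question for the DPA and your adviser.

```python
from dataclasses import dataclass
from typing import List, Optional

# Simplified assumption: processing in these regions is not treated as a restricted transfer.
# (The EU and UK run separate regimes; this sketch lumps them together.)
NO_TRANSFER_REGIONS = {"eu", "uk"}

# Illustrative, non-exhaustive labels. "adequacy" would cover, e.g., a US vendor
# certified under the EU-US Data Privacy Framework.
RECOGNISED_MECHANISMS = {"adequacy", "scc"}


@dataclass
class VendorRecord:
    name: str
    processing_region: str             # where inputs are processed, e.g. "us" or "eu"
    dpa_signed: bool
    transfer_mechanism: Optional[str]  # e.g. "scc", or None if nothing is in place
    handles_personal_data: bool


def transfer_gaps(vendor: VendorRecord) -> List[str]:
    """List the gaps to resolve before personal data goes to this vendor."""
    if not vendor.handles_personal_data:
        return []
    gaps = []
    if not vendor.dpa_signed:
        gaps.append("no DPA in place")
    if (vendor.processing_region not in NO_TRANSFER_REGIONS
            and vendor.transfer_mechanism not in RECOGNISED_MECHANISMS):
        gaps.append(f"processing in {vendor.processing_region} with no recognised transfer mechanism")
    return gaps


# Hypothetical drafting tool: DPA signed, US processing, no transfer mechanism recorded.
drafting_tool = VendorRecord("ExampleDraftAI", "us", dpa_signed=True,
                             transfer_mechanism=None, handles_personal_data=True)
print(transfer_gaps(drafting_tool))
# ['processing in us with no recognised transfer mechanism']
```

A signed DPA alone doesn’t clear the second check, which mirrors the point above: the contract and the transfer basis are separate questions.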

What a Custom-Built AI Tool Changes About Data Control

Building a custom AI tool for your business doesn’t mean building a model from scratch. It typically means building an application (with a defined interface, defined inputs, and defined outputs) that calls an LLM API under your own API agreement. The architecture difference matters.

Your Infrastructure, Your Rules: What “Nothing Leaves the Perimeter” Means in Practice

When a custom tool is built to run within your infrastructure (on-premise or in your own cloud environment), data does not go to a vendor’s shared processing layer. It goes to the API endpoint you’ve contracted with, under terms you’ve negotiated, with sub-processors you’ve reviewed. If the tool is fully on-premise with a locally hosted model, data never leaves your network at all.
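In code, “the API endpoint you’ve contracted with” can be enforced as an explicit allowlist that every outbound request passes through. This is a minimal sketch under stated assumptions: both hostnames are hypothetical placeholders (one for a locally hosted model, one for a contracted API), and `check_egress` is an illustrative helper, not part of any library.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only hosts this tool is permitted to send data to.
# "llm.internal.example" stands in for a locally hosted model inside your network;
# "api.contracted-llm.example" stands in for the API you hold a signed agreement with.
ALLOWED_HOSTS = {"llm.internal.example", "api.contracted-llm.example"}


class EgressBlocked(Exception):
    """Raised when the tool tries to send data somewhere it has not been approved to."""


def check_egress(url: str) -> str:
    """Return the URL unchanged if it is HTTPS and on the allowlist; otherwise raise."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise EgressBlocked(f"non-HTTPS destination refused: {url}")
    if parsed.hostname not in ALLOWED_HOSTS:
        raise EgressBlocked(f"host not on allowlist: {parsed.hostname}")
    return url


check_egress("https://api.contracted-llm.example/v1/chat")  # allowed, returns the URL
try:
    check_egress("https://some-saas-ai.example/api")          # not on the list
except EgressBlocked as err:
    print(err)  # host not on allowlist: some-saas-ai.example
```

An application-level check like this complements network controls such as firewall or proxy egress rules; it doesn’t replace them, because it only covers code paths that actually call it.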

This is not a theoretical architecture; it’s what Designodin builds for SMBs that handle client data regularly. A law firm processing contract drafts. An accountancy processing client financial data. A recruitment business handling candidate PII. These are not edge cases. They’re standard professional services workflows where a shared SaaS AI tool is the wrong call.

Defined Inputs and Outputs: Why Scope Limits Risk

A custom tool also enforces scope by design. A generic AI assistant like ChatGPT will accept anything you paste into it. A purpose-built tool for, say, generating client status reports pulls from a defined data schema: specific fields, no free-text input of arbitrary sensitive data. The narrower the input surface, the smaller the risk surface.

This is the privacy benefit that standard SaaS tools cannot replicate, because SaaS tools are built to be general. Custom tools are built to do one job, and that specificity is the security feature. That said, it only holds if the scope is correctly defined at build time. A custom tool with poorly scoped inputs can expose data just as readily as a SaaS product.
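Here is what “correctly defined at build time” can look like for the status-report example: a fixed schema, with anything outside it rejected rather than quietly passed to the model. This is a hypothetical sketch; the field names, the allowed status values and the prompt wording are assumptions for illustration.

```python
from dataclasses import dataclass, fields

ALLOWED_STATUSES = {"on_track", "at_risk", "blocked"}


@dataclass(frozen=True)
class StatusReportInput:
    # Hypothetical schema: the only fields this tool will ever send to the model.
    project_code: str
    milestone: str
    percent_complete: int
    status: str  # constrained to ALLOWED_STATUSES, so no free text gets through here


FIELD_NAMES = {f.name for f in fields(StatusReportInput)}


def build_input(record: dict) -> StatusReportInput:
    """Refuse records with extra or missing keys instead of silently forwarding them."""
    extra = set(record) - FIELD_NAMES
    missing = FIELD_NAMES - set(record)
    if extra or missing:
        raise ValueError(f"extra fields: {sorted(extra)}, missing fields: {sorted(missing)}")
    if record["status"] not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {record['status']!r}")
    if not 0 <= record["percent_complete"] <= 100:
        raise ValueError("percent_complete must be between 0 and 100")
    return StatusReportInput(**record)


def build_prompt(inp: StatusReportInput) -> str:
    """Assemble the prompt only from validated fields."""
    return (
        "Write a two-sentence client status update.\n"
        f"Project: {inp.project_code}\n"
        f"Milestone: {inp.milestone} ({inp.percent_complete}% complete)\n"
        f"Status: {inp.status}"
    )


record = {"project_code": "P-104", "milestone": "Design sign-off",
          "percent_complete": 60, "status": "on_track"}
print(build_prompt(build_input(record)))
```

The design choice is to fail loudly: a record that arrives with an extra `client_email` key raises an error instead of being trimmed, so a change upstream gets noticed rather than absorbed.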

When SaaS AI Is Fine (and When It Isn’t)

This isn’t a binary argument. SaaS AI tools are appropriate for a large portion of business AI use cases. The question is whether your specific use case falls in the safe zone.

Low-Risk Use Cases: Generic Tasks, No Client Data

Drafting a LinkedIn post about an industry topic: no client data. Summarising a public news article: no sensitive data. Generating first-draft internal training documentation from information you’d be comfortable publishing: fine. Brainstorming product names, writing job descriptions for publicly advertised roles, creating generic email templates: all appropriate for standard SaaS AI tools.

The test: if the input contains nothing you’d object to seeing on a public forum, the privacy risk is low. SaaS AI tools are the right choice here: cost-efficient, fast, no build overhead required.

High-Risk Use Cases: Client PII, Contracts, Financials, Proprietary Process

A client’s name, address, company registration number, or financial figures: where these relate to an identifiable individual (a private client or a sole trader, say), they are personal data under GDPR, and financial data is regulated under several US frameworks. Contract terms, pricing schedules, internal process documentation, source code for proprietary systems: these are commercially sensitive even if not legally regulated.

Any AI workflow that regularly processes these categories should not be running on a consumer-grade or standard-plan SaaS tool. Either the tool needs an enterprise DPA with verified sub-processors and data residency controls, or the workflow needs to be on a custom build. There’s no third option that manages the risk.
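Until a workflow moves, a pre-send screen can at least catch the most obvious identifiers before text reaches a standard-plan tool. This is a minimal sketch under heavy assumptions: two illustrative regex patterns standing in for a proper data-loss-prevention control. It detects email addresses and long digit runs and nothing else; names, addresses, contract terms and pricing pass straight through, so a clean result means “nothing obvious found”, not “safe”.

```python
import re

# Deliberately crude, illustrative patterns. They catch obvious identifiers only
# and will miss names, addresses and most contract or pricing content.
PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "long digit run": re.compile(r"\b\d{8,}\b"),  # e.g. unspaced account numbers
}


def flag_sensitive(text: str) -> list:
    """Return the sorted names of any patterns found in the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


print(flag_sensitive("Chase jane.doe@client.example about account 12345678"))
# ['email address', 'long digit run']
print(flag_sensitive("Brainstorm five names for a coffee brand"))
# []
```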

What a Custom AI Tool Actually Costs for an SMB

The assumption that custom AI development costs six figures comes from enterprise case studies. A purpose-built AI tool for an SMB use case (a defined workflow, specific inputs, a specific output format) is a different project.

The Real Build Comparison (Not the Enterprise Quote)

Take a custom AI tool scoped to one job: intake a document, extract specific fields, generate a structured output, and post it to an internal system. That’s a realistic SMB use case. A project like that runs 4–8 weeks, depending on how clean your existing data and systems are. The build cost for a tightly scoped tool like that is substantially lower than enterprise quotes. The enterprise figure of $120,000+ reflects enterprise scope: multiple integrations, compliance audits, and change management for hundreds of staff. An SMB tool does not inherit that scope. It also doesn’t inherit the enterprise support structure, so factor in who maintains it when the workflow changes.
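The one-job tool described above breaks into four steps. A hypothetical orchestration sketch: each step is passed in as a function, so the LLM call, the extraction logic and the internal system are placeholders you would replace, and the function and field names are illustrative rather than any specific product’s API.

```python
from typing import Callable, Dict, Tuple


def run_intake_pipeline(
    document_text: str,
    extract_fields: Callable[[str], Dict[str, str]],     # e.g. a call to your contracted LLM API
    generate_output: Callable[[Dict[str, str]], str],    # formats the scoped fields into the output
    post_to_internal: Callable[[Dict[str, str]], None],  # e.g. a write to your internal system
    required_fields: Tuple[str, ...],
) -> Dict[str, str]:
    """Intake -> extract -> generate -> post, passing only the required fields onward."""
    extracted = extract_fields(document_text)
    missing = [name for name in required_fields if name not in extracted]
    if missing:
        raise ValueError(f"extraction missed required fields: {missing}")
    # Anything the extractor returned beyond required_fields is dropped here.
    scoped = {name: extracted[name] for name in required_fields}
    result = {**scoped, "summary": generate_output(scoped)}
    post_to_internal(result)
    return result


# Usage with stand-in functions in place of real services.
posted = []
result = run_intake_pipeline(
    "Invoice 42 from Acme Ltd ...",
    extract_fields=lambda text: {"invoice_no": "42", "supplier": "Acme Ltd", "notes": "ignored"},
    generate_output=lambda f: f"Invoice {f['invoice_no']} from {f['supplier']}",
    post_to_internal=posted.append,
    required_fields=("invoice_no", "supplier"),
)
print(result)
# {'invoice_no': '42', 'supplier': 'Acme Ltd', 'summary': 'Invoice 42 from Acme Ltd'}
```

Passing the steps in as functions keeps each one swappable, which matters for the maintenance point: when the workflow changes, you replace one step rather than the whole tool.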

Break-Even Math: SaaS Spend vs. Owned Asset

A business spending $400/month on a SaaS AI tool it’s been using for two years has spent $9,600 on something it doesn’t own, can’t modify, and whose terms can change at renewal. A custom build at a comparable cost produces an asset the business owns outright and can modify. For stable workflows that are central to the operation, the break-even case is short. For workflows that are experimental or low-frequency, SaaS is the right call: a custom build for a process you’ll change in six months is money wasted.
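The arithmetic is simple enough to run on your own numbers. This sketch assumes a build cost equal to the $9,600 in the example (the “comparable cost” case) and, as a second scenario, a hypothetical $100/month for maintenance, the cost flagged in the previous section. Neither figure is a quote.

```python
import math


def cumulative_saas_spend(monthly_fee: float, months: int) -> float:
    """Total paid for a subscription over a number of months."""
    return monthly_fee * months


def break_even_months(build_cost: float, monthly_saas_fee: float,
                      monthly_maintenance: float = 0.0) -> float:
    """Months after which an owned build has cost less in total than staying subscribed.

    Solves monthly_saas_fee * m = build_cost + monthly_maintenance * m for m.
    Returns math.inf if maintenance costs as much as the subscription it replaces.
    """
    monthly_saving = monthly_saas_fee - monthly_maintenance
    if monthly_saving <= 0:
        return math.inf
    return build_cost / monthly_saving


print(cumulative_saas_spend(400, 24))      # 9600
print(break_even_months(9_600, 400))       # 24.0
print(break_even_months(9_600, 400, 100))  # 32.0
```

Maintenance is the input that moves the answer most: in this illustration it stretches break-even from two years to nearly three, which is why the rule applies to stable workflows.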

The decision rule isn’t ideological; it’s financial and operational. Stable, sensitive, high-frequency workflow with client data? Build. Experimental, low-sensitivity, occasional use? Subscribe.
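Written as a function, the rule of thumb looks like this. The two clear-cut cases come straight from the rule; the handling of mixed cases is an assumption, drawn from the high-risk section (sensitive data needs an enterprise DPA or a build) and from the break-even reasoning, so treat it as a starting point rather than a verdict.

```python
def build_or_subscribe(stable: bool, sensitive: bool, high_frequency: bool) -> str:
    """Encode the build/subscribe rule of thumb; mixed cases are flagged for a closer look."""
    if sensitive and stable and high_frequency:
        return "build"
    if sensitive:
        # Assumption: sensitive but not stable or frequent falls back on the high-risk
        # guidance: an enterprise DPA with verified sub-processors and residency, or a build.
        return "enterprise plan with verified DPA, or build"
    if not stable or not high_frequency:
        return "subscribe"
    # Assumption: stable, frequent, non-sensitive work is decided on break-even cost.
    return "compare break-even"


print(build_or_subscribe(stable=True, sensitive=True, high_frequency=True))     # build
print(build_or_subscribe(stable=False, sensitive=False, high_frequency=False))  # subscribe
```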

Frequently Asked Questions

Does a third-party AI tool see my prompts and data?

Yes. Your inputs travel to the vendor’s servers for processing. Whether the vendor retains them, for how long, and for what purpose depends on their terms of service and your specific plan tier. Enterprise plans typically offer stronger retention controls and contractual protections than standard or free tiers.

What is a Data Processing Agreement and do I need one?

A DPA is a contract between you (the data controller) and a vendor (the data processor) that governs how they handle personal data you share with them. If you’re in the EU or UK, you are legally required to have a DPA with any vendor that processes personal data on your behalf, including AI tools. If you’re in the US handling HIPAA-regulated healthcare data, the equivalent requirement is a Business Associate Agreement (BAA) with any vendor that handles protected health information on your behalf; other regulated sectors have comparable documentation requirements.

Can my SaaS AI vendor use my inputs to train their model?

It depends on the product and plan. Many enterprise API agreements explicitly prohibit this. Consumer-tier products, including free plans, often do use inputs for model improvement. The key is reading your specific plan’s terms, not the vendor’s marketing headline. “We don’t train on your data” is not a universal statement that applies to every product a vendor sells.

Is a custom AI tool realistic for a small business budget?

For a tightly scoped use case, yes. The budget question depends entirely on scope. A purpose-built tool that handles one defined workflow (not a general AI assistant, not an enterprise platform) can be within reach of an SMB budget. If the workflow is poorly defined or the underlying data is messy, scope creep will push the cost up. If you want to see what a scoped build would look like for your specific workflow, talk to us.

What types of data should never go into a third-party AI tool without checking the DPA?

Client personal data (names, addresses, contact details), financial data (account numbers, revenue figures, pricing terms), contract terms and legal documents, source code for proprietary systems, healthcare data, and any information covered by a client confidentiality agreement. If your client has a standard NDA with you, that NDA likely prohibits you from sharing their information with third parties, and a SaaS AI vendor is a third party. Check your contracts before you check your AI tool’s features.

How do I find out what AI tools my team is actually using?

Start with a quick review of expensed software subscriptions and browser extensions on company devices. Then run a direct staff survey: people use the tools they think are helping them. For a more structured assessment, talk to us.
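The expense review can be partly automated with a keyword scan over an exported CSV. This is a minimal sketch: the column name, the keyword list and the sample rows are assumptions, so adjust them to whatever your expense system actually exports.

```python
import csv
import io

# Hypothetical, non-exhaustive keyword list. Extend it for the tools relevant to your team.
AI_VENDOR_KEYWORDS = ("openai", "chatgpt", "anthropic", "claude", "gemini",
                      "copilot", "midjourney", "perplexity")


def find_ai_expenses(csv_text: str, description_column: str = "description") -> list:
    """Return expense rows whose description mentions a known AI vendor or product."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        description = (row.get(description_column) or "").lower()
        if any(keyword in description for keyword in AI_VENDOR_KEYWORDS):
            hits.append(row)
    return hits


# Sample export with made-up rows.
expenses = """date,employee,description,amount
2024-03-01,A. Smith,ChatGPT Plus subscription,20.00
2024-03-02,B. Jones,Train ticket,45.50
"""
for row in find_ai_expenses(expenses):
    print(row["employee"], "-", row["description"])
# A. Smith - ChatGPT Plus subscription
```

Keyword matching produces false positives (Gemini is also a crypto exchange) and misses free-tier use entirely, because free accounts never appear on an expense claim. That gap is what the staff survey is for.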

If your team is using AI tools and you haven’t read the DPA for each one, you don’t know where your client data is going. If a workflow needs to move off a shared SaaS tool and onto something you own and control, tell us what you’re working on. We’ll be direct about whether we can help.