What Employees Are Actually Pasting Into ChatGPT (And What to Do About It)

Jump to section

Last verified: May 16, 2026. Vendor pricing and benchmarks refreshed quarterly.

Using AI at work is a privacy risk, but whether it is a serious one depends almost entirely on which tier your employees are using. Consumer ChatGPT (the free or Plus account linked to a personal Gmail) runs under OpenAI’s consumer data policy, which your company never agreed to and your IT team cannot see. Enterprise ChatGPT is a different product: no training on your data by default, admin-controlled retention, full audit logs, and access via SSO. Most companies have not formally decided which tier they require. That gap is the real privacy problem. It means employees are defaulting to whatever account they already have, and the answer is usually the personal one.

The Answer Nobody Wants: Your AI Policy Is Already Being Violated

Most operators have a clear AI policy. It is: “I don’t know what my team is doing with ChatGPT.”

That is not cynicism. It is the statistical reality. According to multiple 2025 enterprise security surveys reported by SecurityWeek, more than 80% of workers use unapproved AI tools, including nearly 90% of security professionals. Fifty-seven percent of employees in one 2025 survey reported actively hiding their AI usage from their employer.

The policy gap is not because operators are careless. AI adoption happened faster than governance could follow. Employees found tools that made them more productive, used them, and never thought to ask whether their employer had a preferred tier or a prohibited-data list. The absence of a policy felt like permission.

That absence has consequences. The data handling agreements you negotiated with your enterprise AI vendor do not cover what employees do on personal accounts. And by the time a breach happens, the data has already left the building.

Shadow AI Is the Real Threat, Not the Enterprise Platforms

Shadow AI is the term for employees using AI through channels their employer cannot see or control: a personal Gmail-linked ChatGPT account, a free Claude account, a Gemini session on their personal Google profile. IT has no visibility. DLP (data loss prevention) tooling cannot intercept it. Audit logs do not capture it.

The scale is larger than most operators assume. Forty-seven percent of employees access AI through personal or unmanaged accounts, and 45% of data prompts in corporate environments flow through those personal accounts. This is not a tail-end behavior pattern. It is how most AI use at work actually happens.

The IBM 2025 Cost of Data Breach Report put a number on it. The average cost of a shadow AI data breach is $4.63 million, compared to $3.96 million for a standard data breach. Shadow AI incidents now account for 20% of all breaches tracked. That $670,000 premium is entirely attributable to the visibility gap: when IT cannot see what is happening, they cannot respond until the damage is done.

The behavioral reality matters here. Employees using personal AI accounts are not, as a rule, trying to leak company data. They are trying to get their work done faster. The tool access gap and the policy gap are what push the behavior underground. Blame the absence of an approved path, not the employees who found their own.

Consumer Tier vs. Enterprise Tier: A Very Different Data Agreement

Consumer accounts and enterprise accounts are not the same product from a privacy standpoint. The difference is not cosmetic.

Consumer accounts (ChatGPT Free or Plus on a personal email, Claude.ai without a Work subscription, Gemini on a personal Google account) operate under consumer privacy policies. Your employees’ prompts may be used to improve future model versions unless they opt out, a setting many users never find. More importantly, corporate IT has zero visibility into what happens on these accounts. There is no audit log. There is no admin console. There are no controls.

Enterprise accounts change five things: (1) the vendor commits not to train on your organization’s data; (2) data retention is admin-configurable; (3) audit logs capture which employee sent which prompt and when; (4) access is tied to SSO, so only people with corporate credentials can use the approved tool; and (5) compliance certifications (SOC 2 Type II, BAA for HIPAA) cover the vendor relationship in a way that consumer accounts never do.

There is one distinction the enterprise “no training” commitment does not make clearly on its own: not training on your data is not the same as not retaining your data. Most enterprise tiers still hold prompts for up to 30 days for abuse monitoring. Zero Data Retention (ZDR) is a separate contractual arrangement, not a default toggle. It means the vendor stores no inputs or outputs after the response is returned. Operators in regulated industries, particularly healthcare and financial services, need ZDR explicitly negotiated into their contracts, not just assumed because they bought the enterprise tier.

For organizations with very high security requirements, some vendors also offer BYOC (Bring Your Own Cloud) deployment, where the model runs within your own cloud environment entirely. BYOC eliminates the shared-infrastructure question, though it comes with its own operational complexity and cost.

Understanding why the tier distinction matters starts with understanding [what an LLM actually is and how it processes your input]: every prompt travels to the vendor’s inference infrastructure, generates a response, and depending on your tier and contract, may sit on their servers for some period afterward. The tier determines what happens to it next.

How Each of the Four Major Vendors Handles Your Data

This section focuses on data privacy commitments across the four vendors. The full comparison of capabilities and positioning across the four major AI platforms (which model is best for which use case, how outputs compare, where each vendor sits in the market) is in a separate article.

	Training on Org Data	Retention	ZDR Available	BAA Available	Key Certs
OpenAI Enterprise	No (contractual)	30-day default, configurable	Yes (API, qualifying contracts)	Yes (Enterprise/API)	SOC 2 Type II
Claude Enterprise	No (contractual)	Admin-configurable	Negotiable	Yes (Enterprise/API)	SOC 2 Type II, ISO 27001
Google Workspace + Gemini	No	Admin-configurable, data residency available	Configurable	Yes	SOC 2, ISO 42001, FedRAMP High, BSI C5
xAI Grok Enterprise	No (committed)	Configurable	Vault (customer-controlled encryption)	Not publicly confirmed	Less documented

OpenAI (ChatGPT Enterprise / API): According to OpenAI’s enterprise privacy documentation, ChatGPT Enterprise does not train on organizational data. Deleted conversations are removed from OpenAI systems within 30 days unless legally required. ZDR is available for API users with qualifying contracts, meaning inputs and outputs are not stored after the response returns. Data is encrypted at rest (AES-256) and in transit (TLS 1.2+). OpenAI added EU data residency in 2025 for organizations with GDPR localization requirements. SOC 2 Type II certified.

Anthropic (Claude Team / Claude Enterprise): Anthropic does not train its models on data from Claude for Work, Claude Enterprise, or API accounts. Per Anthropic’s trust center documentation, this is built into the commercial terms of service as a contractual commitment, not an opt-out toggle. Claude Enterprise is SOC 2 Type II and ISO 27001 certified. Consumer-tier Claude.ai users have opt-out-of-training settings that shifted in 2025; enterprise tiers were explicitly excluded from those policy changes. BAA available for Enterprise and API customers.

Google (Workspace + Gemini): According to Google’s Workspace AI security documentation, Gemini for Workspace does not use enterprise data for model training and does not permit human review outside the organization. Cross-user data leakage is prevented by architectural isolation. As of mid-2025, admins can configure data residency to US, EU, or no preference. Certifications include ISO 42001 (the AI-specific management system standard), FedRAMP High, BSI C5, and SOC 2. BAA is available. Google also offers client-side encryption, meaning the customer holds the encryption keys rather than Google.

xAI (Grok Enterprise): xAI launched Grok Business and Grok Enterprise tiers in 2025 with a commitment that customer data is not used for model training. Grok Enterprise includes SSO, SCIM directory sync, and a Vault capability providing customer-controlled encryption. The compliance certification stack is less publicly documented than the other three vendors as of May 2026. If you are in a regulated industry, request current SOC 2 and ISO documentation from xAI directly before committing. This is not a knock against Grok; it is honest about where the documentation trail sits relative to the more established vendors.

What Never Goes Into ChatGPT (Or Any Consumer AI Tool)

Here is what employees actually paste into AI tools, ranked roughly by frequency of damage:

Customer PII: Names, contact details, account numbers, medical record numbers. Often pasted when drafting emails, analyzing customer issues, or building reports.
Source code and proprietary algorithms: The Samsung pattern (see below). Engineers reaching for a debug or optimization tool without thinking about where the code goes.
Internal financial data: Budgets, forecasts, M&A materials, board presentations. Pasted to ask the AI to format, summarize, or analyze.
HR and legal documents: Performance reviews, employment disputes, contracts under negotiation. Common when someone wants a draft or a rewrite.
Meeting transcripts and recordings: Especially as AI note-taking tools proliferate. The AI gets the full transcript, including everything said that was never meant to leave the room.
Strategic plans: Product roadmaps, competitive analyses, acquisition targets.

According to Cyberhaven’s research, the volume of corporate data going into AI tools increased 485% between March 2023 and March 2024. By March 2024, 27.4% of corporate data entering AI tools was classified as sensitive, up from 10.7% a year earlier. The behavior is not an edge case.

The Samsung 2023 incident is the canonical example of how this plays out. In April 2023, within less than 20 days of allowing ChatGPT access, Samsung engineers were responsible for three separate data exposure events:

An engineer pasted source code from a faulty semiconductor database into ChatGPT to debug it.
A second engineer entered proprietary manufacturing equipment code to get optimization suggestions.
A third employee submitted a confidential internal meeting transcript to have minutes generated.

Samsung’s immediate response was an emergency 1,024-byte prompt limit per session. The company then issued a company-wide ban on external generative AI tools and accelerated development of Samsung Gauss, an internal LLM designed to keep sensitive data inside controlled infrastructure. Similar restrictions followed at Apple, JPMorgan Chase, Verizon, and Amazon within weeks.

The Samsung lesson is not that AI tools are uniquely dangerous. It is that employees will reach for AI wherever it is useful, and absent an explicit list of what not to share, they default to sharing whatever is relevant to the task. Source code is relevant when you are debugging. A meeting transcript is relevant when you need minutes. The task logic is sound; the data handling awareness is not.

Prompt injection is a related risk that most AI privacy discussions skip. OWASP LLM01:2025 is the top critical vulnerability in the OWASP 2025 Top 10 for LLM Applications, present in over 73% of production AI deployments assessed. It describes an attack where malicious content embedded in a document an AI reads is crafted to override the AI’s instructions. An employee asks an AI to summarize a document; that document contains hidden instructions telling the AI to forward session data to an external endpoint. The employee sees a normal summary; their data goes somewhere else. Understanding [how prompt structure affects AI behavior] clarifies why this works: the model follows embedded instructions the same way it follows legitimate ones.

OpenAI has publicly stated that prompt injection “is unlikely to ever be fully solved.” For operators, the implication is that DLP controls need to account for what AI systems can be manipulated into leaking, not just what employees intentionally paste in.

What a Workable AI Policy Actually Looks Like

Most AI policies I see are press releases, not policies. They say “we are committed to responsible AI” and stop. Here is what I would actually put in a policy for a 20-person agency or services firm.

1. Approved tool list. Name the tools. Not “approved AI tools” as a category, but specific products at specific tiers: “ChatGPT Enterprise, accessed via corporate SSO. Claude for Work. Gemini via Google Workspace.” One person owns the list and updates it as vendor policies change. New tool requests go through that person before employees start using them.

2. Prohibited data categories. Write them in plain language: customer names and contact details, financial forecasts or budget data, contracts not yet signed, patient records or any health information about identifiable individuals, proprietary source code, and meeting transcripts containing any of the above. Not “sensitive PII.” Specific categories that any employee can identify without a law degree.

3. Required tier. The default is enterprise tier for any work-related use. Consumer accounts accessed via personal email for work tasks are prohibited. This single requirement closes more of the shadow AI gap than anything else on this list.

4. SSO requirement. All approved AI tools must be accessed through the corporate login. This makes personal-account workarounds detectable and creates the audit trail that compliance frameworks require.

5. Audit logging acknowledgment. Inform employees that AI tool usage is logged and may be reviewed. This functions as both a behavioral deterrent and a compliance artifact.

6. Training, not a one-time session. Tie it to onboarding and annual reviews. Focus on data categories (“here is what never goes into an external AI tool and why”), not fear. One training session is not enough.

One honest caveat: even a well-built policy does not eliminate shadow AI. People will still use personal accounts sometimes, especially on personal devices. The goal is reducing exposure and creating a clear baseline that makes deviation visible and addressable, not invisible and unmanaged.

A related concern operators sometimes conflate with policy decisions is [AI hallucination and why it matters for business use]: the tendency of LLMs to generate confident but inaccurate responses. Hallucination is a distinct risk from data leakage, but both belong in any AI acceptable use policy.

The enterprise tier costs more than the consumer tier. Whether that cost is worth it for your specific situation is part of [the full cost picture of deploying AI in a business], which I cover in the next cornerstone in this series.

How We Handle AI Privacy at AIM (And for Client Work)

At Alameda Internet Marketing, paid business-tier accounts are the default for any internal production work. Consumer accounts are not part of the workflow for anything that touches client data. That is not a moral position; it is a practical one.

Here is what we do not put into any external AI tool, regardless of tier: client names paired with financial data, client credentials or API keys, draft contracts or agreements not yet signed, and specific strategic plans that a client has not made public. The tier protects data at the infrastructure level; our own data handling discipline is the layer above that.

The question comes up in real client work regularly. When we are drafting content for a client, we give the AI what it needs for the task: the topic, the target keyword, the brand voice, the outline. We do not paste in client analytics data that reveals competitive positioning, client revenue figures, or anything that a reasonable person would consider confidential if they saw it in a prompt log. The audit logging that comes with enterprise-tier access means someone, including us, can see exactly what went in and what came out.

The broader point: having thought about this intentionally is itself a differentiator. Most agencies and service firms are running on a “nobody asked, nobody told” basis. When a client asks how we handle their data in AI workflows, we have a real answer. That is worth something.

Tools That Help: DLP, SSO, and Enterprise AI Controls

For operators who want technical controls in addition to policy, here is what the category looks like without a full enterprise security budget.

Nightfall AI is an AI-native DLP platform that monitors outbound data flows across SaaS apps, browsers, endpoints, and email. It uses AI-based detection with over 100 classifiers to identify sensitive data in prompts before they reach external AI services. Nightfall cites 95% detection accuracy, compared to 5-25% for legacy regex-based DLP tools. The platform added Nyx in 2025, an autonomous incident investigation capability that reduces the human triage load on every alert.

Wald.ai takes a different approach. It sits as a secure gateway between your employee and the AI provider. Wald’s smart redaction replaces sensitive data with structured placeholders before the prompt leaves the corporate network. The AI reasons on the redacted version. When the response returns, Wald re-populates the original values locally on the user’s screen. The AI provider never sees the underlying sensitive data.

SSO enforcement is the most practical control for small operators. Routing all AI tool access through corporate identity (Okta, Azure AD, or Google Workspace) creates a per-user, per-session audit trail and makes personal-account workarounds detectable. This does not require enterprise DLP. It requires that your AI tool subscriptions support SSO and that you actually configure it.

Understanding why prompt-level controls matter starts with [what an LLM actually is and how it processes your input]: every prompt is text that passes through the model’s inference process. Controlling what text enters that process is the entire intervention point for DLP in AI contexts.

Honest sizing for small operators: enterprise DLP is expensive and assumes a security team with bandwidth to manage it. For a 10-20 person firm, the practical minimum is SSO, enterprise-tier subscriptions for approved tools, and a clear prohibited-data list that employees have read and acknowledged. That combination covers most of the exposure without a six-figure security stack.

You are the data controller. The vendor’s SOC 2 certification does not make you compliant. Your decision about which tool to use and what data you put into it is your liability. That is the framing for everything below.

GDPR: If your employees or customers are in the EU, processing their personal data through a consumer AI tool with unknown retention policies may be an unauthorized transfer under GDPR Articles 5 and 28. You need a Data Processing Agreement (DPA) with any vendor handling EU personal data. Consumer tiers typically do not qualify for a DPA. Fines run up to 4% of global annual revenue.

HIPAA: Any AI tool that handles Protected Health Information (PHI) requires a signed Business Associate Agreement (BAA) with the vendor before data goes in. BAAs are available from OpenAI Enterprise, Google Workspace, and Anthropic Enterprise. Consumer tiers from all three vendors do not offer BAAs. If someone on your team pasted a patient record into consumer ChatGPT last week, that may already be a reportable incident. In January 2025, HHS proposed the first major HIPAA Security Rule update in 20 years, removing the distinction between required and addressable safeguards and raising the encryption bar in ways that directly affect AI tool deployment.

SOC 2 Type II: This is an independent audit confirming that a vendor’s security controls are operating effectively over time, not just documented on paper. It is what “SOC 2 certified” actually means. Anthropic (Claude Enterprise), OpenAI Enterprise, and Google Workspace all hold it. When you are evaluating whether an AI vendor can handle your data, SOC 2 Type II is the baseline credential to require.

CCPA: California businesses must confirm that AI tools processing California resident data are covered by appropriate data processing agreements and do not use that data for the vendor’s own purposes without consent. Consumer AI terms of service often permit broad data use that conflicts with CCPA requirements.

FINRA: Per FINRA’s 2026 Annual Regulatory Oversight Report, AI tools fall under the same supervisory obligations as any other business technology under FINRA Rule 3110. Financial services firms must include AI tool usage in their Written Supervisory Procedures.

Regulated industries (healthcare, financial services, any company with EU customers) need enterprise-tier AI with signed DPAs or BAAs. “We use ChatGPT” without specifying which tier and whether the compliance paperwork is in place is not a sufficient answer if a regulator asks.

FAQ: AI Privacy at Work

Q: Can I use ChatGPT at work?

A: It depends on which tier and what data you are pasting into it. A personal-Gmail ChatGPT login is governed by consumer terms that your company never signed and that your IT team has no visibility into. Enterprise ChatGPT changes the data handling terms: no training on org data by default, audit logs, and admin control over retention. Most companies have not formally decided which tier to require, which means employees are defaulting to whatever account they already have. The decision your company needs to make is which tools are approved, at which tier, for which use cases. Once that is decided and communicated, “can I use ChatGPT” has a clear answer instead of a grey one.

Q: Is ChatGPT GDPR compliant?

A: ChatGPT Enterprise can be GDPR-compliant when a proper Data Processing Agreement is in place and EU data residency is configured. Consumer-tier ChatGPT is generally not a valid processor for EU personal data: the consumer terms do not include the contractual protections GDPR requires under Articles 5 and 28. The tier matters for compliance, not just security. If your business handles EU personal data and employees are using consumer accounts, that is an active compliance gap, not a theoretical one.

Q: What is shadow AI?

A: Shadow AI is when employees use personal or unapproved AI accounts for work tasks, for example, using a personal Gmail-linked ChatGPT account instead of a company-managed one. IT cannot see these sessions. DLP cannot intercept them. Audit logs do not capture them. The data handling agreement you negotiated with your enterprise vendor does not apply. For the scale data and cost figures, see the shadow AI section above.

Q: Should I block ChatGPT at work?

A: Probably not. Banning tends to push usage underground rather than eliminating it. Research consistently shows that employees who want to use AI productively find a way to do it regardless of block-list policies; they just use personal devices or personal accounts where you cannot see them at all. The more effective approach is providing an approved enterprise-tier path, a clear list of what data cannot go into any external AI tool, and making the approved option easy to access through SSO. Employees who have a sanctioned tool that works will use it. Employees who hit a block page will find a workaround.

Q: Is the API safer than the ChatGPT website?

A: From a data handling standpoint, yes, for users with the right setup. The OpenAI API includes Zero Data Retention (ZDR) as a negotiable contract term, meaning inputs and outputs are not stored after the response returns. Consumer ChatGPT on a personal account does not have this option. The catch is that API access assumes a technical integration, not a standard employee workflow. A developer building a tool on the API can configure ZDR. An employee opening a browser tab to ChatGPT.com on their personal Gmail cannot. ZDR is a contract negotiation, not a product toggle, and it applies at the API level.

Q: What data should never go into any AI tool?

A: The short list: patient records or any health information about identifiable individuals, customer PII (names, contact details, account numbers), unreleased financial data, source code for proprietary systems, legal documents under attorney-client privilege, and meeting transcripts containing any of the above. The Samsung 2023 incidents covered source code, manufacturing code, and a confidential meeting transcript, all in less than 20 days. For the full breakdown and context, see the data categories section above. If you are uncertain whether a specific piece of data belongs on this list, the working rule is: if you would not hand it to a stranger on the street, do not paste it into a consumer AI tool.

If you are working through AI tool decisions at your company and want a practitioner’s take on which tier to require, what your policy should say, and how to handle client data in AI workflows, this is what we work through with clients at Alameda Internet Marketing. Contact us to talk through your specific situation.