Shadow AI in Pentesting: What Happens When Your Tester Uses ChatGPT With Client Findings
If you lead a pentest team of 5 to 15 people, at least one of them is using ChatGPT to help with reporting. They're pasting finding descriptions, asking for remediation language, requesting CVSS scoring help. They're not being reckless. They're being efficient, using the best tool available to them in the moment.
The problem is where those prompts go. Every finding your tester pastes into ChatGPT leaves your infrastructure and lands on OpenAI's servers. That's your client's attack surface, on someone else's hardware, processed under terms your client never agreed to.
Key Takeaways
- Pentesters using ChatGPT for reporting assistance are sending client vulnerability data to OpenAI's infrastructure, often without realizing the data handling implications.
- ChatGPT's free and Plus tiers use conversations for model training by default; even with opt-out, data is processed and stored on external infrastructure you don't control.
- Most pentest engagement NDAs include clauses about handling sensitive information that likely prohibit routing findings through third-party AI services.
- Samsung banned ChatGPT internally in 2023 after engineers leaked proprietary source code through prompts within weeks of gaining access. The mechanism is identical to what pentesters do with findings.
- Enterprise AI plans (ChatGPT Team, Microsoft Copilot) offer better data protections but still route data through external infrastructure, require enterprise contracts, and break the chain if one person uses a personal account.
- The structural alternative is running AI locally: an open-source model on your hardware, where prompts never leave the environment where the findings already live.
Five Ways Your Team Is Sending Client Data to External AI
These aren't hypothetical. They're the specific workflows where pentesters reach for ChatGPT because it genuinely helps with reporting work.
1. Writing finding descriptions. A tester pastes raw scanner output plus vulnerability context and asks AI to draft a finding description. The prompt contains the vulnerability type, affected systems, and often the client's internal network architecture.
2. Drafting executive summaries. Executive summaries require client context: company name, systems tested, risk posture, business impact. A tester asking AI for help with this is sending a compressed version of the engagement scope.
3. Suggesting remediation steps. Useful remediation is specific to the system and environment. The prompt contains finding details, technology stack, and deployment context.
4. Scoring severity. CVSS scoring with context means sharing vulnerability details and system information so the AI can weigh exploitability and impact correctly.
5. Cleaning up report language. Polishing grammar and phrasing means pasting full report sections, including findings, evidence, and client-specific context.
Each of these is a legitimate productivity use case. The problem is not that your team is doing them. The problem is the data pipeline they're using to do them.
Where the Data Actually Goes
ChatGPT's data handling depends on the plan tier, and the differences matter.
Free and Plus tiers (most individual users): Conversations are used to train future models by default. You can opt out, but even with opt-out, data is transmitted to and processed on OpenAI's infrastructure. Your client's vulnerability data is stored on servers governed by OpenAI's terms, not your engagement NDA.
Team and Enterprise tiers: Conversations are excluded from model training by default, and workspace admins get retention controls. This is meaningfully better. But it requires an enterprise contract, IT-managed deployment, and consistent team adoption. One tester using a personal Plus account breaks the entire chain.
The gap between "I pay for ChatGPT" and "my client's data is protected" is wider than most practitioners realize. Paying for Plus is not the same as having data protection. The architecture is the same: data leaves your environment.
Samsung Banned ChatGPT Within a Month
In 2023, engineers at Samsung's semiconductor division leaked proprietary source code and internal meeting notes through ChatGPT prompts within weeks of being given access to the tool. They weren't acting maliciously. They were doing what anyone does with a useful tool: using it to get work done faster.
Samsung banned generative AI tools internally within a month.
The mechanism was identical to what happens when a pentester pastes findings into ChatGPT. A person with access to sensitive information uses an external AI tool because it's the fastest way to accomplish a legitimate task. The data leaves the organization. The organization had no visibility into it happening.
The difference for pentesting is that the data involved is worse. Samsung's engineers leaked their own company's source code. Your tester is leaking someone else's attack surface.
The NDA Problem
Most pentest engagement agreements include specific clauses about handling sensitive security information. The report your team produces is, by design, a document that describes how to exploit your client's systems. Its contents are confidential not because of corporate policy but because disclosure creates direct security risk.
When a tester pastes finding details into ChatGPT, those details are processed by a third-party service that:
- Your client never approved as a subprocessor
- Operates under its own data handling terms, not your engagement contract
- May retain data for periods your NDA doesn't contemplate
- Is headquartered in a jurisdiction your client may not have agreed to for data processing
Whether this constitutes a technical breach of your NDA depends on the specific agreement. Whether it would survive a client audit is a different question. "We sent your vulnerability data to OpenAI's servers, but we opted out of training" is not an answer that builds long-term client trust.
Why "Just Use the Enterprise Plan" Doesn't Solve It
Enterprise ChatGPT and Microsoft Copilot for Security offer stronger data protections: no training on your data, tighter retention controls, enterprise agreements. They're a real improvement over personal accounts.
But they don't address the structural problem for pentest teams.
Cost and access. Enterprise AI plans require minimum seat commitments and enterprise contracts. For a boutique consultancy of 5 to 10 people, the overhead of an enterprise AI contract may not make sense.
Consistency. Policy enforcement requires every person on the team to use the approved plan. One contractor on a personal account, one tester who forgot to switch accounts, one weekend engagement where someone reaches for the free tier: the chain is only as strong as its weakest link.
Architecture. Even with enterprise terms, data still leaves your infrastructure. You're routing client findings through an external vendor's cloud. For teams with government clients, defense contractors, or data residency requirements, the hosting model itself is the problem, not the contract terms.
This applies if: Your team does regulated work, has clients with data residency requirements, or operates under NDAs that restrict how assessment data is handled.
This concern is less urgent if: Your clients have no data residency constraints and your engagement terms don't restrict third-party data processing. Enterprise AI plans may be sufficient.
The Structural Alternative: Keep AI Where the Findings Live
The answer is not blocking AI. Your team uses AI because it makes reporting better and faster. Take away the tool and they'll either work slower or find another workaround.
The answer is changing the architecture. Instead of sending findings to an external AI, run the AI model on the same infrastructure where the findings already live.
A local LLM running via Ollama processes prompts entirely on your hardware. No external API call. No data transmission. No vendor to trust or distrust. The model runs on your machine, your prompts stay on your machine, and the output stays on your machine.
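The loopback-only flow is easy to see in code. A minimal sketch of querying a local Ollama instance, assuming a model has already been pulled (`llama3` is an example model name; `localhost:11434` and the `/api/generate` route are Ollama's defaults; the prompt wording is illustrative):

```python
import json
import urllib.request

# Ollama's default API endpoint binds to loopback: nothing leaves this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(finding: str) -> dict:
    """Build a non-streaming request for a locally pulled model."""
    return {
        "model": "llama3",  # example model name; any locally pulled model works
        "prompt": ("Rewrite this pentest finding as a client-ready "
                   f"description:\n{finding}"),
        "stream": False,
    }

def draft_finding(finding: str) -> str:
    """Send the prompt to the local model and return its draft text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(finding)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # loopback HTTP, no external call
        return json.load(resp)["response"]

# Usage (requires a running `ollama serve` with the model pulled):
# draft_finding("SQL injection in /login 'user' parameter on 10.0.0.12")
```

The same payload sent to a cloud API would be a data-handling event; sent to loopback, it never crosses a network boundary you don't own.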
Dradis Echo is built on this architecture. It provides AI-assisted reporting, including finding descriptions, executive summary drafts, and remediation suggestions, through a local Ollama instance with scoped permissions. No external API, no data leaving your network. Because Dradis is open-source, you can inspect exactly what Echo does and doesn't transmit. There's no "trust us" step: read the code, verify the network calls, confirm it yourself. Your tester gets the productivity benefit. Your client's data stays where it belongs.
This is not an abstract architectural preference. It's the only approach where the data handling answer to a client audit is "it never left our infrastructure."
What to Do This Week
- Ask your team directly: "Who's using ChatGPT or another AI tool for reporting?" Assume the answer is "most of us" and work from there.
- Review your engagement NDAs for clauses on third-party data processing and subprocessors. Check whether using an external AI service is explicitly covered.
- If you allow external AI, mandate an enterprise-tier plan and enforce it through team policy. Acknowledge this doesn't address the infrastructure question.
- Evaluate local AI alternatives. Dradis Echo runs AI reporting assistance via Ollama, entirely on your hardware, with no external API dependency.
- Write a one-page AI usage policy for your team. It doesn't need to be perfect. It needs to exist.
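If you want to trial the local route before committing, a minimal Ollama evaluation is a short exercise (the model name is an example; `ollama` and `curl` must be installed locally):

```shell
# One-time, while online: download a model to local disk.
ollama pull llama3

# Everyday use, fully offline: the prompt is processed on this machine only.
ollama run llama3 "Draft remediation guidance for a reflected XSS finding"

# Sanity check: the API answers on loopback (Ollama's default bind is 127.0.0.1:11434).
curl -s http://localhost:11434/api/tags
```

Ten minutes with a local model against a real (sanitized) finding will tell you more about fit than any vendor comparison.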
Related Content
Data Sovereignty - Dradis Echo: AI-Assisted Pentest Reporting Without Sending Data to the Cloud
Reporting - Pentest Reporting with Dradis - Best Pentest Report Generators Compared
Platform - Why Choose Dradis? - Dradis Framework: Industry Validation
FAQ
Is it actually a problem if my tester uses ChatGPT Plus with the opt-out enabled?
Opting out of training prevents your data from being used to improve future models. It does not prevent your data from being transmitted to and processed on OpenAI's servers. The data still leaves your infrastructure, is still governed by OpenAI's terms of service rather than your engagement NDA, and is still stored for a period defined by OpenAI's retention policy. For most client engagements, the question is not just "is it used for training" but "where is it processed and stored."
What's the difference between this and any other shadow IT problem?
Scale and sensitivity. A tester using an unapproved project management tool is a policy issue. A tester sending a client's complete vulnerability inventory through an external AI service is a confidentiality issue. Pentest findings are not general business data. They're exploitation guides. The consequences of exposure are immediate and specific.
Does Dradis Echo require an internet connection?
No. Echo runs a local LLM via Ollama on your hardware. After the initial model download, it operates entirely offline. No external API calls, no data transmission, no dependency on external services. It works in fully air-gapped environments.
Can't I just tell my team not to use AI?
You can. Samsung tried that too, after the incident. The problem is that AI assistance for reporting is genuinely useful. A policy that says "don't use AI" competes with a tool that dramatically cuts the time spent writing finding descriptions. The sustainable answer is providing a sanctioned alternative that offers the same productivity benefit without the data exposure. That means local AI.
What if my client asks how we handle AI in our workflow?
This question is becoming more common. Teams using external AI can answer "we have a policy and use enterprise-tier plans." Teams using local AI can answer "our AI runs on the same infrastructure as the rest of the engagement. Your data never leaves our environment." The second answer is simpler to audit and harder to poke holes in.
Your tester's AI shortcut is your client's data exposure. The fix isn't banning AI. It's changing where it runs. Try Dradis Community Edition and see how local-first reporting works.