Louis Zezeran
26. veebr. 2026
At NEVERHACK Estonia’s Client Day 2026 in Tallinn, the conversations weren’t just about new tools—they were about what’s changing right now in the security landscape. In this Cybercast episode, Louis Zezeran sits down with Jonne Tuomela, Senior Solutions Engineer at Netskope, to break down a topic that’s quickly moved from “future concern” to immediate operational reality: how organizations can safely use large language models without exposing themselves to data leakage, reputational risk, compliance headaches—or outright security incidents.
This is not a theoretical discussion. It’s about the kinds of failures we’re already seeing: prompt injection, jailbreaks, hallucinations, and poisoned inputs that turn an AI assistant from helpful to harmful. And importantly, it’s also about the real-world constraints every security leader recognizes: people want to use these tools, business teams see value, and “just block it” rarely works for long.
Why this episode matters now
Many companies are currently in an awkward middle stage with AI adoption:
- AI tools are widely available.
- Employees are experimenting (often without clear policy).
- Leadership wants productivity benefits.
- Security teams worry about what’s being shared, what’s being generated, and what the company may be liable for.
Jonne’s perspective is valuable because it acknowledges a simple truth: AI risk isn’t one single risk. It’s a bundle of different failure modes—technical, human, and organizational—and you need layered responses to match.
Prompt injection and the “guardrails illusion”
The conversation opens with the classic “prompt injection” problem: you can manipulate an LLM into doing something it shouldn’t, often by reframing the request as harmless, emotional, or fictional. Louis highlights how public assistants have guardrails: ask for instructions to do something illegal, and you’ll typically get a refusal. But then comes the key question: why would attackers rely on public LLMs at all? Why not run a private model where guardrails can be removed?
Jonne confirms the reality: yes, it’s possible, and it’s not even particularly exotic. Models can be run locally, tuned to specific purposes, and operated without the constraints that consumer platforms must apply. The important detail is that “big and broad” isn’t always necessary—if an attacker only needs expertise in a narrow domain (like exploit development or malicious coding), a specialized model can be “good enough” without having world knowledge.
For defenders, the takeaway is clear: you can’t base your security strategy on the assumption that attackers face the same restrictions your employees do. Public guardrails may reduce risk for everyday misuse, but they do not eliminate the threat landscape.
Are defenders at a disadvantage?
Louis pushes the conversation into an uncomfortable—but honest—area: if mainstream LLMs refuse to provide exploit details, does that limit defenders’ ability to learn and respond, while attackers run unrestricted tools?
Jonne’s answer is nuanced. Yes, consumer-grade assistants may not provide “full information” about a vulnerability or exploit chain. But defenders aren’t limited to consumer tools. Security vendors and security-focused AI systems can operate under different constraints, and organizations can use models that are tuned for analysis rather than general conversation. In the end, the underlying dynamic remains familiar: security is a chase, with attackers moving first and defenders responding.
What AI changes is the ability to be more proactive—especially through systematic testing.
AI red teaming: proactive testing at scale
One of the most actionable parts of the episode is Jonne’s description of AI red teaming. The concept is straightforward: you test an AI system by attacking it—deliberately and repeatedly—using a large collection of prompts and techniques designed to elicit unsafe behavior. That includes:
- attempts to extract sensitive data,
- attempts to bypass policies or restrictions,
- attempts to generate harmful or illegal outputs,
- attempts to produce discriminatory or biased responses,
- attempts to induce misinformation.
Jonne notes a meaningful scale here—he references 15,000 different prompts used for testing. The implication is important: one or two tests are not enough, because language is flexible. A “blocked” question can be rephrased in endless ways. To learn how a model behaves in practice, you need variety, volume, and repetition.
Corporate AI risk is more than “hacking”
A particularly strong thread in this conversation is that corporate AI risk isn’t only about criminals. Jonne gives examples that many organizations overlook until it’s too late:
- If a corporate AI system produces sexual content or harassment-like responses, that can become an HR incident and reputational problem.
- If it produces biased outputs—like unfairly ranking employees or recommending decisions influenced by societal stereotypes—that can create real discrimination risk.
- If it spreads misinformation, even unintentionally, it can degrade decision-making and credibility.
In other words, the threat model includes not just “malicious external attacker,” but also internal misuse, brand risk, and organizational harm.
The hard truth: you can’t guarantee perfection
Louis brings up a problem that every security-minded leader eventually asks: how do you prove an LLM is safe? If models are non-deterministic, and their outputs can vary, testing 1,000 times doesn’t guarantee safety on the 1,001st run.
Jonne offers a pragmatic principle: the enemy of good is perfect. If you demand a tool that never fails, you’ll end up stuck—because that’s not how these systems work today. Instead, organizations need to decide what level of residual risk is acceptable. Is one unsafe output in 1,000 acceptable? Or do you need one in 10,000? This is not purely technical—it’s governance.
This is also where Jonne introduces an important framing: not all “wrong outputs” are equal. Some errors are harmless (like slightly incorrect latency estimates). Others are harmful—like malicious code, criminal guidance, or discriminatory recommendations. The key is prioritization: focus control effort on the failure modes that matter most.
Poisoned training data and “malicious helpfulness”
One of the most security-relevant insights is the risk of poisoned training material. Jonne describes a scenario where an attacker injects malicious code patterns into training data, so the AI produces output that appears legitimate but includes harmful behavior. Imagine asking for help building an API integration—and receiving code that quietly adds a backdoor or risky behavior.
This is a reminder that AI risk isn’t only “what users ask.” It’s also “what the model replies”—and whether the organization has a way to detect and block harmful responses.
Guardrails as a security layer, not a moral feature
Jonne describes guardrails as a layer between the user and the AI—inspecting what goes in and what comes out. Louis reflects on this as a “corporate rule layer,” and Jonne connects it to the broader security principle of defense in depth: firewalls, EDR, email security—multiple layers, because any single layer can fail.
The guardrails concept becomes even more practical when the discussion turns to policy choices. Not every organization wants to block everything. Many companies want to maintain usability and employee freedom while blocking “the really bad stuff”—criminal content, malware, sensitive data exfiltration. This is a crucial point: overly strict controls often backfire, pushing people toward workarounds like VPNs, proxies, or shadow IT.
A pragmatic policy example: allow prompts, block uploads
A standout, immediately usable idea appears in the episode: if you’re worried about sensitive data leakage, you might allow employees to use AI tools for prompting, but block file uploads into those services. That addresses one of the highest-risk behaviors (sending proprietary documents into a third-party system) while still allowing productivity workflows that don’t require attachments.
This aligns with what many organizations actually need: control the major risk paths without crushing the user experience.
Coaching and education: security that scales
Finally, Jonne makes a point that experienced security teams recognize but companies often underinvest in: the best defense is knowledge. Instead of only blocking behavior, you can add a “coaching” step—if someone tries to upload a document containing PII, prompt them to justify the action and explain why it’s risky.
Jonne shares an example of a document that contains personal identifiers (a “proof-of-life” document). Even if it’s your own information, uploading it into the wrong place can be dangerous. The coaching approach helps employees learn what counts as sensitive data, why it matters, and how to make safer decisions—without turning security into an adversarial relationship.
Key takeaways you can apply immediately
If you’re responsible for security, compliance, IT, or even just enabling safe AI usage in your company, this episode offers a practical checklist:
- Assume attackers can run unrestricted models. Don’t rely on public guardrails as your threat model.
- Red team your AI systems at scale. One test doesn’t reflect real risk; volume and variety do.
- Treat AI as a layered security problem. Inspect prompts and responses. Add controls around data movement.
- Decide what “good enough” means for your risk appetite. You won’t get perfect; you can get measurable improvement.
- Don’t underestimate HR and reputational risk. Bias, harassment, and misinformation can be as damaging as technical compromise.
- Coach, don’t just block. Education scales, and it reduces the incentive to bypass controls.
Call to action
If your organization is exploring corporate AI—whether that means internal chatbots, HR assistants, developer copilots, or simply allowing employees to use public LLMs—this episode will help you think clearly about risk without falling into extremes. You’ll walk away with a better understanding of prompt injection, why red teaming matters, how guardrails work in practice, and how to strike the right balance between usability and safety.
Listen now, connect with NEVERHACK on LinkedIn. Subscribe for more practical conversations on modern security challenges—from real practitioners and partners working in the field.