When a system implicitly trusts content it shouldn't, attackers find leverage. A recent vulnerability in OpenAI's ChatGPT web interface—codenamed ChatGPhish by Permiso Security—illustrates this principle at scale, and hints at a risk pattern that extends well beyond a single AI tool.

How Markdown Becomes an Attack Surface

ChatGPT's web response renderer processes Markdown, including links and embedded images. Because the renderer operates within a trusted context—the user's authenticated session with OpenAI—it does not apply the same defensive parsing that would guard against prompt injection. An attacker can craft a Markdown link or image tag in a malicious webpage or document, then lure a user to paste its content into ChatGPT. When the AI parses the Markdown, the link or image triggers instructions hidden in its metadata, redirecting the model's behaviour without explicit user awareness.

The attack exploits a fundamental misalignment: Markdown is meant to format human-readable content, not execute logic. But when a parser treats Markdown directives as trustworthy instructions, it becomes an instruction channel. The vulnerability essentially allows an attacker to inject new prompts into an active conversation, potentially steering the model to divulge sensitive information, change its operating mode, or generate misleading content that a user then acts upon.

The Broader Trust Problem in API and AI Integration

This class of vulnerability mirrors patterns engineers encounter when integrating third-party APIs, embedding user-generated content, or deploying AI models in production systems. The root cause is rarely the technology itself; it's the assumption that input from a certain source is inherently safe.

In infrastructure contexts, similar issues arise with:

The ChatGPhish case is instructive because it highlights how even systems designed with security in mind can falter when a renderer or parser makes a simplifying assumption: that a particular syntax (Markdown) coming from a particular context (the model's own output) carries no hostile intent.

Detection and Mitigation Strategies

From an operator's standpoint, defending against prompt injection requires treating user input—including content pasted into AI interfaces—as untrusted, regardless of where it appears to originate. For teams using AI tools as part of their workflow, that means:

For infrastructure teams integrating AI or large language models into internal tools, the lesson is to apply the same input validation rigour as you would to any other API boundary. Do not assume that because a system is AI-powered, it is immune to injection. Tokenise, validate, and sanitise all input, and ensure the model itself cannot be steered by markers or syntax hidden in its training data or context window.

A Reminder About System Assumptions

ChatGPhish is a good example of why security design requires questioning implicit trust. When you assume that a certain class of input is safe because it appears in a trusted context, you create a blind spot. The vulnerability wasn't a code bug in the traditional sense; it was an architectural assumption that went unchallenged until researchers tested it.

As AI tools become more embedded in workflows—from internal documentation systems to infrastructure automation—this pattern will repeat unless teams remain vigilant about where trust boundaries actually lie. A renderer that processes Markdown should validate that Markdown before acting on it. An AI model asked to summarise untrusted content should never use that content as a source of truth about its own instructions. These aren't novel principles, but they bear repeating, because the technology changes faster than our instincts about where to draw the line.