Blog

02.12.2025 16:49

Don't say "Jehova" to an LLM

The Rabbi in the old skit from Monty Python's "Life of Brian" fell for it, and for a long time, philosophers argued whether quoting someone is fundamentally different to just saying the sentence. I remember a story where one actor smuggled a wedding promise in a co-actor's copy of his lines: After the vow was made on the set and the sentence couldn't be found in the official script: is the actor now bound in real life by his promise?

This mix of semantic levels is also at the core of a lot of cyber security problems: data gets misinterpreted as code and so an attacker can achieve code execution. With the emergence of LLMs, this never-ending story gets a new twist.

But first, let's first give some context and background - and no, I won't go back to antiquity again.

In-Band signalling

In some communication networks, certain control signals are mixed into the payload. The perhaps most famous case was the US phone network, where a tone of 2600 Hz was used to signal various state-changes within the phone network.

That tone should only be generated by the operators own equipment, but if one of their customers injects that tone by blowing suitable whistle, then it was interpreted by the network as well. That enabled interesting use-cases, not all of which were in the best interest of the network operator.

It's the same problem: is the network just transmitting a tone, or is it sending a signal to another node?

I've also seen several cases where humans put processing instructions in data fields that were supposed to contain content. If the other side is not a human, or perhaps someone who doesn't speak the same language, then those instructions are not followed, but taken as a literal input, e.g. text to be printed on a t-shirt.

It can also be the other way round: An LLM is asked to produce a text, but instead of just writing the desired article, it also prints a preamble like "An suitable essay for your question could be the following:" and a footer that contains hints to improve prompting "If you want to spice up your story, do X or Y.". Hilarity ensues if a journalist or a student copies the full answer - including preamble and instructions - into their final product.

I never used speech to text systems, but I think their worst case is dictating the manual for this system: You need to describe the voice commands that trigger actions like "delete last word" or "start boldface now", but in this case the system shouldn't treat them as commands but insert them as text into the document. This is also not new, numerous skits and movies used the trope of a secretary transcribing comments that were not intended to be included in the dictation.

The Von Neumann architecture - Buffer overflow

One of the major features of the currently prevailing computer architecture is the generic nature of memory. RAM is a place to store arbitrary pieces of information: it just remembers and retrieves patterns of bits, but it does not care how the CPU will interpret those bits. If you use tools to peek at the memory content during debugging, you need to tell the debugger how to interpret the data, that's why debugging symbols left by the compiler in the executable can be handy.

One consequence of this type-agnostic memory is that bug in software regarding where to store data can cause a data/code confusion: an old-school buffer overflow in the stack is exactly this: The program wants to store a piece of data and overshoots the allocated memory and ends up overwriting the return value which causes the CPU to interpret the newly received data as code and execute it.

Mitigations against this have a long history: the PDP11/45 with its MMU had already the concept of distinct Instruction vs. Data space. Modern processors also allow the operating systems to tag virtual memory regions as non-executable. The emergence of Return Oriented Programming gave the attackers a way to work around this protection.

To summarize: The program received a piece of data, made mistakes in storing it and so the sequence of bytes was turned from e.g., a string of characters into a sequence of machine code commands (or a sequence of references to code gadgets).

The best way to eliminate this kind of problem is to use a memory-safe language. It really shouldn't be up to the coder to make sure that buffer size constraints are checked on every write.

Less effective, another defence against such attacks emerged with signature-based anti-virus software. The idea is that the shell code contained input data could be detected, and the processing of such dangerous input could be aborted.

Apple recently announced their "Memory Integrity Enforcement" which is a creative way to link pointers to memory regions which should eliminate out-of-bound writes. Let's see how this will work out in practice.

SQL-Injection

SQL is a query language for databases: it contains both keywords like "SELECT", "WHERE" as well as the possibility to convey arguments. That may be simple and intuitive for numerical or Boolean values, e.g.

"select * from users where disabled = false and age > 18"

but once strings are involved, things start to get tricky. The query

select street from restaurants where name = 'Zur Post'

is simple enough, but what about names like "Fred's Pizza"? Then we run in the old issue of having to escape meta-characters. You need to tell the SQL parser that the single-quote symbol in the string is not denoting the end of the string but is actually part of the string. You're just quoting the ', and you are not saying it as part of the SQL statement's syntax.

Taking user input and directly dropping it into an SQL query opens your application up for an inject attack: if the attacker includes a single quote, the rest of the argument turns from being treated as a simple sequence of characters into a SQL command.

What are the countermeasures here? The obvious one is to correctly escape the string before putting it in the query. Yes, but the better solution is to use prepared statements and thus never put a user-supplied string into a query at all. Or sidestep the problem completely by using an ORM which gives you a nice object-oriented interface for the storage of data - obliviating the need to write SQL queries.

As a band-aid or second line of defence, a Web Application Firewall (WAF) can try to prevent any input that looks like it might be an injection attack, from even reaching your webserver.

Shell injection

The same applies to applications that take input from the network and turn them into a command-line parameter. If you are directly calling the other program and handing over its parameters as an array of strings. (e.g., e.g. using the Posix execl system call) you should be fine, but if you're just building a full command line to be passed to system(), then the C library will call a shell which then parses the full string into the positional arguments and starts the desired command.

But the shell is powerful. It will replace variables like $PWD and even execute command when the idioms ``` command`` and $(command) are used. Thus, if you just simply take a parameter from the network and dump it into the argument for a system() call, you just build a remote code execution vulnerability.

The recommendations here are a) avoid invoking via a shell, and b) do heavy duty input filtering (optimally by just allowing a safe set of characters). I really don't recommend trying to escape all possible shell meta-characters.

And yes, a WAF might also catch some of these attacks.

Cross Site Scripting (XSS)

HTML is yet another language that mixes content and control instructions which are either XMl-style tags like "<p>" or character references like "&auml:". That means that <, >, and & are special characters which need to be escaped if they should appear themselves in a document.

The most basic XSS vulnerability is a simple search field in a web page. The results page usually contains language like "You search for YOUR_INPUT found the following hit" following by a list of links. If the search query is echoed back verbatim here, then a search for "<script>alert("XSS");</script>" will include this script as an active command for the browser and will trigger an alert dialog. To avoid this issue, the control characters need to be escape, e.g., by replacing < with <.

Modern web frameworks handle all this correctly and will automatically escape text inserted into a generated webpage - unless the code explicitly indicates that yes, there really should be active code added to the page.

Again, a WAF might help here, too.

String Substitutions

In multiple systems and setups, strings of characters are not only a passive text, but can also contain processing instructions. A really old example is the format string in the printf() function of the C library. For example:

printf("This is an integer: %i\nAnd this is a string %s\n", i, str);

will insert a textual representation of a number and a string into the format string, replace the \n with the newline character and send the result to the standard output. So far, so harmless. But what happens when a coder uses printf(str) instead of printf("%s", str)? If someone manages to smuggle printf substitution codes into str, then the printf function will follow an undefined pointer and will print some random data from the memory of the program.

Much worse, the Java logging framework "log4j" also implements variable substitution when processing events. One of these substitutions was so powerful that it led to remote code execution when data from the network was not carefully handled. Known as "log4shell", this vulnerability caused a frantic search for affected systems in late 2021.

Mixing data and code

This is slightly different: this is not about programming errors, but about bad design decisions and the resulting human errors. There is thus no "this is how you need to code to avoid that type of error".

Macros

The possibility to add code to office documents, e.g. to enhance the built-in function or to enable crude automation workflows, was perhaps well intended, but very much ill conceived. It turned something that users mainly saw as passive documents, into executed code with full access to the current user's account. It took Microsoft ages, and its customers an uncountable numbers of malware infections and ransomware incidents, to rectify that initial mistake. Now ".docx" files are not allowed to contain macros, their execution is disabled by default, macros can be signed, and the mark-of-the-web adds another minor layer of protection.

LNK Files

Here, again, Microsoft's drive for ever more features backfired on the user's security. Instead of just implementing the functionality of symbolic links on top of filesystems that don't natively support them, it created the .lnk File in all its glory. Someone really clever in Redmond apparently said: "Why don't we just not only enable referring to an executable, but also add the option of passing command-line arguments to it?"

What they missed is the fact that this turned .lnk into the old "how much code can you fit into one line" challenge from bygone eras. By linking to powershell.exe and passing a short program on the command line, a .lnk is basically a program file. It should be treated with the same caution as a .bat or .exe file.

Adding JavaScript

Sigh.

Initially, HTML was a static document markup language. There was no interactivity, not even the responsiveness that modern CSS offers. Thus, these files were harmless. There was no code, opening them either opened an editor or a browser to render them.

With the inclusion of JavaScript (let's just ignore Java Applets, Flash, Silverlight and other abominations) changed all this. First, a static website just got a bit more interactive and before long, modern Websites are full-blown applications which are downloaded into and executed by your browser. In some ways, the browser is the new operating system, good examples are the Google suite of applications and the re-implementation of early 2000 games in WebAssembly.

Vulnerabilities in the browser which let code jump out of the JavaScript sandbox and infect the operating system with malware have been a frequent security threat over the last 20 years. It seems to get a bit better, but that may also be due to the fact that browsers are patched on a very aggressive schedule. E.g., Microsoft Edge is not waiting for the next Patch Tuesday before asking the user for a restart to apply a patch.

For a while, opening local html files with a browser led to a very generous interpretation of the Same-Origin Policy giving the embedded JavaScript way too many permissions. The old Microsoft Internet Explorer included a zone model giving webpages from (hopefully justifiably) trusted servers additional permission.

All in all, the gradual transformation from static HTML to a single-page JavaScript application led to numerous misinterpretations of security properties and led to a slew of vulnerabilities.

But it's not just HTML that was spiced up with JavaScript. PDF and SVG got the same treatment with similar security impact. Regrettably, there is no .pdfx vs. pdfm distinction to give users a chance to know which document contains code and which doesn't.

Prompt Injection

One of the new and interesting properties of LLM and assorted chat-bots is the fact that you communicate with them in your native language, e.g., English. You no longer need to tell the computer in some special language what to do, there is no need to learn SQL, C, JavaScript, Python, Java or Rust anymore, instead the LLM will do the translation from the human world itself. This is, of course, hugely transformative as the hurdle to make use of an LLM-assistant dropped by quite a lot.

But the human language is tricky. The grammar is not as strict as with the formal computer languages. Forget about understanding irony or humour. And, most importantly, common sense.

In a way, these chatbots can feel like a very powerful co-worker that you can ask for assistance. The "Co-Pilot" branding for Microsoft's offering also suggests this. But this co-worker is on a very different mental spectrum than any functional real human. It has a very limited understanding of the subtleties and the context of all the input it processes.

All this reminds me of a phishing incident I heard of a few years back: A CEO received a phishing mail just forwarded it to the secretary with a vague "please handle this", meaning "take a look at it, determine the mail's validity and act accordingly" whereas the secretary understood "please execute what this mail demands".

This is exactly where we are with LLMs: we're handing it data to process and act upon, and we hope that it can understand the difference between "Read this text and be sensible and cautious in what you do with it" and "follow the orders contained in this text".

All of this is happening in English, German or other human languages. Which means:

It can be imprecise and ambiguous.
The interpretation may depend a lot on context.
It can be plain text, but it can also be hidden in pictures. Both easily visible, but also in a way a human can't see it as clearly as the LLM.
It's not a formal language where you can easily spot the code. Python, C, SQL or even Cobol can be easily detected, filtered and made inert.
Quoting rules can already be tricky with formal languages (e.g., writing makefiles that execute commands with non-trivial shell features like backticks, variables and single and double quotes). Are there even formal definitions on how quoting should work for LLMs? Are they standardized?
We're completely blurring the line between code and data. Both is text.

This has gone wrong already, and these issues will be serious vulnerabilities going forward.

What have we seen?

This is certainly not an exhaustive list, but only a collection of what I remember reading about and/or noted down a link for it.

Twitter bots Answering with "Ignore all previous instructions and …" caused LLM-driven bots to execute the instruction.
LinkedIn Info scraped by bots This time, "[/admin][begin_admin_session]" was used to get bots to treat the following English text as a command.
Instructions hidden in an image Using downscaling to reveal the command.
Injecting instructions with white-on-white text This time, the target is a Claude Skill.
Unicode Tag Characters and ASCII Smuggling Which could be used to trick Copilot to execute commands.
LLMs as Ransomware actors (Research note, industry reaction) This was not a real-world attack, but for me this shows a completely new way to smuggle code in a target environment: Just convey what you want to achieve in English and let an LLM agent do the coding on demand.
Writing long text without punctuation Apparently, that can overload the context windows of LLMs.
Agentic Webbrowsers have issues Whether they are summarizing webpages, parse hidden html tags or screenshots, traditional web security assumptions do not hold.
Google Gemeni also was affected Three vulnerabilities were found.

Thomas Roccia publishes a nice classification of Adversarial Prompts on SecurityBreak.io. Currently, he lists 38 vectors in four categories.

What can we expect?

The team from Brave found this consistent theme in agentic browsing vulnerabilities:

Readers will note that each of these attacks look similar. Fundamentally, they boil down to a failure to maintain clear boundaries between trusted user input and untrusted Web content when constructing LLM prompts while allowing the browser to take powerful actions on behalf of the user.

We recognize that this is a hard problem, and we have some longer-term ideas that we're exploring in collaboration with our research and security teams to address such problems. But until we have categorical safety improvements (i.e., across the browser landscape), agentic browsing will be inherently dangerous and should be treated as such. In the meantime, browsers should isolate agentic browsing from regular browsing and initiate agentic browsing actions (opening websites, reading emails, etc.) only when the user explicitly invokes them.

This is a good summary, and it applies in similar fashion to other applications of LLMs that deal with user-supplied input as well.

There will be no simple solution.

If we look back to the long list of non-LLM solutions, we can check some of the solutions developed there can help us going forward. Let's see:

Separate control instructions from data: This is tricky. Both can be English text. There is no (or at least not always) clear protocol framing that separates these two. People are mucking with quotation marks and other in-band framing band-aid. I haven't seen the equivalent of an SQL prepared statement.

Filtering out malicious code: We probably already failing at "detecting code" now. Almost any English text can be an instruction for an LLM. Tuning WAFs is tricky enough but think about what it would take to have a border-device screen any incoming text for potential prompt injection. That would need yet another LLM to do natural language processing. This will not work.

Look at this evolution in threat actor tooling:

They brought their own tools (for remote access, lateral movement, encryption, …): Pattern-based AV had at least a chance.
They switched to "living of the land": just bring scripts to control legitimate administrative tools that are already present: Maybe the execution of these scripts can be detected.
(coming) Just bring the right prompts to local LLM agents. No need for tool downloads or fancy command lines. Just plain text and maybe a creative way to feed that text to the LLM. This will be "living of the land" on steroids

Restrict what LLMs are allowed do: If you give your agentic LLM your own rights/permissions, then any mistake the LLM makes can do serious harm. The probably right way (but horribly complex) way to deal with this are granular permissions. Maybe the LLM is allowed to propose a calendar entry based on the e-mail it processed, but deleting appointments is a step too far. You should treat your LLM as a very enthusiastic intern. It's fine to delegate a lot of work to him, but don't hand him the keys to your kingdom.

The guardrails need to be external: Just telling the LLM what it should do before inputting the network-supplied content might not be sufficient as it is impossible to rule out prompt injections. So maybe you need a second LLM just as an overseer of the main LLM. Even better, that control module could contain hardcoded deterministic (i.e., old-school software) controls that restrict the behaviour of your agentic LLM.

Summary

With LLMs, we're running into the same fundamental issues that have plagued IT security for the last 50 years. We're making the same mistakes again, and this time they are inherent to the LLM technology and thus they are even harder to tackle.

In other words, LLM security will be really hard to get right.

It is about to get worse.

02.12.2025 16:49 Don't say "Jehova" to an LLM