01.06.2026 15:17
On the cyber-security implications of current LLMs
The rapid progress in the capabilities of LLMs for cyber-security related tasks naturally leads to the question of what the right response should be. With regard to CISOs, this (German) article on our webpage is my summary, which also links to the paper from the Cloud Security Alliance.
Naturally, the policy layer has also woken up and, as is its usual response, something needs to be done. (Or more precisely: we need to be seen doing something.) What exactly the EU or national politicians should do to steer the current developments in the right direction is still an open question. Some proposals are bubbling up in various forums, and some are formulated openly.
One of the documents I’ve been asked to comment on triggered my inner Monk. The structure wasn’t consistent, so I started to sketch out how I would approach the problem. The issue is that LLMs change several different things at once, and unless you are careful not to mix these up, the policy response will be a confused mess.
So, here is a rough outline of how I structure the problem set in my mind. It’s not a complete treatment of all the points, just a scaffolding that needs to be fleshed out. Nevertheless, I think it could provide some value.
What has happened?
- Models are improving, 2025 -> 2026 was a significant step
- It’s not just Anthropic, other frontier models are at a similar level
- Mythos was a genius PR stunt to prepare for the IPO of Anthropic
- Open weight models are a bit behind but are already pretty good and will reach the level of Mythos in a few months
- Research in this area is intensive and happening all over the world. Further progress is likely to be rapid and significant
What can these models do now?
- vulnerability research on software
- assist in patch development
- automated penetration testing
- patch reverse engineering: finding exploits for vulnerabilities which just got fixed
- targeted search for vulnerabilities based on published advisories
- chaining of vulnerabilities to create complex exploit code
- AI-assisted or fully agentic kill-chains can execute very quickly
What is the impact?
- The rate of new vulnerability findings has increased significantly
  - strain on the resources to deal with them (triage, patch-dev, testing, publication, rollout)
  - that hits: vendors, Open-Source maintainers, CSIRTs, customers, users
  - we don't know yet if this is a one-time wave, or a permanent increase in the rate of vuln-findings
- Duplicate findings are increasing
- Open-Source maintainers stop treating reports confidentially
- The CVD secrecy time-window is collapsing
- Once a vulnerability is disclosed/patched, exploit code will be developed very quickly
  - the window for planned patching (testing, rollout, ...) is shrinking
- Chaining of vulns can create a highly relevant meta-vuln out of multiple low-rated vulns
  - low-CVSS vulns cannot be ignored
- Legacy / un-maintained software becomes a critical risk
  - if fully supported code struggles with keeping up, old codebases without active maintenance just cannot
- Quicker kill-chain execution puts very strong pressure on the reaction-times of defenders
  - SOC speed for detection must improve
  - Can humans react quickly enough?
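To make the chaining point a bit more concrete, here is a minimal sketch of how several individually low-rated findings can add up to a full compromise. Everything in it is invented for illustration: the `Vuln` record, the position labels ("network", "user-session", ...), the names and the scores are assumptions, not real CVEs or a real scoring method. It simply treats each vulnerability as a step from one attacker position to another and searches for a path.

```python
from collections import deque
from typing import NamedTuple, Optional


class Vuln(NamedTuple):
    name: str      # hypothetical identifier, not a real CVE
    cvss: float    # base score of the individual finding
    requires: str  # attacker position needed to exploit it
    grants: str    # attacker position gained afterwards


def find_chain(vulns: list[Vuln], start: str, goal: str) -> Optional[list[Vuln]]:
    """Breadth-first search for the shortest chain of vulns from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        position, chain = queue.popleft()
        if position == goal:
            return chain
        for v in vulns:
            if v.requires == position and v.grants not in seen:
                seen.add(v.grants)
                queue.append((v.grants, chain + [v]))
    return None  # no chain connects start to goal


# Invented example data: every finding is rated low or medium on its own.
VULNS = [
    Vuln("info-leak", 3.1, "network", "valid-username"),
    Vuln("weak-reset-token", 4.3, "valid-username", "user-session"),
    Vuln("local-priv-esc", 4.4, "user-session", "admin"),
]

chain = find_chain(VULNS, "network", "admin")
if chain is not None:
    print(" -> ".join(v.name for v in chain))
    print("highest individual score:", max(v.cvss for v in chain))
```

None of the three findings would make it to the top of a triage queue sorted by score alone, yet together they lead from the network to admin access. A search like this is cheap to automate, which is why filtering by individual CVSS score stops being a safe shortcut.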
What should be done?
- Vendors
  - get ahead of the curve: use LLMs in the QA process
  - be sure to have enough manpower to deal with the wave
  - CVD processes need to become faster
- CSIRTs
  - revisit the advisory process: can we continue with the current scheme?
  - Should LLM-based pen-testing be part of the portfolio of CSIRTs?
- Critical Infrastructure
  - Security basics become even more important (e.g. reduce attack surface)
  - improve detection and automated response (SOAR)
- EU Policy
  - Trying to regulate these models is a fool's game: the cat is out of the bag
  - A "we need to build datacentres for AI in Europe" response is also likely to be wrong, unless they are built to run on 100% renewable energy and cooling
  - Check what others are doing. E.g., Singapore on Agentic AI, UK funding a Research Institute, ...
  - Check how the "best current security practices" are changing due to LLMs, and make sure security mandates are updated
  - Check if NIS2 CVD, CRA vuln handling, and ENISA's CVD processes are capable of handling the current wave
I might be updating this list over the coming weeks.
Should we start a betting pool on what ideas the EU will come up with?