What the CVE?

Hey folks,

Let's start with a ceremonial dad joke:

The worst job I ever had was ranking soft drinks in order of fizziness. 

It was soda grading.

Alright, now: What's going on with all these CVEs, Peter?!

The Beginning

When I went to ElixirConf EU in Málaga a few weeks back, I had quite a few conversations about the impact of LLMs on the security of our industry. Basically, everyone was scared of Mythos. The question was: Will AI make it easy to hack all our systems very soon?

After the conference, the Alembic folks invited me and some friends to spend an afternoon hacking at their apartment. I took the opportunity to put this question to the test. I didn't have access to Mythos (still don't), but I had Opus 4.7 with max effort, which seemed like the next best thing.

My setup was very simple and based on an Anthropic article that suggested feeding files to Opus one by one, together with a narrow prompt focused on finding vulnerabilities. I had Claude write a small bash script that called Claude Code with one file path at a time and instructed it to write its findings to a markdown file.
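A minimal sketch of that per-file loop, written in Elixir rather than bash and not the script actually used. It assumes the Claude Code CLI is installed as `claude`, that `claude -p PROMPT` prints a one-shot answer to stdout, and that it can read the file at the path it is given. The module name, the prompt wording, and the report format are hypothetical. The runner is passed in as a function so the loop can be tried without calling the CLI.

```elixir
defmodule PerFileScan do
  # Hypothetical sketch of a per-file scanning loop (the original was a
  # bash script). Assumes the Claude Code CLI is installed as `claude`,
  # that `claude -p PROMPT` prints a one-shot answer to stdout, and that
  # it can read the file at the path it is given.

  # Hypothetical narrow prompt; the prompts actually used are not shown here.
  @prompt """
  Audit the Elixir file at the path below for security vulnerabilities.
  Only report concrete, exploitable issues (for example, unbounded memory
  or CPU use on untrusted input), each with a short proof of concept.
  If there are none, reply "No findings".
  """

  # Default runner: one CLI call per file; the exit status is ignored here.
  def claude_runner(path) do
    {output, _exit_status} = System.cmd("claude", ["-p", @prompt <> "\nFile: " <> path])
    output
  end

  # Scans files one by one and appends each answer to a markdown report.
  def run(files, report_path, runner \\ &claude_runner/1) do
    File.write!(report_path, "# Findings\n")

    Enum.each(files, fn path ->
      findings = runner.(path)
      File.write!(report_path, "\n## #{path}\n\n#{String.trim(findings)}\n", [:append])
    end)

    report_path
  end
end
```

From a library's root directory, something like `PerFileScan.run(Path.wildcard("lib/**/*.ex"), "findings.md")` would produce one section per file.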

This simple setup proved to be extremely efficient. Within a few minutes, Opus found a major vulnerability in Decimal (since disclosed). Basically, the following code would cause the BEAM to use 7 GB of RAM and crash:

```elixir
# On affected Decimal versions, this exhausts memory and crashes the BEAM.
"1e1000000000"
|> Decimal.new()
|> Decimal.add(1)
```

I remember testing this finding and watching the BEAM's memory usage explode. This was my first "oh fuck" moment.
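Why does such a short string do so much damage? Decimal stores a number as a sign, an integer coefficient, and an exponent, so `"1e1000000000"` itself is tiny. Assuming the library aligns both operands to a common exponent before adding, adding 1 means materializing the coefficient 10^1000000000, an integer with 1,000,000,001 decimal digits, or roughly 415 MB in binary before any intermediate copies. The sketch below is a hypothetical defense-in-depth check for untrusted input, not a substitute for the library-level fix: reject values whose exponent falls outside a range your application actually needs. The module name, the function names, and the limit of 1,000 are illustrative assumptions.

```elixir
defmodule ExponentGuard do
  # Hypothetical helper, not part of the Decimal library.
  # The limit is illustrative; choose one that fits your data.
  @max_abs_exp 1_000

  # Matches any map with an integer :exp field, which includes %Decimal{}
  # structs, so this sketch itself has no dependency on Decimal.
  def check(%{exp: exp} = number) when is_integer(exp) and abs(exp) <= @max_abs_exp do
    {:ok, number}
  end

  def check(%{exp: exp}) when is_integer(exp), do: {:error, :exponent_out_of_range}

  # Number of decimal digits in the larger coefficient after aligning both
  # operands to the smaller exponent, computed without building that integer.
  def aligned_digits(%{coef: coef1, exp: exp1}, %{coef: coef2, exp: exp2})
      when is_integer(coef1) and is_integer(coef2) do
    {big_coef, gap} = if exp1 >= exp2, do: {coef1, exp1 - exp2}, else: {coef2, exp2 - exp1}
    length(Integer.digits(big_coef)) + gap
  end
end
```

Because the guard only pattern-matches on the `:exp` field, it accepts a `%Decimal{}` struct directly, so it could be called on the result of `Decimal.new/1` before any arithmetic happens.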

What happened after

I didn't really know what to do with such a finding, so I reached out to Jonatan Männchen, the CISO of the EEF, whom I already knew from various conferences. Jonatan was on his way to a well-deserved vacation but still took the time to contact Eric, the author of Decimal, and ask him to enable GitHub's private vulnerability reporting. This allowed me to report the vulnerability privately, and it allowed Eric to pull in the authors of other libraries that were affected as well. It also made it easier to coordinate the release. I worked with Jonatan throughout his vacation (sorry!) on reporting the vulnerabilities I found.

If there's only one thing you take away from this article, please let it be that you should enable GitHub's private vulnerability reporting on your most used libraries.

It makes reporting vulnerabilities so much easier for everyone involved. It only takes 3 clicks per repo in the repository settings.
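For maintainers with many repositories, GitHub also exposes this setting through its REST API (`PUT /repos/{owner}/{repo}/private-vulnerability-reporting`, which requires admin access to the repository). A minimal sketch that loops over a list of repos via the GitHub CLI's `gh api` command, assuming `gh` is installed and authenticated; the module name and the repo names in the usage line are hypothetical, and the command runner is injectable so the loop can be exercised without touching GitHub.

```elixir
defmodule EnablePrivateReporting do
  # Hypothetical helper. Assumes the GitHub CLI (`gh`) is installed and
  # authenticated as a user with admin access to every repository listed.

  # Arguments for `gh api` that call the REST endpoint enabling private
  # vulnerability reporting for one "owner/repo".
  def args_for(repo) do
    ["api", "--method", "PUT", "repos/#{repo}/private-vulnerability-reporting"]
  end

  # Default runner: returns {output, exit_status} from the gh CLI.
  def gh_runner(args), do: System.cmd("gh", args, stderr_to_stdout: true)

  # Enables the setting on each repo and reports per-repo success or failure.
  def enable_all(repos, runner \\ &gh_runner/1) do
    Enum.map(repos, fn repo ->
      case runner.(args_for(repo)) do
        {_output, 0} -> {repo, :ok}
        {output, status} -> {repo, {:error, status, String.trim(output)}}
      end
    end)
  end
end
```

Usage would look like `EnablePrivateReporting.enable_all(["my-org/my_lib", "my-org/other_lib"])`, which returns one `{repo, result}` tuple per repository so failures (for example, missing admin rights) are easy to spot.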

General advice: if you find a vulnerability and don't know what to do, contact the EEF CNA by email at cna@erlef.org. The CNA, and Jonatan in particular, are incredibly helpful and resourceful, and I have been leaning heavily on their expertise and support (thank you, Jonatan 💜).

What has happened since

After that first Decimal finding, I was hooked. I knew that the potential for LLMs to find and exploit severe vulnerabilities was real. It was extremely cheap, too! You can scan a library and write exploit scripts for $10 to $30, depending on the size of the library and the depth of your scan. This is insane. It scales easily, is mostly automated, and has to be done only once. I knew we had to act fast.

I have since collaborated closely with Jonatan to scan and report vulnerabilities in the most downloaded Hex packages. You might have seen the CVEs for Bandit, PlugCowboy, Absinthe, or Phoenix. You can see the latest CVEs published on the EEF's CNA page: https://cna.erlef.org/cves/.

My plan is to keep scanning and reporting until we have fixed all severe vulnerabilities in the most popular Hex libraries. This should put us in a better position when malicious actors discover the BEAM ecosystem and start scanning and exploiting libraries at scale.

FAQs

Let me close this article by answering the most frequently asked questions I've received so far:

1: Which prompts do you use and are they open source?

Yes. I just open-sourced them here. I use 3 different strategies: per_file, per_file_deep, and whole. The per_file strategy is the cheapest and fastest, and it still finds 80-90% of the most severe vulnerabilities. For important projects, I also run per_file_deep, which is largely based on the prompts from the Scrutineer project. The whole strategy lets Claude explore an entire project by itself, but it doesn't work as well, takes longer, and costs more, so I don't run it often. I'm hoping that Mythos will do better with this strategy, so I'm keeping it.

2: Are you going to retire soon with all the money you receive from reporting CVEs?

No. I have received no money for this work so far. AFAIK, there's no bug bounty program for any of the libraries I've scanned. Until now, I have done this at my own expense, which includes one Claude Max 20 plan plus a lot of my free time, including weekends, evenings, and holidays. I see it as a contribution to the ecosystem.

3: What do you think about Mythos?

I don't have access to it yet, but based on reports from people I respect, like Daniel Stenberg, I think it will be better than Opus 4.7 at finding vulnerabilities, though not a completely new thing. My gut feeling is that where Opus finds 80-85% of the vulnerabilities, Mythos might find 90-95%. That's why I think it's important to start now with what we have (Opus 4.7 on max effort) and then re-scan everything once Mythos becomes available.

4: What does the reporting and disclosure process look like?

The CVE Numbering Authority (CNA) of the EEF has a full guide on this here: https://cna.erlef.org/security-policy

And general guidelines for responsible disclosure here: https://security.erlef.org/security_vulnerability_disclosure

But in essence: if you don't have GitHub's private vulnerability reporting enabled on your library, I or someone from the CNA will reach out to you and ask you to enable it, or ask how we can otherwise report the vulnerability. If you enable it, I will send you a report through it, and it will look similar to a PR discussion. It will include a description of the vulnerability and a script that reproduces it.
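Reproduction scripts for resource-exhaustion bugs can take the maintainer's machine down with them. One hypothetical way to make such a script safer to run is to execute the payload in its own monitored process with a time budget and report what happened. This is a sketch under stated assumptions, not a format the CNA or GitHub requires; the module name and return values are illustrative, and a timeout only bounds wall-clock time, not memory, so a fast enough memory bomb can still exhaust RAM before the deadline.

```elixir
defmodule ReproHarness do
  # Hypothetical harness for a reproduction script. Runs `payload` (a
  # zero-arity function) in a separate, monitored process and reports
  # whether it finished, crashed, or exceeded `timeout_ms`.
  # Note: the timeout bounds wall-clock time only; it does not cap memory.
  def run(payload, timeout_ms) when is_function(payload, 0) do
    parent = self()
    ref = make_ref()

    # Monitor instead of link, so a crashing payload cannot take the harness down.
    {pid, monitor} = spawn_monitor(fn -> send(parent, {ref, payload.()}) end)

    receive do
      {^ref, result} ->
        Process.demonitor(monitor, [:flush])
        {:finished, result}

      {:DOWN, ^monitor, :process, ^pid, reason} ->
        {:crashed, reason}
    after
      timeout_ms ->
        # Kill the runaway payload and drop its pending :DOWN message.
        Process.exit(pid, :kill)
        Process.demonitor(monitor, [:flush])
        {:timeout, timeout_ms}
    end
  end
end
```

A report would wrap the offending call in a zero-arity function and pass it with a budget of a few seconds; a `{:timeout, _}` result on a tiny input is the kind of signal worth reporting.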

GitHub allows you to create a temporary private fork of your library to which you can add the fix. I can review the PR if you want. Once you and the CNA are ready, you publish the report, merge in the fix, and create a new release. The CNA publishes the CVE, usually within minutes to hours. The users of your library will then have to upgrade to the new version. That's it really. It seems more dramatic than it really is.