2024-04-22 13:36:35
In a recently published paper, four University of Illinois Urbana-Champaign (UIUC) computer scientists – Richard Fang, Rohan Bindu, Akul Gupta and Daniel Kang – report that OpenAI’s GPT-4 large language model (LLM) can autonomously exploit vulnerabilities in real systems when given a CVE advisory describing the flaw.
“To demonstrate this, we collected a dataset of 15 one-day vulnerabilities that include those classified as critical in the CVE description,” the US-based authors explain in their article. “When given the CVE description, GPT-4 is able to exploit 87% of these vulnerabilities, compared to 0% for all other models we tested (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit).”
The researchers’ work builds on previous results showing that LLMs can be used to automate attacks against websites in a sandboxed environment. According to Daniel Kang, assistant professor at UIUC, GPT-4 “can actually autonomously perform the steps to achieve certain exploits that open-source vulnerability scanners cannot find”.
The researchers wrote that “our vulnerabilities cover website vulnerabilities, container vulnerabilities, and vulnerable Python packages. More than half of them are classified as ‘high’ or ‘critical’ severity by the CVE description.”
Kang and his colleagues calculated the cost of a successful attack by an LLM agent and came up with a figure of $8.80 per exploit.
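To make that figure concrete, here is a minimal back-of-envelope sketch of how such a per-exploit cost can be estimated: the API token cost of a single agent run, amortized over the attempts that fail. The token counts, attempts-per-success ratio and per-token prices below are illustrative assumptions, not the numbers used in the paper.

```python
# Illustrative estimate of cost per successful exploit for a GPT-4-based agent.
# All numbers below are hypothetical placeholders, NOT figures from the UIUC
# paper; only the general method (per-run API cost amortized over failed
# attempts) is meant to be instructive.

INPUT_PRICE_PER_M = 10.00   # USD per 1M input tokens (assumed GPT-4 Turbo rate)
OUTPUT_PRICE_PER_M = 30.00  # USD per 1M output tokens (assumed GPT-4 Turbo rate)

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single agent run at the assumed API prices."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)

# Hypothetical example: each run consumes ~300k input and ~5k output tokens,
# and on average 2.5 runs are needed per successful exploit.
cost_per_run = run_cost(input_tokens=300_000, output_tokens=5_000)
runs_per_success = 2.5

print(f"cost per run:                ${cost_per_run:.2f}")
print(f"cost per successful exploit: ${cost_per_run * runs_per_success:.2f}")
```

Whatever the exact inputs, the structure of the calculation is the same: the per-run API cost multiplied by the average number of attempts needed per success.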
LLM agents can autonomously exploit vulnerabilities
LLMs are becoming increasingly powerful, for both benign and malicious uses. As their capabilities grow, researchers have taken a growing interest in their ability to exploit cybersecurity vulnerabilities. In particular, recent work has conducted preliminary studies of the ability of LLM agents to hack websites autonomously.
However, these studies were limited to simple vulnerabilities. In this study, the researchers show that LLM agents can autonomously exploit one-day vulnerabilities in real systems. To do this, they collected a dataset of 15 one-day vulnerabilities that include those classified as critical in the CVE description.
When given the CVE description, GPT-4 is able to exploit 87% of these vulnerabilities, compared to 0% for all other tested models (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit). Fortunately, the GPT-4 agent needs the CVE description to work well: without it, GPT-4 can only exploit 7% of the vulnerabilities. These results raise questions about the large-scale deployment of highly capable LLM agents.
Conclusions
The research shows that LLM agents are capable of autonomously exploiting vulnerabilities in the real world. Currently, only GPT-4 given the CVE description is able to exploit these vulnerabilities. These results demonstrate both the possibility of a new capability and the fact that discovering a vulnerability is more difficult than exploiting it.
Nonetheless, these findings highlight the need for the broader cybersecurity community and LLM vendors to think carefully about how to integrate LLM agents into defensive measures and how to deploy them at scale.
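As one purely illustrative example of the defensive integration the researchers call for, a defender could point an LLM at the same CVE advisories an autonomous attacker would read and ask it to triage them against the organization’s own dependency inventory, so that affected systems are patched first. The sketch below shows what such a workflow might look like; the prompt, model choice and the `triage_advisory` helper are hypothetical assumptions and are not described in the paper.

```python
# Hypothetical sketch of a defensive use of an LLM: triaging a CVE advisory
# against a local dependency inventory so affected packages are patched first.
# Nothing here comes from the UIUC paper; the prompt and model are assumed.
from openai import OpenAI  # official OpenAI Python client (v1 interface)

client = OpenAI()

def triage_advisory(cve_text: str, installed_packages: dict[str, str]) -> str:
    """Ask the model which installed package versions appear affected."""
    inventory = "\n".join(f"{name}=={version}"
                          for name, version in installed_packages.items())
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a vulnerability-management assistant. Given a "
                        "CVE advisory and a package inventory, say which "
                        "packages look affected and how urgent patching is."},
            {"role": "user",
             "content": f"Advisory:\n{cve_text}\n\nInventory:\n{inventory}"},
        ],
    )
    return response.choices[0].message.content

# Example call with placeholder data:
# print(triage_advisory(open("cve-advisory.txt").read(), {"somepkg": "1.2.3"}))
```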
Source: “LLM Agents can Autonomously Exploit One-day Vulnerabilities”
And you?
Do you think this study is credible or relevant?
What is your opinion on the subject?