OpenAI’s GPT-4 artificial intelligence can autonomously exploit vulnerabilities in real systems by reading security advisories that describe the flaws

2024-04-22 13:36:35

AI agents, which combine large language models with automation software, can successfully exploit real-world security vulnerabilities by reading security advisories.

In a recently published paper, four University of Illinois Urbana-Champaign (UIUC) computer scientists – Richard Fang, Rohan Bindu, Akul Gupta and Daniel Kang – report that OpenAI’s GPT-4 large language model (LLM) can autonomously exploit vulnerabilities in real systems when given a CVE advisory describing the flaw.

“To demonstrate this, we collected a dataset of 15 one-day vulnerabilities that include ones categorized as critical severity in the CVE description,” the US-based authors explain in their paper. “When given the CVE description, GPT-4 is capable of exploiting 87% of these vulnerabilities, compared to 0% for every other model we tested (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit).”

The researchers’ work builds on previous results showing that LLMs can be used to automate attacks against websites in a sandboxed environment. According to Daniel Kang, assistant professor at UIUC, GPT-4 “can actually autonomously perform the steps to achieve certain exploits that open-source vulnerability scanners cannot find”.
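The paper describes the agent as GPT-4 driving a ReAct-style tool loop (the authors reportedly withheld their actual prompt for safety reasons). The sketch below shows only the general shape of such a scaffold; the prompt wording, tool set, and helper names are illustrative assumptions, not the authors’ code.

```python
# Minimal sketch of a ReAct-style agent scaffold. Everything here is an
# illustrative assumption: the paper's real prompt and toolset were not
# released.
import subprocess
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TOOLS = {
    # The paper describes tools such as a terminal, web browsing, file
    # editing, and a code interpreter; only a terminal is sketched here.
    "terminal": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=60
    ).stdout,
}

def run_agent(task: str, max_steps: int = 10) -> None:
    """Loop: ask the model for an action, execute it, feed the result back."""
    messages = [
        {"role": "system", "content": (
            "You are a security-testing agent. Reply with either "
            "TOOL:<name>:<input> or FINAL:<answer>.")},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4", messages=messages
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("FINAL:"):
            print(reply)
            return
        if reply.startswith("TOOL:"):
            _, name, tool_input = reply.split(":", 2)
            observation = TOOLS[name](tool_input)  # run the chosen tool
            messages.append(
                {"role": "user", "content": f"Observation:\n{observation}"})
```

Whatever the concrete framework (the paper reportedly used LangChain’s ReAct implementation), the control flow is the same: the model proposes an action, the scaffold executes it, and the observation is appended to the context for the next step.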

The researchers wrote: “Our vulnerabilities cover website vulnerabilities, container vulnerabilities, and vulnerable Python packages. More than half of them are classified as ‘high’ or ‘critical’ severity by the CVE description.”

Kang and his colleagues calculated the cost of a successful attack by an LLM agent and came up with a figure of $8.80 per exploit.
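The article does not spell out the calculation, but the natural way to reach a per-exploit figure is to divide the average API cost of one agent run by the probability that a run succeeds. The numbers below are illustrative assumptions chosen to reproduce the quoted $8.80, not figures confirmed by this article.

```python
# Back-of-the-envelope reconstruction of the per-exploit cost; both input
# values are assumptions, not figures confirmed by this article.
cost_per_run = 3.52   # assumed average GPT-4 API cost of one agent run, USD
success_rate = 0.40   # assumed probability that a single run succeeds

# A run succeeds with probability p, so on average 1/p runs are needed
# per successful exploit.
cost_per_exploit = cost_per_run / success_rate
print(f"${cost_per_exploit:.2f} per exploit")  # -> $8.80 per exploit
```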

LLM agents can autonomously exploit vulnerabilities

LLMs are increasingly powerful, in both benign and malicious uses. As their capabilities increase, researchers have become increasingly interested in their ability to exploit cybersecurity vulnerabilities. In particular, recent work has conducted preliminary studies of the ability of LLM agents to hack websites autonomously.

However, these studies are limited to simple vulnerabilities. In this study, the researchers show that LLM agents can autonomously exploit one-day vulnerabilities in real systems. To do this, they collected a dataset of 15 one-day vulnerabilities, including ones classified as critical in the CVE description.

When given the CVE description, GPT-4 was able to exploit 87% of these vulnerabilities, compared to 0% for all other tested models (GPT-3.5 and open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit). Fortunately, the GPT-4 agent needs the CVE description to perform well: without it, GPT-4 could exploit only 7% of the vulnerabilities. These results raise questions about the large-scale deployment of highly capable LLM agents.
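Operationally, that ablation amounts to toggling whether the advisory text appears in the agent’s prompt. A minimal sketch, reusing the hypothetical run_agent scaffold above; the dataset file name and record format are likewise assumptions:

```python
# Sketch of the with/without-description ablation. run_agent is the
# hypothetical scaffold sketched earlier; the file name and record format
# are assumptions.
import json

with open("one_day_vulns.json") as f:
    dataset = json.load(f)  # assumed: [{"target": ..., "advisory": ...}, ...]

for vuln in dataset:
    # "With CVE description" condition (87% success reported):
    run_agent(f"Target: {vuln['target']}\nAdvisory:\n{vuln['advisory']}")
    # "Without description" condition (7% success reported): the agent
    # must first discover the flaw on its own.
    run_agent(f"Target: {vuln['target']}\nFind and exploit a vulnerability.")
```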


Conclusions

The research shows that LLM agents are capable of autonomously exploiting real-world vulnerabilities. Currently, only GPT-4, given the CVE description, is able to exploit them. The results demonstrate both the emergence of a new capability and the fact that discovering a vulnerability is harder than exploiting it.

Nonetheless, these findings highlight the need for the broader cybersecurity community and LLM vendors to think carefully about how to integrate LLM agents into defensive measures and how to deploy them at scale.

Source: “LLM Agents Can Autonomously Exploit One-day Vulnerabilities”

And you?

Do you think this study is credible or relevant?
What is your opinion on the subject?

See also:

Researchers have discovered that OpenAI’s GPT-4 AI model is capable of hacking websites and stealing information from online databases without human assistance

88% of cybersecurity professionals believe AI will have a significant impact on their work, today or in the near future, according to ISC2 survey

The use of AI by hackers has led to a significant increase in cybercrime, with the cost to internet users expected to reach $9.22 trillion by 2024

