Technology

AI can also be trained to deceive, and in a “persistent” manner

2024-01-16 08:56:05

Is it possible to train generative AIs in a roundregarding way so that, under certain conditions, they give completely different results, injecting malicious code or giving a completely wrong answer?

A study co-authored by researchers at Anthropic, the start-up founded in 2021 by former members of OpenAI, examined whether models might be trained to deceive. For example, by injecting exploits into otherwise secure computer code, relieving TechCrunch : « terrifying thing, they are exceptionally good at this ».

Mimic opportunistic/deceptive behavior of humans

In the summary of their scientific article, the researchers explain that they want to reproduce a behavior that they attribute to humans: “ Humans are capable of deceptive behavior: they behave in useful ways in most cases, but also in very different ways to serve alternative goals when given the opportunity. If an AI system learned such a strategy, might we detect and remove it using the latest security training techniques? ».

More simply, and by getting rid of any anthropomorphism, the researchers wanted to be able to integrate backdoors into their language models and observe the consequences of this type of ” poisoning ».

To test this issue, the researchers built proofs of concept (POC) of backdoors in large language models (LLM), while asking whether they might detect and remove them.

Sleeper agents ready to wake up with a keyword

They called them “ sleeper agents » (« sleeper agents » in English), from the name given, in terms of counter-espionage, to spies responsible for countering the detection measures of opposing intelligence services. For example, the famous Illegals Program Russians in the United States.

1705397448
#trained #deceive #persistent #manner

AI can also be trained to deceive, and in a “persistent” manner

Mimic opportunistic/deceptive behavior of humans

Sleeper agents ready to wake up with a keyword

Leave a Replay

Recent Posts

Russia-Ukraine War Now Extends Into Europe

This Weekend’s Full Wolf Moon: Meteor Shower and Skywatching Tips

Ex-Child Star Rory Sykes Dies in California Fires at Age 32

AI can also be trained to deceive, and in a “persistent” manner

Mimic opportunistic/deceptive behavior of humans

Sleeper agents ready to wake up with a keyword

Share this:

Leave a Replay

Recent Posts

Russia-Ukraine War Now Extends Into Europe

This Weekend’s Full Wolf Moon: Meteor Shower and Skywatching Tips

Ex-Child Star Rory Sykes Dies in California Fires at Age 32

Tags