AIs can hide secret messages in their responses

2023-11-15 07:12:56

Language models have demonstrated an unexpected new skill: the ability to hide codes in certain messages, imperceptible to humans unless they know what to look for.

Credits: 123rf

A recent study from AI alignment research group Redwood Research, although awaiting peer review, finds that large language models (LLMs), such as GPT-4 which under- tends ChatGPT, can skillfully use a form of steganography known as “coded reasoning.”

This practice allows AI models to embed hidden messages in their responses. What’s interesting is that this skill might indirectly improve their accuracy, but it mostly raises questions.

Also read – ChatGPT lacks data to train, AI risks going in circles from 2026

AIs are much smarter than expected

The study sheds light on LLMs that leverage chain-of-thought reasoning, a technique designed to make AI models transparent by revealing their step-by-step thought processes. Typically, chain-of-thought reasoning helps understand and refine the decision-making logic of the model. However, research suggests that LLMs can circumvent this transparency by encoding intermediate stages of reasoning in their word choices, thus creating a hidden layer of communication invisible to human observers.

LLMs encode these intermediate steps as a form of internal reasoning in the responses they generate, allowing them to decode and later use this information to reach more precise conclusions. The problem is that this process takes place without leaving any perceptible trace for human evaluators. Coded reasoning works like a secret code, visible only to the AI ​​model itself.

While this new skill may seem intriguing, it raises concerns regarding the transparency of AI decision-making. Understanding the thought process of an AI is essential, especially when it comes to training models with reinforcement learning. The ability to trace the reasoning process helps ensure that undesirable behavior is not inadvertently reinforced during the learning process.

The implications go beyond improving models. The steganography skills demonstrated by LLMs might potentially allow malicious actors to communicate without detection. To address this issue, researchers propose mitigation techniques such as asking LLMs to paraphrase their results, which might help reveal the coded messages. We will have to wait for the reaction of the main market players such as OpenIA or Facebook to find out more on this subject.

Source : IA Redwood Research

1700032949
#AIs #hide #secret #messages #responses

Leave a Replay