ChatGPT can produce a decent summary of War and Peace by Leo Tolstoy or popularize the great principles of quantum physics, but ask it "how many 'r's are there in 'strawberry'?" or "how many 'n's are there in 'surprise'?" and it will probably get the answer wrong. This test has become a way of evaluating artificial intelligences, especially large language models (LLMs). Users regularly amuse themselves by publishing screenshots of their conversations on the subject with OpenAI's chatbot, Meta's (Meta AI, which runs on Llama 3.1), or Anthropic's Claude 3.
This experiment underlines one essential point: "an artificial intelligence does not have a human brain," notes Patrick Pérez, CEO of Kyutai, a French laboratory specializing in generative AI. "It can have almost superhuman abilities in certain areas and yet be unable to perform tasks that seem perfectly trivial, which always has a striking effect. These models work by analogy, which means that if they have not seen this type of request often enough in their training, they will not know how to answer it."
How tokenization works
For this specific task (counting the letters in a word), the LLMs' failure is also linked to tokenization. Language models split text into small units called tokens. A token can be a letter (rarely), a word, a group of words… In other words, ChatGPT, like any other large language model, does not read. It does not "see" the text as a sequence of individual characters, but as a sequence of concepts encapsulated in tokens.
This allows them to process and generate large amounts of text, but makes them poor at tasks that require fine-grained manipulation of characters, such as counting the number of "r"s in a word. It also explains why LLMs are particularly bad at solving anagrams, finding palindromes, and so on.
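To make this concrete, here is a rough sketch assuming the open-source tiktoken library and its "cl100k_base" encoding (the article names neither): it shows the tokens a GPT-4-class tokenizer produces for "strawberry", versus a plain character-level count that ordinary code gets right trivially.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

word = "strawberry"
token_ids = enc.encode(word)
tokens = [enc.decode([t]) for t in token_ids]

print(tokens)
# Typically a few multi-character chunks (e.g. something like
# ['str', 'aw', 'berry']): the individual letters, and therefore the
# three "r"s, are never exposed to the model as separate symbols.

print(word.count("r"))  # 3 -- counting characters is trivial outside the model
```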
"One way to get around this limitation is to ask the LLM to break the task down into several steps, a principle known as chain of thought," says Laurent Daudet, founder of the French start-up LightOn, which develops AI models for businesses. "For example, you ask it to spell out the letters of strawberry, then to count the number of 'r's. In that case it will use one token per letter and will normally arrive at the correct result."
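Here is a hedged sketch of that decomposition, using the OpenAI Python client purely to show how the two prompts differ; the client, the model name and the exact wording are illustrative assumptions, not taken from the article.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY in the environment

# Direct question: the model answers from token-level patterns and often fails.
direct = 'How many "r"s are there in "strawberry"?'

# Decomposed ("chain of thought") version: spelling the word out first forces
# the model to emit roughly one token per letter, so the count that follows
# operates on material it can actually "see".
decomposed = (
    'Spell the word "strawberry" letter by letter, separated by spaces. '
    'Then count how many of those letters are "r" and give the total.'
)

for prompt in (direct, decomposed):
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt, "->", reply.choices[0].message.content)
```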
The strawberry test is also emblematic for this reason: it shows that the models do not yet have advanced built-in reasoning capabilities, because a human still has to ask them to break an action down into steps.
Teaching models directly to reason in stages has become the industry's new Holy Grail. "The idea is to have so-called 'agent' models capable of setting themselves an objective and a strategy to achieve it. They do this either intrinsically or by using external tools, such as a calculator for mathematics or a search engine to retrieve information," says Patrick Pérez.
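Below is a minimal, self-contained sketch of that agent loop, under stated assumptions: the planner that would normally be an LLM is replaced by a hard-coded stub, and the two tools (a toy calculator and a letter counter) are hypothetical. Only the loop structure, in which the agent decides, calls a tool, feeds the result back and stops with an answer, reflects the idea described above.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    tool: str
    argument: str

# Tools the agent is allowed to use (toy implementations).
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "letter_counter": lambda arg: str(arg.split()[1].count(arg.split()[0])),
}

def stub_planner(task: str, observations: list[str]) -> ToolCall | str:
    """Stand-in for the LLM: picks a tool, or returns a final answer."""
    if not observations:
        if "strawberry" in task:
            return ToolCall("letter_counter", "r strawberry")
        return ToolCall("calculator", "17 * 23")
    return f"Final answer: {observations[-1]}"

def run_agent(task: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = stub_planner(task, observations)
        if isinstance(decision, str):        # the planner produced an answer
            return decision
        result = TOOLS[decision.tool](decision.argument)
        observations.append(result)          # feed the tool result back
    return "Gave up after too many steps."

print(run_agent('How many "r"s are there in "strawberry"?'))  # Final answer: 3
print(run_agent("What is 17 * 23?"))                          # Final answer: 391
```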
OpenAI's upcoming model, which is said to have far more advanced reasoning capabilities than its predecessors, was recently renamed "Strawberry" (coincidence or not) after being known under the code name "Q*", Reuters reports. According to The Information, it could launch as early as this fall.
Specialized agents that reason about a particular task
Many other companies are working on the subject. Most focus on "agents" specialized in one vertical: computer code for the American company Cognition, cybersecurity for the French start-up Mindflow, or sales for the London-based 11x, for example. The French company LightOn plans to offer its customers "agents" that are expert in a few specific tasks by the end of the year, in particular responding to calls for tenders.
Other companies aim to create all-purpose agents: an AI capable of responding to a call for tenders, counting the number of "r"s in strawberry, planning a vacation from A to Z, coding an application… what some call "artificial general intelligence" (AGI). This is the approach of industry giants such as Meta and OpenAI. It is also the positioning of the French start-up H, which has just lost three of its five prestigious co-founders.
But achieving a system with complex reasoning capabilities seems difficult with a large language model alone, even one enriched with tools. Some experts believe a change of architecture will be necessary.