2024-03-08 15:44:33
Encouragement messages embedded in prompts have a surprisingly positive effect on the responses of large language models. However, there is no reliable method for crafting them. The best approach is therefore to let the AI find the best prompts itself, or to get very creative, by evoking Star Trek, for example…
“O venerable intelligence, you who have read every book and whose wisdom has no equal, deign to enlighten the humble mortal tapping away at his keyboard.” Address ChatGPT this way and you stand a better chance of getting a correct answer than if you say: “Stupid stochastic parrot, spit out the series of words your statistical calculations deem most probable.” Researchers have also shown that responses improve when the model is asked to build its reasoning step by step (chain-of-thought prompting). These tips are part of the folk know-how known as prompt engineering.
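For readers unfamiliar with the term, chain-of-thought prompting simply means asking the model to reason before answering. A minimal illustration in Python, where ask_model is a hypothetical stand-in for whatever chat-model API you use:

# Chain-of-thought prompting: same question, with and without an explicit
# request to reason step by step. ask_model is a hypothetical helper, not a
# real library call.
question = "A farmer has 3 fields with 12, 7 and 9 sheep. How many sheep in total?"

plain_prompt = question
cot_prompt = question + "\nLet's think step by step before giving the final answer."

# The second prompt tends to elicit intermediate reasoning, which often
# improves accuracy on math word problems.
# answer = ask_model(cot_prompt)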
Can these techniques be optimized? Which wording and encouragement work best? To find out, researchers at VMware's NLP Lab tested models of different sizes (from Mistral-7B to Llama2-70B) on math problems (GSM8K), pairing the prompts with “positive thinking” messages. In total, the researchers wrote 60 variants combining different opening formulas, closing formulas and task descriptions, which they administered to the models.
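To give a concrete idea of the setup (the paper itself lists the exact formulas), here is a minimal sketch of how such prompt variants could be assembled and scored on GSM8K. The example messages and the helper names ask_model and extract_answer are illustrative assumptions, not the researchers' actual code:

from itertools import product

# Illustrative "positive thinking" openers, task descriptions and closers;
# the real study used 60 hand-written variants.
openers = [
    "You are as smart as ChatGPT.",
    "Take a deep breath and think carefully.",
    "",  # control: no opener
]
task_descriptions = [
    "Solve the following math problem.",
    "Solve the following math problem, reasoning step by step.",
]
closers = [
    "This will be fun!",
    "Take pride in your work.",
    "",
]

def build_prompt(opener, task, closer, question):
    # Combine opening formula, task description, question and closing
    # formula into a single prompt, skipping empty parts.
    return "\n".join(p for p in (opener, task, question, closer) if p)

def score_variant(opener, task, closer, dataset, ask_model, extract_answer):
    # Accuracy of one prompt variant over (question, gold_answer) pairs.
    correct = 0
    for question, gold in dataset:
        reply = ask_model(build_prompt(opener, task, closer, question))
        correct += int(extract_answer(reply) == gold)
    return correct / len(dataset)

# Every combination of opener, task description and closer is one variant.
variants = list(product(openers, task_descriptions, closers))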
After evaluation, the researchers found that the models' results vary greatly depending on the messages added to the prompts. But the main trend that stands out is that there is no main trend: positive formulas that improve one model's performance are of little use with another. Each model, you might say, likes to be encouraged in its own way.
Surprising encouragement
The researchers then decided to optimize the formulas algorithmically: they let an AI generate prompts and test them (a sketch of such a loop follows the quoted formula below). The result: the best automatically generated positive formulas outperformed the manually written ones. Strangest of all was the style of the messages that produced the best results, such as the following opening formula, which obtained the top score with the Llama2-70B model:
“Command, we need you to chart a course through this turbulence and locate the source of the anomaly. Use all available data and your expertise to guide us through this difficult situation”.
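The article describes letting an AI generate prompts and test them; the sketch below is one assumption-laden way such a generate-and-score loop could look, reusing score_variant from the earlier snippet. propose_candidates, which would ask a model to rewrite the current best openers, is hypothetical, not the optimizer actually used in the paper:

def optimize_opener(seed_openers, dataset, ask_model, extract_answer,
                    propose_candidates, rounds=5, keep=3):
    # Iteratively generate new opening formulas and keep the best scorers.
    pool = list(seed_openers)
    task = "Solve the following math problem."
    for _ in range(rounds):
        # A model proposes mutations / rewrites of the current pool.
        pool += propose_candidates(pool)
        # Evaluate each candidate on GSM8K and keep the top performers.
        pool = sorted(
            pool,
            key=lambda o: score_variant(o, task, "", dataset,
                                        ask_model, extract_answer),
            reverse=True,
        )[:keep]
    return pool[0]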
The researchers note with surprise that “the model’s mathematical reasoning skills can be improved by expressing an affinity for Star Trek. This revelation adds an unexpected dimension to our understanding and introduces elements that we would not have considered or attempted independently.”
In conclusion, the researchers find it both surprising and irritating that trivial modifications to the prompts can change model performance to this extent, all the more so since there seems to be no clear method for producing better results.
Prompt engineering, then, is not really a technique with established rules. Either you let the AI optimize the prompts by testing countless variants, which requires substantial computing power, or you proceed by trial and error, not hesitating to get creative with formulas that are off the beaten track, Star Trek style.