2024-04-25 10:09:45
Large language models (LLMs) have impressive capabilities across many domains, but small language models (SLMs) are an attractive option for companies that can deploy them cost-effectively for specific tasks. Microsoft, which introduced the SLM Phi-1 in June 2023, presented the Phi-3 family of open models on April 23. The smallest of them, Phi-3-mini, already available, has 3.8 billion parameters and, thanks to its small size, can run locally on a phone or a computer.
Microsoft presents the Phi-3 models as “the most efficient, most cost-effective small language models available”.
Phi-3-mini is a dense decoder-only Transformer model, fine-tuned using supervised fine-tuning (SFT) and direct preference optimization (DPO) to align it with human preferences and safety guidelines. It is available on Azure AI Studio, Hugging Face, and Ollama.
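To give an idea of what DPO optimizes, here is a minimal sketch of the DPO loss for a single preference pair (a chosen and a rejected answer). The log-probability values are illustrative numbers, not real model outputs, and this omits batching and gradient machinery.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Sketch of the DPO loss for one preference pair."""
    # Implicit reward margin of the policy relative to the frozen reference model
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: the loss shrinks as the policy
    # assigns relatively more probability to the human-preferred answer
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that favors the chosen answer incurs a lower loss
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))  # policy prefers the chosen answer
print(dpo_loss(-2.0, -1.0, -1.5, -1.5))  # policy prefers the rejected answer
```

Minimizing this loss over many such pairs is what nudges the model toward human-preferred, safer outputs without training a separate reward model.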
It was trained for seven days on 512 NVIDIA H100 Tensor Core GPUs. NVIDIA also indicated that it can be tried on ai.nvidia.com, where it is packaged as an NVIDIA NIM, “a microservice with a standard application programming interface that can be deployed anywhere”.
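As a rough illustration of what such a standard API looks like, here is a hypothetical request body in the OpenAI-compatible chat style commonly exposed by inference microservices. The endpoint, model identifier, and field values are illustrative assumptions, not details from the article.

```python
import json

# Hypothetical chat-completion request for an OpenAI-style endpoint;
# "microsoft/phi-3-mini" is an assumed model identifier, not confirmed here.
payload = {
    "model": "microsoft/phi-3-mini",
    "messages": [
        {"role": "user", "content": "Summarize the benefits of small language models."}
    ],
    "max_tokens": 128,
    "temperature": 0.2,
}
body = json.dumps(payload)
print(body)
```

Because the interface is standardized, the same payload shape works whether the model runs in the cloud or on a locally deployed microservice.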
In their technical report, the researchers explain: “The innovation lies solely in our training dataset, an enlarged version of the one used for Phi-2, consisting of highly filtered web data and synthetic data”.
The model, trained on 3.3 trillion tokens, was also tuned for robustness, safety, and chat format. Its context window, which ranges from 4,000 to 128,000 tokens depending on the variant, allows it to ingest and reason over large textual content (documents, web pages, code, etc.). According to Microsoft, Phi-3-mini demonstrates strong reasoning and logic skills, making it a good candidate for analytical tasks.
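To make the difference between the two window sizes concrete, here is a rough sketch of checking whether a document fits in a given context window. The four-characters-per-token heuristic is a crude assumption; a real check would count tokens with the model's own tokenizer.

```python
def fits_in_context(text: str, context_tokens: int, chars_per_token: int = 4) -> bool:
    """Crude estimate of whether a text fits in a context window.

    Assumes ~4 characters per token, which is only a ballpark figure.
    """
    estimated_tokens = len(text) // chars_per_token
    return estimated_tokens <= context_tokens

doc = "word " * 2000  # ~10,000 characters, roughly 2,500 tokens
print(fits_in_context(doc, 4_000))        # fits in the 4k window
print(fits_in_context(doc * 20, 4_000))   # too large for the 4k window
print(fits_in_context(doc * 20, 128_000)) # fits in the 128k window
```

The 128,000-token variant is what lets the model take whole documents or codebases as input rather than short excerpts.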
Solid performance despite its small size
Microsoft shared on its blog the performance of Phi-3-mini, as well as that of Phi-3-small (7B) and Phi-3-medium (14B), which will be available soon and were trained on 4.8 trillion tokens.
The performance of the Phi-3 models was compared to that of Phi-2, Mistral-7B, Gemma-7B, Llama-3-Instruct-8B, Mixtral-8x7B, GPT-3.5 Turbo, and Claude 3 Sonnet. All reported figures were produced with the same pipeline so that they are directly comparable.
Phi-3-mini outperforms Gemma-7B and Mistral-7B on some benchmarks such as MMLU, while the significantly stronger Phi-3-small and Phi-3-medium outperform much larger models, including GPT-3.5 Turbo. However, due to their small size, Phi-3 models are less competitive on tasks focused on factual knowledge, such as those evaluated in TriviaQA.
However, their capabilities in many other areas make them particularly useful in scenarios where model size and available resources are critical factors, such as in resource-constrained environments or applications that require fast response times.