Natural language processing

Microsoft’s small and efficient LLM Phi-3 beats Meta’s Llama 3 and free ChatGPT in benchmarks

Summary

Meta’s Llama 3 has just set new standards for open-source models, but Microsoft’s Phi-3 is poised to surpass them, at least on paper. Microsoft credits one key ingredient for Phi: data quality.

Microsoft Research has developed a new, compact language model called Phi-3 that, according to internal tests, matches the performance of much larger models such as Mixtral 8x7B and GPT-3.5. The default context length is 4K tokens, with a long-context variant extending it to 128K.

The Phi-3-mini model, with only 3.8 billion parameters, achieves 69 percent on the MMLU language comprehension benchmark and 8.38 points on MT-Bench, according to Microsoft.

Thanks to its small size, Phi-3 can run locally on a standard smartphone: quantized to 4 bits, it occupies as little as 1.8 GB of memory and generates more than 12 tokens per second on an iPhone 14 with an A16 chip.
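The arithmetic behind that memory figure is straightforward; as a rough back-of-the-envelope check in Python (ignoring activations, the KV cache, and runtime overhead):

```python
# Rough memory estimate for a 4-bit quantized 3.8B-parameter model.
# Ignores activations, KV cache, and runtime overhead.
params = 3.8e9          # parameters in Phi-3-mini
bits_per_param = 4      # 4-bit quantization
bytes_total = params * bits_per_param / 8
print(f"{bytes_total / 1024**3:.2f} GiB")  # ~1.77 GiB, in line with the ~1.8 GB figure
```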


Phi-3-mini runs natively in a 4-bit version on an iPhone with an A16 Bionic chip at more than 12 tokens per second. | Image: Microsoft

“It’s like fitting a supercomputer in a flip phone, but instead of breaking the phone, it just breaks the internet with its tiny, yet mighty, linguistic prowess,” the developers joked, letting the model itself answer the question of how an AI model at ChatGPT level could run on a smartphone.

Getting the most out of a small model with high-quality training data

According to Microsoft, the secret to Phi-3’s performance lies solely in the training data set. It consists of web data heavily filtered for “education level” plus synthetic LLM-generated data, and builds on the training method used for its predecessors, Phi-2 and Phi-1.

Microsoft emphasizes that the performance was achieved solely by optimizing the training data set. Instead of “wasting” capacity on web data such as sports scores, the data set was brought closer to the “data optimum” for a compact model by focusing on knowledge and reasoning skills.

In the first phase of pre-training, mainly web data is used so the model develops general knowledge and language understanding. In the second phase, highly filtered, high-quality web data is combined with selected synthetic data to sharpen the model’s performance in specific areas such as logic and niche skills.
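Microsoft has not published its filtering pipeline; purely as an illustration, a two-phase curation step might look like the sketch below, where quality_score() stands in for an unpublished quality classifier and the thresholds are invented:

```python
# Hypothetical sketch of two-phase data curation in the spirit of Phi-3's
# "education level" filtering. quality_score() is a placeholder for a learned
# quality classifier Microsoft has not published; thresholds are invented.

def quality_score(doc: str) -> float:
    """Placeholder: rate the educational value of a document in [0, 1]."""
    raise NotImplementedError

def phase_one(web_docs: list[str]) -> list[str]:
    # Phase 1: broad web data for general knowledge and language understanding.
    return [d for d in web_docs if quality_score(d) > 0.3]

def phase_two(web_docs: list[str], synthetic_docs: list[str]) -> list[str]:
    # Phase 2: tightly filtered web data plus selected synthetic data
    # targeting reasoning and niche skills.
    filtered = [d for d in web_docs if quality_score(d) > 0.8]
    return filtered + synthetic_docs
```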

With the Phi models, Microsoft aims to offer high-quality yet far more efficient and cost-effective AI models. The company in particular needs inexpensive models to scale AI across its Windows and Office products and its search business, and to turn generative AI into a viable business model.


Phi-3 beats Llama 3 in many benchmarks

Phi-3-small, with 7 billion parameters, and Phi-3-medium, with 14 billion parameters, both trained on 4.8 trillion tokens, show the same pattern as Phi-3-mini when benchmarked against models of their own class.

They achieve 75 and 78 percent on the MMLU benchmark and 8.7 and 8.9 points on MT-Bench, respectively. That puts them not far behind much larger models such as Meta’s recently released 70-billion-parameter Llama 3, and the Phi models outperform same-class models in most cases (Phi-3-small at 7B vs. Llama 3 8B, for example).

However, benchmark results and perceived performance in real applications do not always match, and it remains to be seen how widely the open-source community adopts the model.

Image: Microsoft

Microsoft cites Phi-3-mini’s lower capacity for factual knowledge compared to larger models, for example in the TriviaQA benchmark, as a weakness. However, this can be compensated for by integrating a search engine. In addition, the training data is largely limited to English.
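Microsoft does not detail how such a search integration would work; one plausible pattern is plain retrieval augmentation, sketched here with web_search() and model_generate() as hypothetical placeholders rather than any published API:

```python
# Hypothetical retrieval-augmentation sketch: compensate for limited factual
# recall by putting search results into the prompt before generation.
# web_search() and model_generate() are placeholders, not a published API.

def web_search(query: str, k: int = 3) -> list[str]:
    """Stand-in for any search engine API returning text snippets."""
    raise NotImplementedError

def answer_with_search(model_generate, question: str) -> str:
    snippets = web_search(question)
    context = "\n".join(snippets)
    prompt = (
        f"Use the context to answer the question.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return model_generate(prompt)
```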

In terms of safety, Microsoft says it has taken a multi-step approach with alignment training, red teaming, automated testing, and independent reviews, which has significantly reduced the number of potentially harmful responses.

According to Microsoft, Phi-3 uses a similar block structure and the same tokenizer as Meta’s Llama models so that the open-source community can benefit as much as possible from Phi-3. This means that all packages developed for the Llama 2 model family can be adapted directly to Phi-3-mini.
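In practice, the model loads through the same Hugging Face interfaces commonly used for Llama checkpoints; a minimal sketch, assuming the repo id microsoft/Phi-3-mini-4k-instruct published on Hugging Face (which may change; older transformers versions also need trust_remote_code=True):

```python
# Minimal sketch: loading Phi-3-mini via the standard Hugging Face interfaces
# also used for Llama models. The repo id is the name published on Hugging
# Face at the time of writing and is an assumption here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # Llama-style tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

messages = [{"role": "user", "content": "Why do small LLMs fit on phones?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```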
