Ferret: Apple launches open-source LLM to analyze images with AI

2023-12-26 14:03:10

Almost silently, Apple released Ferret in October this year: an open-source multimodal large language model (LLM), something very rare for technologies developed by the company. The tool, however, was made available under a non-commercial license, which steers its use primarily toward research.

According to VentureBeat, Ferret began drawing more attention after Apple released two research papers on new techniques for creating 3D avatars and for efficient inference in language models. The research raised expectations about the potential for immersive experiences and for running complex artificial intelligence systems on Apple devices.

Introducing Ferret, a new MLLM that can refer and ground anything anywhere at any granularity.
arxiv.org/abs/2310.07704
1️⃣ Ferret makes it possible to refer to an image region of any shape.
2️⃣ It generally shows more accurate understanding of small image regions than GPT-4V (sec 5.6).

The model can thus analyze elements in images with great flexibility, as well as locate and identify them. These capabilities are especially useful in tasks such as search, where Ferret can identify what is portrayed in an image and provide more details about the object in question and the context of the photo.

According to the paper in which researchers from Apple and Columbia University describe how the model works, it was trained on 1.1 million samples containing hierarchical spatial information. About 95 thousand negative samples also helped make Ferret more robust. The model achieves better results than competitors, describing image details more capably and confusing objects less often.

Although the open-source release came as a surprise, it is part of Apple's strategy for facing competitors that are further ahead in AI, such as OpenAI and Anthropic. Because Apple's infrastructure is not yet sufficient to power large-scale LLMs, its options were to depend on third-party servers or to release the model as open source from the start, and it chose the latter.

Making Ferret available as open source surprised several researchers in the field of AI and machine learning. As noted in a Reddit community about Apple, the tool is trained using eight NVIDIA A100 GPUs with 80GB of memory. In the coming months, more news about the solution may emerge, both from Apple and from researchers working with Ferret.

Even with the current non-commercial license, the tool could, as its development advances, eventually be adapted for use on iPhones, iPads, Macs and other company devices. Even if it is strategic, the open-source release still departs from Apple's usual practice of developing new features and systems in as much secrecy as possible and in isolation from others. In this case, the openness allows research around Ferret to mature.

Ferret’s code is available on GitHub.

