Google DeepMind’s JEST speeds up AI training by 13x while slashing computing needs

Summary

Researchers from Google DeepMind have developed a method called JEST that makes training AI models for image and text processing significantly more efficient.

Multimodal AI models learn to link images and texts by maximizing the correspondence of related image-text pairs and minimizing the correspondence of unrelated pairs. Traditionally, training examples are randomly selected or based on individual relevance for each iteration in batches.
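This contrastive setup can be sketched with a CLIP-style loss: matching image-text pairs sit on the diagonal of a similarity matrix and are pulled together, while all other pairings are pushed apart. The function below is an illustrative NumPy toy with assumed embedding inputs, not DeepMind's implementation.

```python
import numpy as np

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Matching image-text pairs (the diagonal of the similarity matrix)
    are pulled together; all other pairings are pushed apart.
    """
    # Normalize embeddings so dot products become cosine similarities.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)

    logits = image_embs @ text_embs.T / temperature  # (batch, batch)
    labels = np.arange(len(logits))                  # image i matches text i

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)         # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

With perfectly aligned pairs the loss approaches zero; with shuffled pairs it grows large, which is the signal the training exploits.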

However, the researchers argue that the quality of a batch depends not only on the sum of the individual data points but also on their composition. Therefore, they have developed an algorithm that selects subsets of data from a larger “super batch” based on their collective learnability.

JEST uses AI models for data selection

To determine which data is most learnable, JEST (Joint Example Selection Technique) uses two AI models: the model currently being trained and an already trained reference model. Data that is difficult for the model being trained but easy for the reference model is considered particularly useful.
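That scoring rule can be sketched as the difference between the two models' losses. The greedy top-k selection below is a deliberate simplification: JEST proper scores sub-batches jointly, so an example's value depends on which other examples are chosen alongside it.

```python
import numpy as np

def learnability_scores(learner_losses, reference_losses):
    """Score examples as hard for the model being trained (high learner
    loss) but easy for the pretrained reference model (low reference loss)."""
    return learner_losses - reference_losses

def select_sub_batch(learner_losses, reference_losses, batch_size):
    """Pick the top-scoring examples from a larger 'super batch'.

    Simplified per-example version of the joint selection described
    in the article: each example is ranked independently here.
    """
    scores = learnability_scores(learner_losses, reference_losses)
    return np.argsort(scores)[::-1][:batch_size]
```

For example, an item the learner finds hard (loss 5) while the reference finds it easy (loss 1) outranks one both models find hard.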

With this method, the team was able to shorten the training time for certain tasks by a factor of 13. At the same time, only a tenth of the computing power was needed to achieve the same performance as with conventional methods.

According to the researchers, the choice of the reference model, which is pre-trained on a small, high-quality dataset, is crucial: its quality limits the potential improvements. Increasing the reference dataset from 100 million to 600 million examples while maintaining high quality improved the results further.

Flexi-JEST achieves top score with 10 percent of training data

To reduce the increased computational effort when evaluating the “super batch,” the scientists also introduced a variant called Flexi-JEST. This uses a simplified version of the model with coarser image resolution to evaluate the data and trains in parallel with full and reduced resolution.
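As a rough illustration of the cheaper scoring pass, the helper below coarsens image resolution by average pooling before scoring. This is an assumed stand-in: the actual model's resolution reduction happens inside its vision architecture and will differ in detail.

```python
import numpy as np

def downsample(images, factor=2):
    """Coarsen image resolution by averaging factor x factor patches.

    Scoring a super batch on this cheap low-resolution view is far less
    costly than a full-resolution forward pass; training itself can then
    mix full and reduced resolution, as the article describes.
    """
    n, h, w, c = images.shape
    return images.reshape(n, h // factor, factor, w // factor, factor, c).mean(axis=(2, 4))
```

Halving each spatial dimension cuts the number of pixels, and hence the scoring cost of a dense model, by roughly a factor of four.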

With Flexi-JEST, a model achieved better average performance on eight standard tasks after 4 billion training examples than the current state-of-the-art model SigLIP after 40 billion examples. This corresponds to a saving of 90 percent of the computing operations.

According to the researchers, the results show the potential to learn from small, carefully curated datasets to filter much larger, unstructured amounts of data – a process they call “data quality bootstrapping.” This could pave the way for more efficient AI models that require less computing power and training data.
