
Starbucks: A New AI Training Strategy for Matryoshka-like Embedding Models Which Encompasses Both the Fine-Tuning and Pre-Training Phases

In machine learning, embeddings are widely used to represent data as compressed, low-dimensional vectors. They capture semantic relationships well enough to support tasks such as text classification and sentiment analysis, but they struggle to capture the intricate, hierarchical relationships within data. This leads to suboptimal performance and increased computational cost when training embedding models. Researchers at The University of Queensland and CSIRO have developed a training strategy for 2D Matryoshka embedding models that improves their efficiency, adaptability, and practical effectiveness.
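To make the Matryoshka idea concrete, the sketch below (purely illustrative; the 768-dimension size and the random vectors are assumptions, not taken from the paper) shows the property such models are trained for: the first k dimensions of a full embedding form a usable embedding on their own, so downstream code can simply slice vectors to trade accuracy for speed and memory.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
full_dim = 768                        # full embedding size (assumed)
emb_a = rng.normal(size=full_dim)     # stand-ins for sentence embeddings
emb_b = rng.normal(size=full_dim)

# A Matryoshka-style model is trained so that a PREFIX of the vector
# is itself a valid, lower-capacity embedding; callers just slice it.
for k in (32, 128, 768):
    print(f"dim={k:4d}  cos={cosine(emb_a[:k], emb_b[:k]):+.4f}")
```

With real trained embeddings (unlike the random stand-ins here), the truncated similarities stay close to the full-dimension ones, which is what makes the slicing trick useful in practice.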


Traditional approaches such as 2D Matryoshka Sentence Embeddings (2DMSE) represent data in vector space but struggle to encode the depth of complex structures: words are treated as isolated entities, and the shallow sub-networks used to map their relationships fail to capture nesting. These methods also integrate model dimensions and layers poorly, which degrades performance on complex NLP tasks. The proposed method, Starbucks, is a training strategy for 2D Matryoshka embedding models designed to improve the precision of hierarchical representations without incurring high computational costs.

The framework combines two phases: Starbucks Masked Autoencoding (SMAE) for pre-training and Starbucks Representation Learning (SRL) for fine-tuning. SMAE randomly masks portions of the input that the model must reconstruct, giving it a semantics-oriented understanding of the data and better generalization across dimensions. SRL then fine-tunes the model by computing losses over specific layer-dimension pairs, which sharpens its ability to capture nuanced relationships and increases the accuracy and relevance of its outputs. Empirically, the Starbucks methodology improves performance on natural language processing tasks, particularly semantic text similarity and its information retrieval variant.
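A minimal sketch of the SRL idea follows. It is not the paper's implementation: the specific (layer, dimension) pairs in SIZES, the contrastive in-batch loss, and the srl_loss function are all assumptions made for illustration. The core point it shows is that a loss is computed at each fixed layer-dimension pair (using a prefix slice of that layer's embedding) and the results are averaged, rather than sampling every possible combination.

```python
import torch
import torch.nn.functional as F

# Hypothetical small-to-large layer/dimension pairs (assumed values).
SIZES = [(2, 32), (4, 64), (6, 128), (8, 256), (10, 512), (12, 768)]

def srl_loss(hidden_states, pos_hidden_states, temperature=0.05):
    """Contrastive SRL-style loss averaged over layer-dimension pairs.

    hidden_states / pos_hidden_states: lists indexed by layer, each a
    [batch, full_dim] sentence embedding (e.g. pooled hidden states)
    for queries and their positive passages.
    """
    losses = []
    for layer, dim in SIZES:
        # Slice the first `dim` dimensions of the chosen layer's output.
        q = F.normalize(hidden_states[layer][:, :dim], dim=-1)
        p = F.normalize(pos_hidden_states[layer][:, :dim], dim=-1)
        logits = q @ p.T / temperature      # in-batch negatives
        labels = torch.arange(q.size(0), device=q.device)
        losses.append(F.cross_entropy(logits, labels))
    return torch.stack(losses).mean()       # average across all sizes
```

Fixing a small list of sizes, instead of randomly sampling layer-dimension combinations at every step as 2DMSE-style training does, is what keeps each sub-model consistently trained without blowing up the training cost.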

Two metrics are used to estimate performance: Spearman's correlation for semantic similarity and Mean Reciprocal Rank (MRR) for retrieval, giving a detailed picture of what the models can and cannot do. Extensive evaluation across broad datasets validates the robustness and effectiveness of the Starbucks method on a wide range of NLP tasks; such evaluation in realistic settings is what establishes the method's applicability and reliability. For instance, on the MS MARCO dataset the Starbucks approach reached an MRR@10 of 0.3116, meaning that, on average, documents relevant to a query rank higher than under models trained with traditional methods such as 2D Matryoshka Sentence Embeddings (2DMSE).
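MRR@10 itself is straightforward to compute. The sketch below implements the standard definition (it is not code from the paper): for each query, take the reciprocal of the rank of the first relevant document within the top 10 results, then average over queries.

```python
def mrr_at_10(ranked_ids, relevant_ids):
    """Mean Reciprocal Rank at cutoff 10.

    ranked_ids:   list of ranked doc-id lists, one per query.
    relevant_ids: list of sets of relevant doc ids, one per query.
    """
    total = 0.0
    for ranking, relevant in zip(ranked_ids, relevant_ids):
        for rank, doc_id in enumerate(ranking[:10], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break                  # only the first hit counts
    return total / len(ranked_ids)

# A rank-1 hit and a rank-3 hit average to (1 + 1/3) / 2 ≈ 0.667.
print(mrr_at_10([[3, 1, 2], [5, 9, 7]], [{3}, {7}]))
```

Under this metric, a score of 0.3116 corresponds roughly to the first relevant passage appearing, on average, around rank three.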

Starbucks addresses the weaknesses of 2D Matryoshka embedding models with a new training methodology that improves adaptability and performance. Its strengths include matching or beating the performance of independently trained models while increasing computational efficiency; further validation in real-world settings is still needed to assess its suitability across the full range of NLP tasks. The work is a direct contribution to embedding model training, may open avenues for improving NLP applications, and could inspire future developments in adaptive AI systems.


Check out the Paper. All credit for this research goes to the researchers of this project.




Afeerah Naseem is a consulting intern at Marktechpost. She is pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is passionate about data science and fascinated by the role of artificial intelligence in solving real-world problems. She loves discovering new technologies and exploring how they can make everyday tasks easier and more efficient.


