
Open Source Initiative releases first formal definition of open-source AI

Summary

The Open Source Initiative (OSI) has released its first formal definition of what constitutes open-source AI.

The announcement came during the All Things Open 2024 conference, following “multiple years of research and collaboration, an international roadshow of workshops, and a year-long co-design process.”

The definition sets clear requirements that many current AI models, including Meta’s Llama, don’t meet. Most notably, it requires AI model makers to provide enough detail about their training data that a “skilled person can recreate a substantially equivalent system using the same or similar data.” That level of transparency goes well beyond what most AI companies currently offer, according to Mozilla AI strategy lead Ayah Bdeir.

At its core, the definition outlines essential freedoms that any open-source AI system must provide. Users need to be able to run the system for any purpose, examine how it works, make modifications, and share it with others. To enable this, companies must release complete information about training data, source code, and model parameters in a format that allows for modifications.


The new definition applies to both complete AI systems and individual components like models and weights, aiming to bring the traditional benefits of open source – autonomy, transparency, and collaborative improvement – to the AI field.

Meta’s Llama models aren’t open enough

The definition directly challenges Meta’s claims about its Llama models. While Meta promotes itself as a champion of open AI development, its approach doesn’t meet the OSI’s criteria, a shortfall the organization has criticized repeatedly.

Meta releases its model weights but keeps training data private and places restrictions on commercial use – practices that conflict with fundamental open-source principles. The same is true for Google and its Gemma models.

Meta argues that the high costs and complexity of developing large language models require what it calls a “spectrum of openness.” However, skeptics believe Meta may be attempting to exploit loopholes in regulations like the EU AI Act, which offers more lenient treatment for open-source models.
