What is Open-Source AI?

Open-Source AI

AI models, datasets, and tools released publicly under open licences, enabling community use, inspection, and modification.

Definition

Open-source AI encompasses models, training code, datasets, and tools released publicly under licences that permit free use, modification, and redistribution. The movement accelerated sharply in 2023, when Meta released LLaMA and then Llama 2, alongside releases such as Mistral and Falcon and a wave of community fine-tunes.

Key openly released AI assets include models (LLaMA 3, Mistral 7B, Stable Diffusion, Whisper), frameworks (PyTorch, TensorFlow, Hugging Face Transformers), datasets (The Pile, Common Crawl, LAION-5B), and tools (vLLM, llama.cpp, Gradio). The Hugging Face Hub hosts over 500,000 models.

The "open" vs "closed" AI debate is nuanced: LLaMA is "open weights" but not "open source" by the OSI definition, because its licence restricts some commercial uses. Fully open-source projects under permissive licences (Apache 2.0, MIT), such as EleutherAI's models and the Allen Institute for AI's OLMo, release the weights together with the training code and data.
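
The distinction between "open source" and "open weights" can be sketched as a simple licence lookup. This is a minimal illustration, not an authoritative classifier: the function name `classify_licence` is hypothetical, the licence identifiers are assumed to follow the lowercase tag style commonly seen on model hubs, and both sets are deliberately incomplete.

```python
# Illustrative sketch only: these sets are assumptions and far from exhaustive.
# Permissive licences approved by the OSI.
OSI_APPROVED = {"apache-2.0", "mit", "bsd-3-clause"}

# Custom licences that publish weights but attach use restrictions
# (assumed identifiers for the Llama and Stable Diffusion licences).
RESTRICTED_OPEN_WEIGHTS = {"llama2", "llama3", "creativeml-openrail-m"}


def classify_licence(licence_id: str) -> str:
    """Return 'open source', 'open weights', or 'unknown' for a licence tag."""
    key = licence_id.strip().lower()
    if key in OSI_APPROVED:
        return "open source"
    if key in RESTRICTED_OPEN_WEIGHTS:
        return "open weights"
    # Anything not listed needs a manual read of the licence text.
    return "unknown"


if __name__ == "__main__":
    for tag in ("Apache-2.0", "llama3", "some-custom-licence"):
        print(tag, "->", classify_licence(tag))
```

A lookup like this only captures the licence on the weights. Whether the training code and data are also published, as with the fully open projects mentioned above, has to be checked separately.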