Open-Source AI
AI models, datasets, and tools released publicly under open licences, enabling community use, inspection, and modification.
Definition
Open-source AI encompasses models, training code, datasets, and tools released publicly under licences permitting free use, modification, and redistribution. The open-source AI movement accelerated dramatically in 2023 when Meta released LLaMA and LLaMA 2, followed by Mistral, Falcon, and many community fine-tunes.
Key open-source AI assets include: models (LLaMA 3, Mistral 7B, Stable Diffusion, Whisper), frameworks (PyTorch, TensorFlow, Hugging Face Transformers), datasets (The Pile, Common Crawl, LAION-5B), and tools (vLLM, llama.cpp, Gradio). Hugging Face hosts over 500,000 models.
The "open" vs "closed" AI debate is nuanced: LLaMA is "open weights" but not "open source" by the OSI definition, because its licence carries commercial restrictions. Fully open releases under permissive licences such as Apache 2.0 or MIT, from organisations like EleutherAI and the Allen Institute for AI (whose OLMo models ship with training code and data), publish the weights and the full training pipeline.
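The open-weights vs open-source distinction can be sketched as a simple classification over what a release actually publishes. This is a toy illustration only: the licence facts below are simplified, and you should check each model's actual licence text before relying on it.

```python
# Toy sketch of the "open weights" vs "open source" distinction.
# Licence details are simplified for illustration; verify the real licences.
from dataclasses import dataclass

@dataclass
class ModelRelease:
    name: str
    licence: str
    weights_public: bool
    training_code_public: bool
    commercial_restrictions: bool

def openness(m: ModelRelease) -> str:
    """Roughly classify a release along OSI-style lines."""
    if not m.weights_public:
        return "closed"
    if m.commercial_restrictions or not m.training_code_public:
        return "open weights"
    return "open source"

releases = [
    # Llama's community licence carries use restrictions, so it is
    # "open weights" rather than OSI open source.
    ModelRelease("LLaMA 3", "Llama Community License", True, False, True),
    # EleutherAI's Pythia ships weights plus training code under Apache 2.0.
    ModelRelease("Pythia", "Apache 2.0", True, True, False),
]

for r in releases:
    print(f"{r.name}: {openness(r)}")
```

The point of the sketch is that "openness" is not binary: a release can make its weights freely downloadable while still failing the OSI definition on licence terms or on reproducibility of training.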
Examples
- LLaMA 3 (Meta)
- Mistral 7B
- Stable Diffusion
- Whisper
- PyTorch
Related Terms
Large Language Model (LLM)
A transformer-based AI system trained on billions of tokens of text, capable of generating, reasoning about, and transforming language.
Fine-tuning
Continuing training of a pre-trained model on domain-specific data to specialise it for a particular task.
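The core idea of fine-tuning can be shown with a deliberately tiny sketch: take parameters that already exist (standing in for a pre-trained model) and continue gradient descent on a small domain-specific dataset. Everything here, the model, data, and hyperparameters, is made up for illustration; real fine-tuning uses a neural network and a framework such as PyTorch.

```python
# Toy illustration of fine-tuning: continue training existing parameters
# on new domain data. The "model" is a one-variable linear function.

def predict(w, b, x):
    return w * x + b

def fine_tune(w, b, data, lr=0.05, epochs=500):
    """Continue gradient descent on (w, b) using new (x, y) pairs."""
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y
            w -= lr * err * x   # gradient of squared error w.r.t. w
            b -= lr * err       # gradient of squared error w.r.t. b
    return w, b

# "Pre-trained" parameters, standing in for weights learned on broad data.
w0, b0 = 1.0, 0.0

# Small domain-specific dataset following y = 2x + 1.
domain_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

w, b = fine_tune(w0, b0, domain_data)
```

The parameters end up fitting the new data while starting from, rather than discarding, the pre-trained values, which is the essential mechanic that fine-tuning shares with its full-scale counterpart.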
Foundation Model
A large model trained on broad data that can be adapted to many downstream tasks via fine-tuning or prompting.