Superseded by the European Open Source AI Index | The table below is provided for historical purposes and is no longer updated. We have tripled the number of models and now include code, audio, and image models at osai-index.eu.
There is a growing number of instruction-tuned text generators billing themselves as 'open source'. How open are they really? Papers: FAccT '24, CUI '23
Project | Availability | Documentation | Access |
OLMo 7B Instruct | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✘ | ✔︎ | ✔︎ | ✔︎ | ~ |
Ai2 | LLM base: OLMo 7B | RL base: OpenInstruct | | 12.5 |
BLOOMZ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ~ | ~ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✘ | ✔︎ |
bigscience-workshop | LLM base: BLOOMZ, mT0 | RL base: xP3 | | 12.0 |
AmberChat | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ~ | ~ | ✔︎ | ✘ | ~ | ~ | ✘ | ✔︎ |
LLM360 | LLM base: Amber | RL base: ShareGPT + Evol-Instruct (synthetic) | | 10.0 |
Open Assistant | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✘ | ✔︎ | ✔︎ | ✔︎ | ~ | ✘ | ✘ | ✘ | ✔︎ | ✔︎ |
LAION-AI | LLM base: Pythia 12B | RL base: OpenAssistant Conversations | | 9.5 |
OpenChat 3.5 7B | ✔︎ | ✘ | ✔︎ | ✘ | ✔︎ | ✔︎ | ~ | ✔︎ | ✔︎ | ✔︎ | ~ | ✘ | ✔︎ | ~ |
Tsinghua University | LLM base: Mistral 7B | RL base: ShareGPT with C-RLFT | | 9.5 |
Pythia-Chat-Base-7B-v0.16 | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✘ | ✔︎ | ✔︎ | ✔︎ | ~ | ✘ | ~ | ~ | ✔︎ | ✘ |
togethercomputer | LLM base: EleutherAI pythia | RL base: OIG | | 9.5 |
Cerebras GPT 111M Instruction | ~ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ~ | ✘ | ✔︎ | ~ | ✘ | ✘ | ✔︎ | ✘ | ✔︎ |
Cerebras + Schramm | LLM base: Cerebras | RL base: Alpaca (synthetic) | | 8.5 |
RedPajama-INCITE-Instruct-7B | ~ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ~ | ~ | ~ | ✘ | ✘ | ✔︎ | ✔︎ | ✘ | ~ |
TogetherComputer | LLM base: RedPajama-INCITE-7B-Base | RL base: various (GPT-JT recipe) | | 8.5 |
dolly | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✘ | ✔︎ | ✔︎ | ✔︎ | ~ | ✘ | ✘ | ✘ | ✔︎ | ✘ |
databricks | LLM base: EleutherAI pythia | RL base: databricks-dolly-15k | | 8.5 |
Tulu V2 DPO 70B | ✔︎ | ✘ | ~ | ✔︎ | ✔︎ | ~ | ~ | ~ | ✔︎ | ✘ | ~ | ~ | ✘ | ✔︎ |
AllenAI | LLM base: Llama2 | RL base: Tulu SFT, Ultrafeedback | | 8.0 |
MPT-30B Instruct | ✔︎ | ~ | ✔︎ | ~ | ✘ | ✔︎ | ✔︎ | ~ | ✘ | ✘ | ~ | ✘ | ✔︎ | ~ |
MosaicML | LLM base: MosaicML | RL base: dolly, anthropic | | 7.5 |
MPT-7B Instruct | ✔︎ | ~ | ✔︎ | ~ | ✘ | ✔︎ | ✔︎ | ~ | ✘ | ✘ | ✔︎ | ✘ | ✔︎ | ✘ |
MosaicML | LLM base: MosaicML | RL base: dolly, anthropic | | 7.5 |
trlx | ✔︎ | ✔︎ | ✔︎ | ~ | ✘ | ✔︎ | ✔︎ | ~ | ✘ | ✘ | ✘ | ✘ | ~ | ✔︎ |
carperai | LLM base: various (pythia, flan, OPT) | RL base: various | | 7.5 |
NeuralChat 7B | ~ | ✘ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ~ | ~ | ✘ | ✘ | ~ | ~ | ~ | ✘ |
Intel | LLM base: Mistral 7B | RL base: Orca | | 7.0 |
Vicuna 13B v 1.3 | ✔︎ | ~ | ✔︎ | ✘ | ✘ | ~ | ✔︎ | ✘ | ✔︎ | ✘ | ~ | ✘ | ✔︎ | ~ |
LMSYS | LLM base: LLaMA | RL base: ShareGPT | | 7.0 |
minChatGPT | ✔︎ | ✔︎ | ✔︎ | ~ | ✘ | ✔︎ | ✔︎ | ~ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔︎ |
ethanyanjiali | LLM base: GPT2 | RL base: anthropic | | 7.0 |
ChatRWKV | ✔︎ | ~ | ✔︎ | ✘ | ✘ | ✔︎ | ~ | ~ | ~ | ✘ | ✘ | ✘ | ✔︎ | ~ |
BlinkDL/RWKV | LLM base: RWKV-LM | RL base: alpaca, shareGPT (synthetic) | | 6.5 |
BELLE | ✔︎ | ~ | ~ | ~ | ~ | ✘ | ~ | ✔︎ | ✔︎ | ✘ | ✘ | ~ | ✘ | ✘ |
KE Technologies | LLM base: LLaMA & BLOOMZ | RL base: alpaca, shareGPT, Belle (synthetic) | | 6.0 |
Geitje Ultra 7B | ✘ | ~ | ✔︎ | ✔︎ | ✔︎ | ✘ | ✘ | ~ | ~ | ✘ | ~ | ~ | ✘ | ~ |
Bram van Roy | LLM base: Mistral 7B | RL base: Ultrafeedback Dutch (synthetic) | | 6.0 |
Phi 3 Instruct | ✘ | ✘ | ✘ | ✘ | ✔︎ | ✔︎ | ✘ | ✔︎ | ~ | ✘ | ✔︎ | ✘ | ~ | ✔︎ |
Microsoft | LLM base: Phi3 | RL base: Unspecified | | 6.0 |
WizardLM 13B v1.2 | ~ | ✘ | ~ | ✔︎ | ✔︎ | ~ | ~ | ✔︎ | ✔︎ | ✘ | ✘ | ✘ | ✘ | ✘ |
Microsoft & Peking University | LLM base: LLaMA2-13B | RL base: Evol-Instruct (synthetic) | | 6.0 |
Airoboros L2 70B GPT4 | ~ | ✘ | ~ | ✔︎ | ✔︎ | ~ | ~ | ~ | ✘ | ✘ | ~ | ~ | ✘ | ✘ |
Jon Durbin | LLM base: Llama2 | RL base: Airoboros (synthetic) | | 5.5 |
ChatGLM-6B | ~ | ~ | ✔︎ | ✘ | ✘ | ✔︎ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ✘ | ✔︎ |
THUDM | LLM base: GLM (own) | RL base: Unspecified | | 5.5 |
Mistral 7B-Instruct | ~ | ✘ | ✔︎ | ✘ | ~ | ✔︎ | ✘ | ~ | ~ | ✘ | ✘ | ✘ | ~ | ✔︎ |
Mistral AI | LLM base: unclear | RL base: unspecified | | 5.5 |
WizardLM-7B | ~ | ~ | ✘ | ✔︎ | ~ | ~ | ~ | ✔︎ | ✔︎ | ✘ | ✘ | ✘ | ✘ | ✘ |
Microsoft & Peking University | LLM base: LLaMA-7B | RL base: Evol-Instruct (synthetic) | | 5.5 |
Mistral NeMo Instruct | ~ | ✘ | ✔︎ | ✘ | ~ | ✔︎ | ✘ | ~ | ✘ | ✘ | ✘ | ✘ | ~ | ✔︎ |
Mistral AI | LLM base: Mistral NeMo | RL base: unspecified | | 5.0 |
Qwen 1.5 | ~ | ✘ | ✔︎ | ✘ | ✔︎ | ✘ | ~ | ~ | ✘ | ✘ | ✘ | ✘ | ~ | ✔︎ |
Alibaba Cloud | LLM base: QwenLM | RL base: Unspecified | | 5.0 |
StableVicuna-13B | ~ | ✘ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ~ |
CarperAI | LLM base: LLaMA | RL base: OASST1 (human), GPT4All (human), Alpaca (synthetic) | | 5.0 |
Falcon-40B-instruct | ✘ | ~ | ✔︎ | ~ | ✘ | ✔︎ | ✘ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ✘ |
Technology Innovation Institute | LLM base: Falcon 40B | RL base: Baize (synthetic) | | 4.5 |
UltraLM | ✘ | ✘ | ~ | ✔︎ | ~ | ✘ | ✘ | ~ | ✔︎ | ✘ | ~ | ~ | ✘ | ✘ |
OpenBMB | LLM base: LLaMA2 | RL base: UltraFeedback (part synthetic) | | 4.5 |
Yi 34B Chat | ~ | ✘ | ✔︎ | ✘ | ✔︎ | ~ | ✘ | ✘ | ✔︎ | ✘ | ✘ | ✘ | ✘ | ~ |
01.AI | LLM base: Yi 34B | RL base: unspecified | | 4.5 |
Koala 13B | ✔︎ | ~ | ~ | ~ | ✘ | ~ | ~ | ~ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
BAIR | LLM base: LLaMA 13B | RL base: HC3, ShareGPT, alpaca (synthetic) | | 4.0 |
Llama 3.1 | ~ | ✘ | ~ | ✘ | ✘ | ✘ | ~ | ~ | ✘ | ✘ | ~ | ✘ | ✔︎ | ~ |
Facebook Research | LLM base: Meta Llama 3 | RL base: Meta, undocumented | | 4.0 |
Mixtral 8x7B Instruct | ✘ | ✘ | ✔︎ | ✘ | ~ | ✔︎ | ✘ | ~ | ~ | ✘ | ✘ | ✘ | ~ | ✘ |
Mistral AI | LLM base: Mistral | RL base: Unspecified | | 4.0 |
Stable Beluga 2 | ✘ | ✘ | ~ | ✘ | ✔︎ | ~ | ✘ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ~ |
Stability AI | LLM base: LLaMA2 | RL base: Orca-style (synthetic) | | 4.0 |
Stanford Alpaca | ✔︎ | ✘ | ~ | ~ | ~ | ✘ | ~ | ✔︎ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Stanford University CRFM | LLM base: LLaMA | RL base: Self-Instruct (synthetic) | | 4.0 |
Falcon-180B-chat | ✘ | ~ | ~ | ~ | ~ | ✘ | ✘ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ✘ |
Technology Innovation Institute | LLM base: Falcon 180B | RL base: OpenPlatypus, Ultrachat, Airoboros (synthetic) | | 3.5 |
Gemma 7B Instruct | ~ | ✘ | ~ | ✘ | ~ | ✘ | ✘ | ~ | ~ | ✘ | ✔︎ | ✘ | ✘ | ✘ |
Google DeepMind | LLM base: Gemma | RL base: Unspecified | | 3.5 |
Orca 2 | ✘ | ✘ | ~ | ✘ | ✔︎ | ✘ | ✘ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ~ |
Microsoft Research | LLM base: LLaMA2 | RL base: FLAN, Math, undisclosed (synthetic) | | 3.5 |
Command R+ | ✘ | ✘ | ✘ | ✔︎ | ✔︎ | ~ | ✘ | ✘ | ✘ | ✘ | ~ | ✘ | ✘ | ✘ |
Cohere AI | LLM base: | RL base: Aya Collection | | 3.0 |
LLaMA2 Chat | ✘ | ✘ | ~ | ✘ | ~ | ✘ | ✘ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ~ |
Facebook Research | LLM base: LLaMA2 | RL base: Meta, StackExchange, Anthropic | | 3.0 |
Nanbeige2-Chat | ✔︎ | ✘ | ✘ | ✘ | ✔︎ | ~ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ~ |
Nanbeige LLM lab | LLM base: Unknown | RL base: Unknown | | 3.0 |
Llama 3 Instruct | ✘ | ✘ | ~ | ✘ | ~ | ✘ | ✘ | ~ | ✘ | ✘ | ~ | ✘ | ✘ | ~ |
Facebook Research | LLM base: Meta Llama 3 | RL base: Meta, undocumented | | 2.5 |
Solar 70B | ✘ | ✘ | ~ | ✘ | ~ | ✘ | ✘ | ✘ | ✘ | ✘ | ~ | ✘ | ✘ | ~ |
Upstage AI | LLM base: LLaMA2 | RL base: Orca-style, Alpaca-style | | 2.0 |
Xwin-LM | ✘ | ✘ | ~ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ~ |
Xwin-LM | LLM base: LLaMA2 | RL base: unknown | | 1.0 |
ChatGPT | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ~ | ✘ | ✘ | ✘ | ✘ | ✘ |
OpenAI | LLM base: GPT 3.5 | RL base: Instruct-GPT | | 0.5 |
How to use this table. Every cell records a three-level openness judgement (✔︎ open, ~ partial or ✘ closed) with a direct link to the available evidence; on hover, the cell will display the notes we have on file for that judgement. The name of each project is a direct link to source data. The table is sorted by cumulative openness, where ✔︎ is 1, ~ is 0.5 and ✘ is 0 points. Note that RL may refer to RLHF or other forms of fine-tuning aimed at fostering instruction-following behaviour.
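The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the table: the function names `openness_score` and `rank`, and the word labels `"open"`, `"partial"` and `"closed"` standing in for the three cell symbols, are hypothetical. The two example rows copy the 14 judgements listed for OLMo 7B Instruct and ChatGPT.

```python
from typing import Dict, List, Tuple

# Point values as stated in the table notes: open = 1, partial = 0.5, closed = 0.
# The word labels are an assumption made for this sketch; the table uses symbols.
POINTS: Dict[str, float] = {"open": 1.0, "partial": 0.5, "closed": 0.0}


def openness_score(judgements: List[str]) -> float:
    """Sum the points of one project's row of openness judgements."""
    return sum(POINTS[j] for j in judgements)


def rank(projects: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Return (project, score) pairs sorted by cumulative openness, highest first."""
    scored = [(name, openness_score(cells)) for name, cells in projects.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Two rows transcribed from the table (14 judgements each).
chatgpt = ["closed"] * 8 + ["partial"] + ["closed"] * 5
olmo = ["open"] * 9 + ["closed"] + ["open"] * 3 + ["partial"]

print(rank({"ChatGPT": chatgpt, "OLMo 7B Instruct": olmo}))
# prints [('OLMo 7B Instruct', 12.5), ('ChatGPT', 0.5)]
```

Both results match the cumulative scores shown for these projects in the table (12.5 and 0.5).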
Why is openness important?
Open research is the lifeblood of cumulative progress in science and engineering. Openness is key for fundamental research, for fostering critical computational literacy, and for making informed choices for or against deployment of instruction-tuned LLM architectures. The closed & proprietary nature of ChatGPT and kin makes them fundamentally unfit for responsible use in research and education.
Open alternatives provide ways to build reproducible workflows, chart resource costs, and lessen reliance on corporate whims. One aim of our work here is to provide tools to track openness, transparency and accountability in the fast-evolving landscape of instruction-tuned text generators. Read more in the paper (PDF) or contribute to the repo.
TL;DR
Our paper makes the following contributions:
- We review the risks of relying on proprietary software
- We review best practices for open, transparent and accountable 'AI'
- We find over 40 ChatGPT alternatives with varying degrees of openness, development and documentation
- We argue that tech is never a fait accompli unless we make it so, and that openness enables critical computational literacy
We find the following recurrent patterns:
- Many projects inherit data of dubious legality
- Few projects share the all-important instruction-tuning data
- Preprints are rare, peer-reviewed papers even rarer
- Synthetic instruction-tuning data is on the rise, with unknown consequences that are in need of research
We conclude as follows:
Openness is not the full solution to the scientific and ethical challenges of conversational text generators. Open data will not mitigate the harmful consequences of thoughtless deployment of large language models, nor the questionable copyright implications of scraping all publicly available data from the internet. However, openness does make original research possible, including efforts to build reproducible workflows and understand the fundamentals of instruction-tuned LLM architectures. Openness also enables checks and balances, fostering a culture of accountability for data and its curation, and for models and their deployment. We hope that our work provides a small step in this direction.
Papers
Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces. July 19-21, Eindhoven. doi: 10.1145/3571884.3604316 (PDF).
Liesenfeld, Andreas, and Mark Dingemanse. 2024. “Rethinking Open Source Generative AI: Open Washing and the EU AI Act.” In FAccT '24: The 2024 ACM Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery, New York, NY, USA, 1774–1787. doi: 10.1145/3630106.3659005