Liesenfeld, A., Lopez, A. & Dingemanse, M. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces. July 19-21, Eindhoven. doi: 10.1145/3571884.3604316 (PDF).

There is a growing number of instruction-tuned text generators billing themselves as 'open source'. How open are they really?

| Project | Maker | LLM base | RL base | Open code | LLM data | LLM weights | RL data | RL weights | License | Code | Architecture | Preprint | Paper | Modelcard | Datasheet | Package | API |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | bigscience-workshop | BLOOMZ, mT0 | xP3 | | | | | | | | | | | | | | |
| Open Assistant | LAION-AI | Pythia 12B | OpenAssistant Conversations | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ~ | ✘ | ✘ | ✘ | ✔ | ✔ |
| | togethercomputer | EleutherAI pythia | OIG | | | | | | | | | | | | | | |
| OpenChat 3.5 7B | Tsinghua University | Mistral 7B | ShareGPT with C-RLFT | ✔ | ✘ | ✔ | ✘ | ✔ | ✔ | ~ | ✔ | ✔ | ✘ | ~ | ✘ | ✔ | ~ |
| | TogetherComputer | RedPajama-INCITE-7B-Base | various (GPT-JT recipe) | | | | | | | | | | | | | | |
| | databricks | EleutherAI pythia | databricks-dolly-15k | | | | | | | | | | | | | | |
| MPT-30B Instruct | MosaicML | MosaicML | dolly, anthropic | ✔ | ~ | ✔ | ~ | ✘ | ✔ | ✔ | ~ | ✘ | ✘ | ~ | ✘ | ✔ | ~ |
| MPT-7B Instruct | MosaicML | MosaicML | dolly, anthropic | ✔ | ~ | ✔ | ~ | ✘ | ✔ | ✔ | ~ | ✘ | ✘ | ✔ | ✘ | ✔ | ✘ |
| | carperai | various (pythia, flan, OPT) | various | | | | | | | | | | | | | | |
| | ethanyanjiali | GPT2 | anthropic | | | | | | | | | | | | | | |
| Vicuna 13B v 1.3 | LMSYS | LLaMA | ShareGPT | ✔ | ~ | ✔ | ✘ | ✘ | ~ | ✔ | ✘ | ✔ | ✘ | ~ | ✘ | ✔ | ~ |
| | Cerebras + Schramm | | Alpaca (synthetic) | | | | | | | | | | | | | | |
| | BlinkDL/RWKV | RWKV-LM | alpaca, shareGPT (synthetic) | | | | | | | | | | | | | | |
| | KE Technologies | LLaMA & BLOOMZ | alpaca, shareGPT, Belle (synthetic) | | | | | | | | | | | | | | |
| WizardLM 13B v1.2 | Microsoft & Peking University | LLaMA2-13B | Evol-Instruct (synthetic) | ~ | ✘ | ~ | ✔ | ✔ | ~ | ~ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Airoboros L2 70B GPT4 | Jon Durbin | Llama2 | Airoboros (synthetic) | ~ | ✘ | ~ | ✔ | ✔ | ~ | ~ | ~ | ✘ | ✘ | ~ | ~ | ✘ | ✘ |
| | Microsoft & Peking University | LLaMA-7B | Evol-Instruct (synthetic) | | | | | | | | | | | | | | |
| | THUDM | GLM (own) | Unspecified | | | | | | | | | | | | | | |
| Mistral 7B-Instruct | Mistral AI | unclear | unspecified | ~ | ✘ | ✔ | ✘ | ~ | ✔ | ✘ | ~ | ~ | ✘ | ✘ | ✘ | ~ | ✔ |
| | CarperAI | LLaMA | OASST1 (human), GPT4All (human), Alpaca (synthetic) | | | | | | | | | | | | | | |
| | Technology Innovation Institute | Falcon 40B | Baize (synthetic) | | | | | | | | | | | | | | |
| | OpenBMB | LLaMA2 | UltraFeedback (part synthetic) | | | | | | | | | | | | | | |
| Stable Beluga 2 | Stability AI | LLaMA2 | Orca-style (synthetic) | ✘ | ✘ | ~ | ✘ | ✔ | ~ | ✘ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ~ |
| Koala 13B | BAIR | LLaMA 13B | HC3, ShareGPT, alpaca (synthetic) | ✔ | ~ | ~ | ~ | ✘ | ~ | ~ | ~ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Stanford Alpaca | Stanford University CRFM | LLaMA | Self-Instruct (synthetic) | ✔ | ✘ | ~ | ~ | ~ | ✘ | ~ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| | Technology Innovation Institute | Falcon 180B | OpenPlatypus, Ultrachat, Airoboros (synthetic) | | | | | | | | | | | | | | |
| Orca 2 | Microsoft Research | LLaMA2 | FLAN, Math, undisclosed (synthetic) | ✘ | ✘ | ~ | ✘ | ✔ | ✘ | ✘ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ~ |
| LLaMA2 Chat | Facebook Research | LLaMA2 | Meta, StackExchange, Anthropic | ✘ | ✘ | ~ | ✘ | ~ | ✘ | ✘ | ~ | ~ | ✘ | ~ | ✘ | ✘ | ~ |
| Solar 70B | Upstage AI | LLaMA2 | Orca-style, Alpaca-style | ✘ | ✘ | ~ | ✘ | ~ | ✘ | ✘ | ✘ | ✘ | ✘ | ~ | ✘ | ✘ | ~ |
| | Xwin-LM | LLaMA2 | unknown | | | | | | | | | | | | | | |
| | OpenAI | GPT 3.5 | Instruct-GPT | | | | | | | | | | | | | | |

How to use this table. Every cell records a three-level openness judgement (✔ open, ~ partial, or ✘ closed) with a direct link to the available evidence; on hover, the cell will display the notes we have on file for that judgement. The name of each project is a direct link to source data. The table is sorted by cumulative openness, where ✔ is 1, ~ is 0.5 and ✘ is 0 points. Note that RL may refer to RLHF or other forms of fine-tuning aimed at fostering instruction-following behaviour.
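The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the table: the names `POINTS`, `openness_score` and `rank` are hypothetical, and the three example rows are transcribed by hand from the table using the symbols ✔, ~ and ✘.

```python
# Points per judgement, as stated in the table notes:
# open = 1, partial = 0.5, closed = 0.
POINTS = {"✔": 1.0, "~": 0.5, "✘": 0.0}


def openness_score(cells):
    """Sum the points for one project's row of judgements."""
    return sum(POINTS[cell] for cell in cells)


def rank(projects):
    """Return (name, score) pairs sorted from most to least open."""
    scored = [(name, openness_score(cells)) for name, cells in projects.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Three rows transcribed from the table (14 judgements each),
# deliberately listed out of order to show the sort.
projects = {
    "Solar 70B": "✘ ✘ ~ ✘ ~ ✘ ✘ ✘ ✘ ✘ ~ ✘ ✘ ~".split(),
    "Open Assistant": "✔ ✔ ✔ ✔ ✘ ✔ ✔ ✔ ~ ✘ ✘ ✘ ✔ ✔".split(),
    "Stanford Alpaca": "✔ ✘ ~ ~ ~ ✘ ~ ✔ ✘ ✘ ✘ ✘ ✘ ✘".split(),
}

ranking = rank(projects)
for name, score in ranking:
    print(f"{name}: {score}")
```

With these three rows the sketch ranks Open Assistant first (9.5 points), then Stanford Alpaca (4.0), then Solar 70B (2.0), matching their relative order in the table.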

Why is openness important?

Open research is the lifeblood of cumulative progress in science and engineering. Openness is key for fundamental research, for fostering critical computational literacy, and for making informed choices for or against deployment of instruction-tuned LLM architectures. The closed & proprietary nature of ChatGPT and kin makes them fundamentally unfit for responsible use in research and education.

Open alternatives provide ways to build reproducible workflows, chart resource costs, and lessen reliance on corporate whims. One aim of our work here is to provide tools to track openness, transparency and accountability in the fast-evolving landscape of instruction-tuned text generators. Read more in the paper (PDF) or contribute to the repo.

If you know of a model that should be listed here or a data point that needs updating, please see the guidelines for contributors. We welcome any contribution, whether it's a quick addition to our awesomelist or a more detail-oriented contribution to the metadata for a specific project.


Our paper makes the following contributions:

We find the following recurrent patterns:

We conclude as follows:

Openness is not the full solution to the scientific and ethical challenges of conversational text generators. Open data will not mitigate the harmful consequences of thoughtless deployment of large language models, nor the questionable copyright implications of scraping all publicly available data from the internet. However, openness does make original research possible, including efforts to build reproducible workflows and understand the fundamentals of instruction-tuned LLM architectures. Openness also enables checks and balances, fostering a culture of accountability for data and its curation, and for models and their deployment. We hope that our work provides a small step in this direction.

Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces. July 19-21, Eindhoven. doi: 10.1145/3571884.3604316 (PDF).