Databricks releases Dolly 2.0, the first open, instruction-following LLM for commercial use

Databricks today released Dolly 2.0, the successor to the large language model (LLM) with ChatGPT-like human interactivity (that is, instruction following) that the company released just two weeks ago.

The company says Dolly 2.0 is the first open source, instruction-following LLM fine-tuned on a transparent and freely available dataset that is also open-sourced for commercial use. That means companies can build commercial applications on Dolly 2.0 without paying for API access or sharing data with third parties.

Other LLMs can be used for commercial purposes, acknowledged Ali Ghodsi, CEO of Databricks, but “they won’t talk to you like Dolly 2.0.” And, he explained, users can modify and improve the training data because it is freely available under an open source license. “So you can make your own version of Dolly,” he said.

Databricks also releases the dataset Dolly 2.0 was fine-tuned on

In addition, Databricks said that as part of its ongoing commitment to open source, it is releasing the dataset on which Dolly 2.0 was fine-tuned, called databricks-dolly-15k. This is a corpus of more than 15,000 records generated by thousands of Databricks employees, and Databricks says it is the “first open source, human-generated instruction corpus specifically designed to enable large language models to exhibit the magical interactivity of ChatGPT.”

There has been a spate of instruction-following, ChatGPT-like LLM releases over the past two months that are considered open source (or provide some degree of openness or gated access) by many definitions, including Meta’s LLaMA, which in turn inspired others, such as Alpaca, Koala, Vicuna and Databricks’ Dolly 1.0.

However, many of these “open” models were subject to “industrial capture,” Ghodsi said, because they were trained on datasets whose terms are meant to limit commercial use, such as a 52,000-example question-and-answer dataset from the Stanford Alpaca project that was built from OpenAI model output. OpenAI’s terms of use, he explained, include a rule that you can’t use output from its services to develop models that compete with OpenAI.

Databricks, however, found a way around this problem: Dolly 2.0 is a 12B-parameter language model based on the open source EleutherAI Pythia model family and fine-tuned exclusively on a small, open source corpus of instruction records (databricks-dolly-15k) generated by Databricks employees. The license terms of this dataset allow it to be used, modified, and extended for any purpose, including academic or commercial applications.

Models trained on ChatGPT output occupy a legal gray area. “The whole community has tiptoed around this and everyone is putting out these models, but none of them can be used commercially,” Ghodsi said. “So that’s why we’re super excited.”

Dolly 2.0 is small but mighty

A Databricks blog post emphasized that version 2.0, like the original Dolly, is not state-of-the-art but “displays a surprisingly capable level of instruction-following behavior given the size of the training corpus.” The post added that the level of effort and cost required to build powerful AI technologies is “orders of magnitude less than previously thought.”

“Everyone wants to get bigger, but we’re actually interested in getting smaller,” Ghodsi said of Dolly’s petite size. “Second, it is of high quality. We have looked at all the answers.”

Ghodsi added that he believes Dolly 2.0 will create a “snowball effect,” in which others in the AI community join in and come up with other alternatives. The limit on commercial use, he explained, was a major obstacle to overcome: “We are delighted that we have finally found a way to do it. I promise you’ll see people apply the 15,000 questions to every model out there, and they’ll see how many of these models suddenly become a little magical where you can interact with them.”

Shreya Christina