Large Language Models (LLMs) are the talk of the AI world right now, but training them can be challenging and expensive; models with billions of parameters require months of work by experienced engineers before they run reliably and accurately.
A new joint offering from Cerebras Systems and Cirrascale Cloud Services aims to democratize AI by giving users the ability to train GPT-class models at a much lower cost than existing providers – and with just a few lines of code.
“We believe that LLMs are underhyped,” Andrew Feldman, CEO and co-founder of Cerebras Systems, said in a pre-briefing. “Within the next year, we will see a huge increase in the impact of LLMs in different parts of the economy.”
Likewise, generative AI may be one of the most significant technological advances in recent history, as it offers the ability to write documents, create images, and code software using plain text input.
To accelerate adoption and improve the accuracy of generative AI, Cerebras also today announced a new partnership with the AI content platform Jasper AI.
“We really feel like the next chapter of Generative AI is personalized models that keep getting better,” said Dave Rogenmoser, CEO of Jasper.
The first phase of the technology was “really exciting,” he said, but “it’s going to get much, much more exciting.”
Unlock research opportunities
When it comes to LLMs, traditional cloud providers struggle because they cannot guarantee latency between large numbers of GPUs. Feldman explained that variable latency creates complex and time-consuming challenges when distributing a large AI model across GPUs, and that there are “big swings in time to train.”
The new Cerebras AI Model Studio, which is hosted on the Cirrascale AI Innovation Cloud, allows users to train Generative Pre-trained Transformer (GPT)-class models – including GPT-J, GPT-3, and GPT-NeoX – on Cerebras Wafer-Scale Clusters. This includes the recently announced Andromeda AI supercomputer.
Users can choose from state-of-the-art GPT-class models, ranging from 1.3 billion to 175 billion parameters, and complete training with a time to accuracy eight times faster than on an A100 GPU, at half the price of traditional cloud providers, Feldman said.
For example, training GPT-J from scratch on a traditional cloud takes about 64 days; the Cerebras AI Model Studio brings that down to eight days from scratch. Similarly, the production cost on traditional clouds is $61,000 for the GPUs alone, while on Cerebras it’s $45,000 for the full production run.
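As a rough check on the GPT-J figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the constant and function names are made up for this example and are not part of any Cerebras or Cirrascale tool, and the comparison is not strictly like-for-like, since the traditional-cloud figure covers GPUs alone.

```python
# Figures quoted in the article for training GPT-J from scratch.
GPU_CLOUD_DAYS = 64           # traditional cloud, training time in days
CEREBRAS_DAYS = 8             # Cerebras AI Model Studio, training time in days
GPU_CLOUD_COST_USD = 61_000   # traditional cloud, GPUs alone
CEREBRAS_COST_USD = 45_000    # Cerebras, full production run


def speedup(baseline_days: float, new_days: float) -> float:
    """Return how many times faster the new system finishes training."""
    return baseline_days / new_days


def savings_fraction(baseline_cost: float, new_cost: float) -> float:
    """Return the fraction of the baseline cost that the new system saves."""
    return (baseline_cost - new_cost) / baseline_cost


if __name__ == "__main__":
    print(f"Speedup: {speedup(GPU_CLOUD_DAYS, CEREBRAS_DAYS):.1f}x")
    print(f"Cost savings: {savings_fraction(GPU_CLOUD_COST_USD, CEREBRAS_COST_USD):.1%}")
```

For this one example, the quoted numbers work out to an eightfold reduction in training time and a saving of roughly 26% against the GPU-only cost.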
The new tool eliminates the need for devops and distributed programming; push-button model scaling ranges from 1 billion to 20 billion parameters. Models can also be trained with longer sequence lengths, opening up new research possibilities.
“We’re unlocking a fundamentally new opportunity to conduct research at this scale,” said Andy Hock, head of product at Cerebras.
As Feldman noted, Cerebras’ mission is “to broaden access to deep learning and rapidly accelerate the performance of AI workloads.”
The new AI Model Studio is “easy and dead simple,” he said. “We’ve organized this so that you can jump, point and click.”
Accelerating the potential of AI
Meanwhile, Jasper, a young company founded in 2021, will use Cerebras’ Andromeda AI supercomputer to train its compute-intensive models in “a fraction of the time,” Rogenmoser said.
As he noted, companies want personalized models, “and they really want them.”
“They want these models to get better, self-optimize based on past usage data, based on performance,” he said.
In its initial work on small workloads with Andromeda, which was announced this month at SC22, the International Conference for High Performance Computing, Networking, Storage and Analysis, Jasper found that the supercomputer completed work that thousands of GPUs couldn’t do.
The company expects to “drastically improve AI work,” including training GPT networks to adapt AI output to all levels of complexity and granularity for end users. This allows Jasper to quickly and easily personalize content for multiple customer classes, according to Rogenmoser.
The partnership “allows us to invent the future of generative AI by doing things that are impractical or simply impossible with traditional infrastructure,” he said.
Jasper’s products are used by 100,000 customers to write marketing texts, advertisements, books and other materials. Rogenmoser described the company as eliminating “the tyranny of the blank page” by serving as “an AI copilot.”
As he put it, this allows creators to focus on the most important elements of their story, “not the mundane.”