During her 2023 TED talk, computer scientist Yejin Choi made a seemingly contradictory statement: "AI today is incredibly intelligent and then shockingly stupid." How can something intelligent be stupid?
By itself, AI, including generative AI, is not built to deliver accurate, context-specific information for a particular task. In fact, measuring a model that way misses the point. These models predict what is statistically likely based on what they have seen, then generate responses from those predictions.
That's why, while generative AI continues to amaze us with its creativity, it often falls short of B2B requirements. Sure, it's fun to let ChatGPT spin social media copy into a rap, but left off the leash, generative AI can hallucinate: the model produces false information that masquerades as truth. Whatever industry a company is in, those shortcomings are bad for business.
The key to enterprise-ready generative AI lies in rigorously structuring data so that it provides context, which can then be used to train highly sophisticated large language models (LLMs). A well-choreographed balance of polished LLMs, actionable automation and select human checkpoints forms a strong anti-hallucination framework that enables generative AI to deliver accurate results and true B2B enterprise value.
For any business looking to capitalize on the vast potential of generative AI, here are three essential frameworks to include in your technology stack.
Build strong anti-hallucination frameworks
Got It AI, a company that detects generative falsehoods, ran a test and found that ChatGPT's LLM returned false answers about 20% of the time. That failure rate does not serve a company's goals. To prevent generative AI from hallucinating, it cannot operate in a vacuum: the system must be trained on high-quality data to ground its output, and its answers must be regularly checked by humans. Over time, these feedback loops help correct errors and improve model accuracy.
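A human-checkpoint feedback loop like this can be sketched in a few lines. The snippet below is a toy illustration, not any vendor's implementation: reviewers flag wrong answers, corrections are stored, and known questions are answered from the human-verified store before the model is consulted again. All names are hypothetical.

```python
# Toy feedback loop: human-verified corrections take priority over
# the model's raw output for questions seen before.
corrections = {}

def answer(question, model):
    """Return a human-verified answer if one exists, else ask the model."""
    if question in corrections:
        return corrections[question]
    return model(question)

def review(question, model_answer, correct_answer):
    """Human checkpoint: record a correction when the model was wrong."""
    if model_answer != correct_answer:
        corrections[question] = correct_answer
```

The second time the same question arrives, the verified answer wins, which is how repeated review gradually drives the observed error rate down.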
The eloquent writing of generative AI must be connected to a contextual, results-oriented system. The initial stage of any company's system is the blank slate into which information tailored to the company and its specific goals is loaded. The middle stage is the heart of a well-designed system: rigorous LLM fine-tuning. OpenAI describes fine-tuning models as "a powerful technique for creating a new model specific to your use case." This works by taking a base generative model and training it on many more case-specific examples, yielding better results.
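The "case-specific examples" step is mostly data preparation. As a minimal sketch, here is how company-specific Q&A pairs could be converted into the chat-style JSONL records that OpenAI's fine-tuning endpoint accepts. The example pairs, the system prompt and the file name are all illustrative, not from the article.

```python
import json

# Hypothetical company-specific Q&A pairs (illustrative only).
examples = [
    {"question": "What is your return window?",
     "answer": "Items can be returned within 30 days of delivery."},
    {"question": "Do you ship internationally?",
     "answer": "We currently ship to the US and Canada only."},
]

def to_chat_record(example, system_prompt):
    """Convert one Q&A pair into a chat-format fine-tuning record."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": example["question"]},
            {"role": "assistant", "content": example["answer"]},
        ]
    }

def write_jsonl(examples, path,
                system_prompt="You are a support agent for Acme Co."):
    """Write one JSON record per line, as the fine-tuning API expects."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(to_chat_record(ex, system_prompt)) + "\n")

write_jsonl(examples, "train.jsonl")
```

The resulting file is what gets uploaded to a fine-tuning job; the curation of those examples is where the company's context actually enters the model.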
At this stage, companies can choose a mix of hard-coded automation and fine-tuned LLMs. While the choreography varies from company to company, making the most of each technology yields the most contextual results.
After everything is set up on the back end, generative AI can really shine in customer communications. Answers are generated quickly and accurately, and they carry a personal tone without suffering from empathy fatigue.
Orchestrate technology with human checkpoints
By orchestrating various technology levers, a company can provide the structured facts and context LLMs need to do what they do best. First, leaders should identify tasks that are cumbersome for humans but easy to automate, and vice versa. Then consider where AI outperforms both. In short, don't use AI when a simpler solution, such as automation or plain human effort, will suffice.
Speaking with OpenAI CEO Sam Altman at Stripe Sessions in San Francisco, Stripe cofounder John Collison said Stripe uses OpenAI's GPT-4 "anywhere someone is doing manual work or working on a series of tasks." Businesses should use automation for grunt work, such as gathering information and searching company-specific documents. They can also hard-code final, black-and-white mandates, such as return policies.
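The division of labor described above amounts to a routing decision: answer deterministically when a hard-coded rule applies, fall through to the LLM only for open-ended requests, and escalate to a person when neither fits. A minimal sketch, with all intents, answers and the `llm` callable being illustrative placeholders:

```python
# Hard-coded, black-and-white mandates (e.g. the return policy):
# cheap, fast and fully auditable.
HARDCODED = {
    "return_policy": "Items can be returned within 30 days of delivery.",
}

def route(intent, query, llm=None):
    """Prefer the deterministic path; use the LLM only as a fallback."""
    if intent in HARDCODED:
        return HARDCODED[intent]      # automation handles the grunt work
    if llm is not None:
        return llm(query)             # generative AI for open-ended asks
    return "ESCALATE_TO_HUMAN"        # neither fits: hand off to a person
```

Keeping the deterministic branch first means the LLM never gets a chance to hallucinate an answer that policy has already fixed in stone.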
Only once this strong foundation is in place is a system ready for generative AI. Because the input is highly curated before generative AI touches it, the system can handle more complexity with precision. Keeping humans in the loop remains critical for verifying the accuracy of model output, providing model feedback and correcting results where necessary.
Measure results through transparency
At this point, LLMs are black boxes. In releasing GPT-4, OpenAI stated that "given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar." While some progress has been made toward making models less opaque, how a model functions is still a bit of a mystery. Not only is it unclear what's under the hood, it's also unclear how models differ from one another, beyond cost and how you interact with them, because the industry as a whole lacks standardized efficacy measurements.
Companies are now changing this and bringing clarity to generative AI models, and these standardized measures of effectiveness benefit the businesses downstream of them. Gentrace, for example, links generative AI outputs back to customer feedback so anyone can see how well an LLM performed. Others, like Paperplane.ai, take it a step further by capturing generative AI data and tying it to user feedback so leaders can evaluate implementation quality, speed and cost over time.
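The core idea behind such tooling, linking each generated answer to the user feedback it later receives so quality, speed and cost can be tracked over time, can be sketched in plain Python. This is an illustrative in-process log, not the API of Gentrace or Paperplane.ai; every name and field here is an assumption.

```python
from statistics import mean

# In-memory log of generative AI outputs and their eventual feedback.
log = []

def record(question, answer, latency_s, cost_usd):
    """Log one generation; returns an id used to attach feedback later."""
    log.append({"q": question, "a": answer, "latency": latency_s,
                "cost": cost_usd, "helpful": None})
    return len(log) - 1

def feedback(entry_id, helpful):
    """Attach a thumbs-up/thumbs-down from the end user."""
    log[entry_id]["helpful"] = helpful

def report():
    """Aggregate quality, speed and cost across all logged generations."""
    rated = [e for e in log if e["helpful"] is not None]
    return {
        "quality": mean(1.0 if e["helpful"] else 0.0 for e in rated) if rated else None,
        "avg_latency_s": mean(e["latency"] for e in log),
        "total_cost_usd": sum(e["cost"] for e in log),
    }
```

Even a sketch this small makes the article's point concrete: once outputs and feedback share an id, "how well is the LLM doing?" becomes a query rather than a guess.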
Liz Tsai is founder and CEO of HiOperator.