Comedian and author Sarah Silverman, as well as authors Christopher Golden and Richard Kadrey, are suing OpenAI and Meta, each in a U.S. District Court, over dual claims of copyright infringement.
Among other things, the lawsuits allege that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally acquired datasets containing their works, which they say were obtained from “shadow library” websites such as Bibliotik, Library Genesis, Z-Library and others, noting the books are “available in bulk via torrent systems.”
Golden and Kadrey each declined to comment on the lawsuit, while Silverman’s team failed to respond in time.
In the OpenAI suit, the trio offers exhibits showing that ChatGPT, when prompted, will summarize their books, which they say infringes on their copyrights. Silverman’s Bedwetter is the first book shown being summarized by ChatGPT in the exhibits, while Golden’s book Ararat and Kadrey’s book Sandman Slim are also used as examples. The claim says the chatbot never bothered to “reproduce the copyright management information that plaintiffs included with their published works.”
As for the separate lawsuit against Meta, it alleges the authors’ books were accessible in datasets used by Meta to train its LLaMA models, a quartet of open-source AI models the company introduced in February.
The complaint lays out in steps why the plaintiffs believe the datasets have illicit origins: in a Meta paper detailing LLaMA, the company points to sources for its training datasets, one of which is called ThePile, which was assembled by a company called EleutherAI. ThePile, the complaint alleges, was described in an EleutherAI paper as composed of “a copy of the contents of the Bibliotik private tracker.” Bibliotik and the other “shadow libraries” mentioned, the lawsuit says, are “blatantly illegal.”
In both claims, the authors say they have “not authorized the use of their copyrighted books as training materials” for the companies’ AI models. Their lawsuits each contain six counts of various types of copyright violations, negligence, unjust enrichment and unfair competition. The authors are seeking statutory damages, restitution of profits and more.
Attorneys Joseph Saveri and Matthew Butterick, who represent the three authors, write on their LLMlitigation website that they have heard from “writers, authors and publishers who are concerned about [ChatGPT’s] uncanny ability to generate text similar to that found in copyrighted text materials, including thousands of books.”
Saveri has also started litigation against AI companies on behalf of programmers and artists. Getty Images also filed an AI lawsuit, alleging that Stability AI, which created the AI image generation tool Stable Diffusion, trained its model on “millions of copyrighted images.” Saveri and Butterick also represent authors Mona Awad and Paul Tremblay in a similar case over OpenAI’s chatbot.
Lawsuits like this aren’t just headaches for OpenAI and other AI companies; they challenge the very limits of copyright. As we’ve said on The Vergecast every time someone gets Nilay going on copyright law, we’re going to see lawsuits around this sort of thing for years to come.
We’ve reached out to Meta, OpenAI, and the Joseph Saveri Law Firm for comment, but they did not respond by press time.