As a long-time gamer and avid reader with a deep appreciation for the creative works of authors, I find myself deeply concerned about the recent developments in the AI industry. The use of copyrighted material without proper licensing or compensation is a troubling trend that undermines the very foundation of creativity and innovation.
Over the last two years, I’ve noticed an exciting growth in a market where copyrighted content is being licensed for training AI systems. Pioneers like OpenAI have been leading the way by securing partnerships with media giants such as Axel Springer, News Corp., and the Associated Press. Other players in the field are picking up on this trend too.
No such agreements existed when AI companies first faced lawsuits alleging massive copyright infringement. Today, more lawsuits point to the existence of this licensing market to argue that AI companies are unlawfully exploiting creators’ works.
In a lawsuit filed on Monday night, authors claimed that Anthropic, which is backed by Amazon, unlawfully obtained and copied their books to train its AI chatbot, Claude. The complaint asserts that Anthropic has been undercutting a licensing market that belongs to copyright holders.
As a technology enthusiast, I find myself closely watching the ongoing debate about the legality of incorporating copyrighted works into training datasets. If Congress doesn’t step in to clarify the issue, the courts will have the final say. The question hinges largely on fair use, a doctrine that permits the use of copyrighted material to create a new work so long as the new work is “transformative.” It is one of the key battlegrounds for mainstream adoption of the technology.
The authors’ lawsuit anticipates the argument that AI companies’ conduct is protected by fair use. By declining to license the content it needed to build Claude, Anthropic is undermining an existing market that other AI firms have established, according to the lawsuit.
As I read it, the accusations against Anthropic appear to target its fair use defense, a doctrine that was clarified in the Supreme Court case Andy Warhol Foundation for the Visual Arts v. Goldsmith. In that case, the court stressed that whether an allegedly infringing work is transformative must be weighed against the commercial nature of the use. The authors hope to use the ruling to show that Anthropic could have licensed the material it used, and that by failing to do so it harmed their ability to profit from their own work by disrupting potential deals.
As a supporter, I’d paraphrase the complaint’s point like this: rather than sourcing training material from pirated collections that amount to a modern-day Napster, Anthropic could have obtained licenses to legally copy these copyrighted books. Instead, it took a shortcut and used stolen content to train its models.
Absent Anthropic’s alleged infringement, the authors suggest, widespread licensing could be handled through clearinghouses such as the Copyright Clearance Center, which recently introduced a collective licensing system. Record labels, publications, and other creators have made similar claims against AI companies in various lawsuits.
The Authors Guild is considering a plan that would let its members opt in to granting a blanket license to AI firms for the use of their works as training material. Preliminary conversations have covered a fee for this use and restrictions on AI-generated output that too closely resembles existing content.
Mary Rasenberger, head of the organization, told The Hollywood Reporter in January that creators must take a forward-thinking approach because generative AI isn’t going anywhere. She said the need for high-quality books remains, but that the technology must be used lawfully and licensed appropriately.
The class action was filed shortly before Tuesday’s announcement that OpenAI had struck an agreement with Condé Nast to integrate content from publications such as Vogue, The New Yorker, and GQ into ChatGPT and a search tool prototype. It was brought on behalf of authors Andrea Bartz (The Lost Night: A Novel, The Herd), Charles Graeber (The Good Nurse: A True Story of Medicine, Madness, and Murder), and Kirk Wallace Johnson (The Fisherman and the Dragon: Fear, Greed, and a Fight for Justice on the Gulf Coast). The lawsuit alleges copyright infringement and seeks to represent other authors whose books were used as training data. It also seeks an injunction barring further infringement.
The authors also contend that Anthropic is hurting their book sales by helping others produce imitations. For example, when Kara Swisher published Burn Book earlier this year, numerous AI-generated imitations appeared on Amazon, according to the complaint. In another case, author Jane Friedman found a collection of poor-quality books published under her name.
The lawsuit claims that the people behind these imitations often rely on Claude to generate long-form content. It alleges that one man, Tim Boucher, wrote 97 books using Claude and OpenAI’s ChatGPT in under a year and sold them at prices ranging from $1.99 to $5.99. According to the complaint, Claude would not be capable of producing such long-form content had it not been trained on a large number of books for which Anthropic did not compensate the original authors.
According to the lawsuit, Anthropic trained Claude on a dataset called “The Pile,” which includes approximately 200,000 books taken from an unofficial library source. In July, Anthropic acknowledged to several media outlets that it had used the dataset, the lawsuit states.
Anthropic didn’t immediately respond to a request for comment.
https://www.scribd.com/embeds/760860990/content
2024-08-20 22:25