As a devoted fan and screenwriter who has spent countless hours crafting stories that resonate with audiences, I am both outraged and disheartened by this recent revelation about AI training datasets. It feels like the industry is stealing from us, the very creators it claims to celebrate. I remember pouring my heart and soul into scripts for shows like Breaking Bad, The Sopranos, and The Wire, only to find out that those same words are being used to train machines that could one day replace us.
It turns out that the writers’ demands during the strike didn’t bring about the results everyone had anticipated. Alex Reisner, a screenwriter and programmer, made a striking discovery while examining a large dataset used to train various AI language models. In a piece he wrote for The Atlantic, Reisner revealed that this dataset contains more than 53,000 movie scripts and 85,000 TV episode scripts, including iconic titles like The Godfather, The Simpsons, Twin Peaks, The Sopranos, and Breaking Bad.
According to Reisner, the AI-training dataset used by tech companies such as Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and others incorporates writing from every Best Picture-nominated film from 1950 through 2016. This extensive database not only contains scripts for every episode of series like “The Wire”, but also includes dialogue prepared in advance for broadcasts such as the Golden Globes and Academy Awards. Essentially, these AI models have had access to a vast array of written material, leaving little unseen by the machine.
Following Reisner’s revelation about the numerous texts used to train Large Language Models (LLMs), writers and media enthusiasts across the board expressed outrage. Intrigued, some fans and scriptwriters dug deeper to understand the extensive resources these LLMs are built upon. The volume of data these models draw on is indeed enormous.
Writers Are Furious About AI Stealing Their Work
While many viewers may consider AI-generated images in movies inappropriate, that controversy pales in comparison to the one surrounding the use of writers’ scripts to train AI to write stories the way screenwriters do. Alex Reisner developed a search engine that allows Atlantic subscribers to explore this dataset themselves. Those who did found that no one was immune to the AI’s data collection.
Numerous authors are taken aback and disgusted to discover that their past work has been used to develop something they fear might replace them in the coming years. David Slack, a writer for “Teen Titans”, expressed his anger when he discovered 42 of his scripts in the database, including those for “Person of Interest,” “Lie to Me,” and “In Plain Sight.”
“I’m furious beyond belief. I’m absolutely enraged. It’s appalling. It represents an immense effort on my part… These are projects that I’ve put all my passion and energy into.” – David Slack (paraphrased)
Day by day, writers face exploitation in the entertainment sector, receiving minimal or no royalties for their published works. This disrespect has now reached an unprecedented peak, a grave insult that neither writers nor audiences will easily forget. The use of their scripts for LLM training is a clear reminder of the work that remains to safeguard the industry from AI intrusion.
You can find the database search tool here and use it to look up your favorite films and shows.
2024-11-27 00:01