The future of AI could hinge on a thorny legal issue.

Source: Nieman Lab; The Washington Post

A lawsuit accuses OpenAI and Microsoft of violating the New York Times' copyright. But the law is far from clear.

By Will Oremus and Elahe Izadi

 

If a media outlet copied a bunch of stories from the New York Times and published them on its site, it would probably be seen as a blatant violation of the Times' copyright.

But what happens when a tech company copies those same articles, combines them with countless other copied works, and uses them to train an AI chatbot capable of conversing on almost any topic, including the ones it learned about from the Times?

That's the legal question at the heart of a lawsuit the Times filed against OpenAI and Microsoft in federal court last week, alleging the tech companies illegally used "millions" of copyrighted Times articles to help develop the AI models behind tools like ChatGPT and Bing. It is the latest, and some believe the strongest, in a series of active lawsuits alleging that various technology and AI companies have violated the intellectual property of media companies, photography sites, book authors and artists.

Taken together, the cases have the potential to shake the foundations of the burgeoning generative AI industry, some legal experts say, but they could also fail. That's because the tech companies are likely to rely heavily on a legal concept that has served them well in the past: the doctrine known as "fair use."

Broadly speaking, copyright law distinguishes between word-for-word copying of someone else's work (which is generally illegal) and "remixing" it, or putting it to new and creative use. What's confusing about AI systems, said James Grimmelmann, professor of digital and information law at Cornell University, is that in this case they appear to be doing both.

Generative AI represents "this great technological transformation that can create a remixed version of anything," Grimmelmann said. "The challenge is that these models can also blatantly memorize works they were trained on and often produce near-exact copies," which, he said, is "traditionally the core of what copyright law prohibits."

From early VCRs, which could be used to record TV shows and movies, to Google Books, which digitized millions of books, U.S. companies have convinced courts that their technological tools amounted to fair use of copyrighted works. OpenAI and Microsoft are already mounting a similar defense.

"We believe that the training of AI models qualifies as fair use, which is entirely consistent with established precedent recognizing that the use of copyrighted materials by technological innovators in a transformative manner is entirely consistent with copyright law," OpenAI wrote in a document filed with the U.S. Copyright Office in November.

AI systems are often "trained" with gigantic datasets that include vast amounts of published material, much of which is copyrighted. Through this training, they come to recognize patterns in the arrangement of words and pixels, which they can then use to assemble plausible prose and images in response to almost any prompt.

