Lawyers for The New York Times and Daily News say OpenAI inadvertently deleted crucial data related to their copyright lawsuit against the company, according to a TechCrunch report. The incident occurred after OpenAI agreed to give the plaintiffs access to its training datasets so they could check whether their copyrighted material had been used.
The lawsuit alleges that OpenAI scraped articles from The New York Times and Daily News without permission and used them to train its models. As part of the case, OpenAI provided two virtual machines on which the publishers’ attorneys could search its training data for their copyrighted content. Since November 1, the legal teams have spent more than 150 hours on this search. On November 14, however, OpenAI engineers mistakenly erased all of the search data stored on one of the virtual machines, according to a filing in the U.S. District Court for the Southern District of New York.
OpenAI’s attempts to recover the deleted data were mostly successful, but the folder structure and file names were lost, so the recovered data cannot be used to determine where the plaintiffs’ articles were used in training OpenAI’s models. In their letter to the court, the plaintiffs’ counsel said they have had to redo their work from scratch, at a significant cost in time and resources.
The plaintiffs’ counsel made clear that they have no reason to believe the deletion was intentional. They argued, however, that OpenAI is best positioned to search its own datasets and should therefore assist in the investigation of potential copyright infringement.
OpenAI contends that using publicly available data to train its models falls under “fair use.” The company maintains that it does not need to license or pay for this content, even as it profits from its AI products. Nonetheless, OpenAI has entered into licensing agreements with several publishers, including the Associated Press and the Financial Times. While the specific terms of these deals remain undisclosed, one of the partners, Dotdash, reportedly receives at least $16 million annually.
OpenAI has yet to issue a statement addressing the incident or its implications for its relationship with the plaintiffs.
Featured image credit: Jonathan Kemper/Unsplash