Two pillars of reference publishing, Britannica and Merriam-Webster, have filed a lawsuit against OpenAI. They allege that OpenAI used their copyrighted articles on a massive scale to train AI models such as ChatGPT. The complaint goes beyond training: it also claims that generated responses can reproduce fragments of their content verbatim. On top of that, they add a trademark claim, arguing that their brand is damaged when the AI makes errors that get attributed to them.
RAG and real-time querying: where does inspiration end? 🤔
The lawsuit highlights a crucial technical nuance. It challenges not only the initial data scraping but also the way systems that use RAG (Retrieval-Augmented Generation) operate. This technique queries external sources in real time and feeds the retrieved text to the model, which then generates its response. For the plaintiffs, when ChatGPT uses this method and reproduces paragraphs from their works, the result is direct copying, not a transformative process. This moves the infringement debate to the moment of inference, not just training.
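To make the distinction concrete, here is a minimal Python sketch of a RAG step under loudly stated assumptions: retrieval is a naive word-overlap ranking (production systems typically use embeddings and a search index), the two passages are invented placeholders rather than Britannica text, the model call itself is omitted, and every function name is hypothetical. It also measures the longest verbatim word run shared by a response and a retrieved source, a crude way to separate literal copying from paraphrase. It says nothing about how OpenAI's systems actually work or how a court would weigh such a measure.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())

def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[str]:
    """Hypothetical retriever: ids of the k passages sharing the most distinct words with the query."""
    q = set(tokenize(query))
    ranked = sorted(corpus, key=lambda pid: len(q & set(tokenize(corpus[pid]))), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: dict[str, str], k: int = 1) -> str:
    """Paste the retrieved passages into the prompt that would be sent to the model."""
    context = "\n\n".join(corpus[pid] for pid in retrieve(query, corpus, k))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

def longest_verbatim_run(response: str, source: str) -> int:
    """Length, in words, of the longest word sequence appearing in both texts (DP over suffixes)."""
    a, b = tokenize(response), tokenize(source)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at the previous word pair
                best = max(best, cur[j])
        prev = cur
    return best

# Placeholder passages invented for illustration, not real encyclopedia entries.
corpus = {
    "volcano": "A volcano is a rupture in the crust of a planet that allows hot lava and gases to escape.",
    "glacier": "A glacier is a persistent body of dense ice that moves under its own weight.",
}

query = "What is a volcano?"
print(retrieve(query, corpus))  # ['volcano']
print(build_prompt(query, corpus))

copied = "Simply put, a volcano is a rupture in the crust of a planet."
paraphrased = "Volcanoes are openings where molten rock escapes."
print(longest_verbatim_run(copied, corpus["volcano"]))       # 11
print(longest_verbatim_run(paraphrased, corpus["volcano"]))  # 0
```

The copied answer shares an 11-word run with the retrieved passage, while the paraphrase shares none. Any threshold for calling a run "direct copying" would be a separate, and contestable, judgment.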
When the AI hallucinates and blames you 😅
The trademark claim adds a colorful twist. Britannica argues that its reputation for accuracy is tarnished when ChatGPT invents facts or gives wrong answers that users may associate with the publisher. In other words, OpenAI would not only take the content without permission but also pin its own blunders on the publisher. A case of misappropriated credibility, with every right to throw a tantrum.