A recent study from the Tow Center for Digital Journalism highlights significant concerns about the accuracy and reliability of citations generated by ChatGPT, particularly for publishers whose content is used by OpenAI’s chatbot. As publishers increasingly engage in content licensing deals with OpenAI, the findings raise questions about how the AI tool handles sourcing, with some suggesting it could undermine the reputation of publishers and their content.
Study Overview: Inaccurate and Misleading Citations
Conducted at Columbia Journalism School's Tow Center, the study examined how ChatGPT handles citations when asked to identify the source of quotes taken from 20 different publishers, including major names like The New York Times, The Washington Post, and The Financial Times. The researchers tested the chatbot by providing sample quotations from 10 articles per publisher, ensuring that the sources were easily traceable through search engines like Google and Bing.
Despite OpenAI’s claims that ChatGPT can deliver “timely answers with links to relevant web sources,” the study found that the chatbot returned incorrect or fabricated citations in the majority of cases: of the 200 quotes tested, 153 were attributed entirely or partially incorrectly. Strikingly, ChatGPT acknowledged uncertainty in only 7 of those responses, using qualifiers such as “appears” or “I couldn’t locate the exact article.”
The Reliability of ChatGPT’s Citations
The study’s findings suggest that ChatGPT’s citations are highly unreliable. While some citations were correct, the majority were either entirely wrong or partially inaccurate. The chatbot appeared confident in its answers, even when it was incorrect, leading to potential confusion and misinformation for users who rely on ChatGPT for sourcing.
One of the most concerning aspects of the study was that, when asked to cite content from publishers who had blocked OpenAI’s crawlers, ChatGPT still attempted to generate citations — often resorting to fabrication when it couldn’t access the information. This reliance on “confabulation” rather than admitting a lack of information raises serious concerns about the quality and authenticity of data used by ChatGPT.
The Risk to Publishers’ Reputation and Commercial Interests
For publishers, the implications of these inaccurate citations are profound. If ChatGPT incorrectly attributes content, it could not only damage a publisher’s reputation but also divert traffic away from their site. Publishers who have partnered with OpenAI in licensing deals might have hoped for more accurate sourcing, but the study found that even these arrangements did not guarantee reliable citations. This means that publishers may still be at risk of being misrepresented, regardless of whether they’ve licensed content to OpenAI or blocked its crawlers entirely.
Additionally, the study uncovered instances where ChatGPT incorrectly attributed content to plagiarized sources. In one case, the bot sourced a plagiarized version of a New York Times article from another website, illustrating the difficulty OpenAI faces in filtering out low-quality or plagiarized content from its datasets.
Decontextualization and Inconsistency in Responses
Another critical issue raised by the researchers is that ChatGPT often treats journalism as “decontextualized content,” with little regard for the original circumstances under which it was produced. This problem is compounded by the inconsistency in the bot’s responses — asking the same question multiple times often led to different answers. Such variability in results is particularly problematic in the context of citations, where consistency and accuracy are paramount.
While the study was limited in scope and acknowledges the need for further testing, its findings are concerning for both publishers and users who rely on ChatGPT for accurate sourcing. The study suggests that OpenAI’s technology has yet to offer consistent, high-quality citations, even for publishers who have entered into formal licensing agreements.
OpenAI’s Response and Publisher Concerns
In response to the findings, OpenAI defended its approach, stating that the researchers conducted an “atypical test” of the product. The company emphasized its support for publishers, highlighting that it helps 250 million weekly ChatGPT users discover quality content. OpenAI also mentioned ongoing efforts to improve citation accuracy and respect publisher preferences, such as managing how content appears in search results through tools like OAI-SearchBot in the robots.txt file.
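The crawler controls OpenAI refers to work through the standard robots.txt protocol, which lets a site address each crawler by its user-agent string. As a rough sketch of how a publisher might use this (user-agent names per OpenAI's published crawler documentation; publishers should verify them against the current docs), a site could block the training crawler while still allowing its content to surface in ChatGPT search results:

```
# robots.txt — illustrative example, not OpenAI-endorsed policy
# Block OpenAI's training-data crawler entirely
User-agent: GPTBot
Disallow: /

# Allow the search crawler so content can appear in ChatGPT search
User-agent: OAI-SearchBot
Allow: /
```

Note that, as the Tow Center study shows, blocking crawlers does not stop ChatGPT from attempting to cite a publisher's content; it only limits what the crawlers themselves retrieve.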
However, the researchers argue that, despite these claims, the current state of citation handling by ChatGPT leaves publishers with limited control over how their content is represented. The study concludes that publishers have “little meaningful agency” when it comes to how their work is used or misrepresented by the AI chatbot.
Conclusion: The Path Forward for Publishers and OpenAI
This study underscores the growing challenges that publishers face as they navigate the complexities of content licensing and the use of generative AI tools like ChatGPT. While AI offers vast potential for enhancing content discovery, it also introduces significant risks, particularly regarding accuracy and attribution. For publishers, the study serves as a cautionary tale, highlighting the importance of maintaining control over how their content is sourced and cited in AI-driven platforms.
As OpenAI continues to refine its technology, the issue of citation accuracy will likely remain a critical point of concern for publishers. If OpenAI cannot offer more reliable and transparent citation practices, publishers may reconsider their partnerships or take further measures to protect their content from misrepresentation.