Nature has retracted a paper that claimed AI had a positive impact on student learning.
The paper, titled “The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis,” was published in May of last year by Jin Wang and Wenxiang Fan of Hangzhou Normal University in China. It is a meta-analysis, meaning it combines data from 51 research studies, published between November 2022 and February 2025, on the effectiveness of ChatGPT in education. The paper claimed to find that ChatGPT had a large or moderately positive impact on “students’ learning performance, learning perception, and higher-order thinking.”
“The Editor has decided to retract this paper owing to concerns regarding discrepancies in the meta-analysis,” Nature said in its retraction note. “These issues ultimately undermine the confidence the Editor can place in the validity of the analysis and resulting conclusions. The authors have not responded to correspondence regarding this retraction.”
The researchers did not immediately respond to a request for comment.
“I first noticed the paper published just a day or two after it came out on 6 May 2025,” Ben Williamson, a senior lecturer in digital education at the University of Edinburgh, told me in an email. “It rapidly picked up a lot of attention on social media, especially on LinkedIn, as it appeared to offer some of the first hard evidence that ChatGPT improves what the authors called ‘learning performance.’ Within a month it had been accessed online almost 400,000 times and had an Altmetric score of 365 after being shared hundreds of times on X and Bluesky. It was very much helped by some very influential individuals sharing it on social media as good evidence to support promoting AI in education.”
The retraction note did not provide more details on Nature's decision, but a 2025 study published in the European Journal of Education Policy and Practice showed that the method Wang and Fan used is often flawed, and highlighted the issues in their paper before it was retracted.
“Existing empirical evidence on AIED [AI in education] suggests some positive effects, but a closer look reveals methodological and conceptual problems and leads to the conclusion that existing evidence should not be used to guide policy or practice,” the paper, written by Ilkka Tuomi and titled “What counts as evidence in AI & ED: Towards Science-for-Policy 3.0,” said.
One problem according to Tuomi is that these meta-analysis studies use any paper that was peer-reviewed, but that a closer look at each individual paper reveals that they vary in quality or that the data doesn’t show AI improves learning outcomes.
“Despite its apparent methodological quality and apparent rigour, the heterogeneity of the analysed studies makes the quantitative results of the Deng et al. meta-analysis meaningless,” Tuomi said, referring to another study about ChatGPT enhancing student learning. “Very similar problems underpin another viral article that has been interpreted to provide final proof that ChatGPT has positive impacts on learning. This study, by Wang and Fan (2025), uses the same methodology as the Deng et al. study, to the extent that it copies their search pattern with the original spelling mistakes. Already a quick review of the journals where the original studies have been published, show that low-quality and potentially predatory journals are included.”
“This meta analysis on ChatGPT effects on learning appeared only two and a half years after ChatGPT was launched,” Williamson said. “So what we are supposed to believe is that in the intervening period, dozens of high quality studies of the effect of ChatGPT on learning performance took place, were written up, submitted for peer review, and published, which the meta analysis authors then painstakingly synthesized using robust methods. What appeared actually to be the case is that the meta analysis aggregated a whole bunch of very low quality research published in disreputable journals. Ultimately, the meta analysis recycled junk science into headline-grabbing claims about the benefits of ChatGPT for learners. And those claims were simply unfounded due to methodological problems with the conduct of the study, as the retraction now appears to indicate.”
"The retraction of this study should serve as a crucial reminder to the education community,” Jake Baskin, executive director of the Computer Science Teachers Association, told me in an email. “We need to teach students how this technology actually works, not just how to use it, and rigorously evaluate if and how generative AI genuinely improves teaching and learning."
Our reporting has repeatedly shown that large language models are prone to errors that can make education frustrating for both students and teachers. Multiple teachers have told us that ChatGPT has completely upended their ability to educate students and grade their work, which is increasingly AI-generated. My investigation into Alpha School, the leading “AI-powered” school, found that it used AI-generated lesson plans that included errors and flawed questions.
Despite these problems, AI companies and lawmakers continue to push AI products into schools.
“ChatGPT and other generative AI applications have been incredibly disruptive in education for several years,” Williamson said. “What educators, parents and policy officials really needed was high quality data and evidence to help guide them. What they have had to deal with instead is some substandard research.”
About the author
Emanuel Maiberg is interested in little known communities and processes that shape technology, troublemakers, and petty beefs. Email him at emanuel@404media.co