Research from a British university warns that scientific knowledge itself is under threat due to a flood of low-quality research papers generated by artificial intelligence (AI).
According to The Register, the University of Surrey research team notes an "explosion of formulaic research articles," including inadequate study designs and false findings, based on data extracted from the US national health database, the National Health and Nutrition Examination Survey (NHANES).
The study, published in PLOS Biology, an open-access journal from the nonprofit publisher PLOS, found that many post-2021 articles used "a superficial and simplified approach to analysis." These articles often focused on a single variable, ignoring more realistic, multifactorial explanations of the links between health conditions and potential causes, and selected subsets of data without justification.
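To see the failure mode the authors describe in concrete terms, here is a minimal sketch on synthetic data; it is not code from the study, and the variable names are illustrative rather than real NHANES fields. A single-factor model reports a "significant" association that largely disappears once an obvious confounder is adjusted for:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for an NHANES-style table; "exposure" and the
# outcome are both driven by age, so any crude association between
# them is confounding, not causation.
rng = np.random.default_rng(0)
n = 5_000
age = rng.uniform(20, 80, n)
exposure = 0.02 * age + rng.normal(0, 1, n)
latent_risk = 0.05 * age + rng.normal(0, 1, n)
df = pd.DataFrame({
    "outcome": (latent_risk > np.median(latent_risk)).astype(int),
    "exposure": exposure,
    "age": age,
})

# The formulaic single-factor design: exposure looks "significant".
crude = smf.logit("outcome ~ exposure", data=df).fit(disp=False)
print(f"crude:    beta={crude.params['exposure']:.3f}, p={crude.pvalues['exposure']:.2g}")

# Adjusting for the confounder largely removes the association.
adjusted = smf.logit("outcome ~ exposure + age", data=df).fit(disp=False)
print(f"adjusted: beta={adjusted.params['exposure']:.3f}, p={adjusted.pvalues['exposure']:.2g}")
```

With only the single predictor, the model has no way to separate the exposure's effect from age, which is exactly the multifactorial blind spot the Surrey team flags.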
"We've seen a rise in papers that look scientific but don't hold up to scrutiny; this is 'science fiction' using national health datasets to masquerade as scientific fact," says Matt Spick, professor of health and biomedical data analytics at the University of Surrey and one of the report's authors.
Reduced capacity for scrutiny
"The use of these easily accessible datasets via APIs, combined with large language models, is overwhelming some journals and reviewers, reducing their ability to evaluate more meaningful research, and ultimately weakening the quality of science as a whole," he added.
The report notes that AI-ready datasets such as the US NHANES can open up new opportunities for data-driven research, but they also risk exploitation by what the authors call "paper mills": entities that produce questionable scientific papers, often for paying customers seeking confirmation of an existing belief.
The University of Surrey team conducted a systematic search of the literature from the past ten years to retrieve potentially formulaic articles based on NHANES data, then analyzed them to identify characteristic statistical approaches and study designs.
The team identified and retrieved 341 reports published across several different journals. They found a rapid increase over the past three years in the number of publications analyzing single-factor associations between predictors (independent variables) and various health conditions using the NHANES dataset. An average of four articles per year was published between 2014 and 2021, rising to 33 in 2022, 82 in 2023, and 190 in the first ten months of 2024.
According to The Register, a shift in the origins of published research was also observed. From 2014 to 2020, only two of 25 manuscripts had a lead author affiliated with China. Between 2021 and 2024, that number increased to 292 of 316 manuscripts.
Increased risk of misleading findings
The report states that this leap in single-factor association research brings a corresponding increase in the risk of misleading findings entering the broader body of scientific literature.
For example, it says that some well-known multifactorial health problems, including depression, cardiovascular disease and cognitive function, were investigated using simplistic single-factor approaches in some of the reviewed articles.
To combat this, the team puts forward several suggestions, including that journal editors and reviewers should treat single-factor analysis of conditions known to be complex and multifactorial as a "red flag" for potentially problematic research.
Dataset providers should also take steps to prevent data exploitation, such as requiring API keys and application numbers, an approach already used by the UK Biobank, the report says. Publications referencing such data should include an auditable application number as a condition of access.
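As a rough illustration of what auditable, key-gated access could look like, here is a hypothetical sketch; the endpoint, header names and identifiers are invented for this example, and NHANES data is not currently served this way:

```python
import os
import requests

# Hypothetical dataset API; NHANES files are in reality freely
# downloadable, which is precisely what the report wants to change.
API_BASE = "https://data.example.org/v1"
API_KEY = os.environ["DATASET_API_KEY"]   # key issued per researcher
APPLICATION_ID = "APP-2024-0137"          # auditable application number

response = requests.get(
    f"{API_BASE}/nhanes/2017-2018/demographics",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        # Tying every request to an application number creates the
        # audit trail journals could then require at publication time.
        "X-Application-Id": APPLICATION_ID,
    },
    timeout=30,
)
response.raise_for_status()
records = response.json()
```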
Full dataset analysis should be mandatory
Another suggestion is that analysis of the full dataset should be mandatory unless the use of a subset of the data can be justified.
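One way a research team could put that rule into practice is a small guard that refuses to subset the data without a recorded reason; this helper and its convention are a hypothetical sketch, not something the report prescribes:

```python
import pandas as pd

def take_subset(df: pd.DataFrame, mask: pd.Series, justification: str) -> pd.DataFrame:
    """Subset a dataset only when a non-empty justification is recorded."""
    if not justification.strip():
        raise ValueError("refusing to subset without a stated justification")
    subset = df[mask]
    # Printing the row counts and the reason produces a line that can be
    # pasted straight into a methods section or supplementary log.
    print(f"subset: kept {len(subset)}/{len(df)} rows; reason: {justification}")
    return subset

# Example: restricting to adults, with the cut and its reason logged together.
df = pd.DataFrame({"age": [12, 34, 56, 17, 71], "bmi": [18.0, 22.5, 27.1, 20.3, 24.8]})
adults = take_subset(df, df["age"] >= 18, "analysis pre-registered for adults only")
```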
"We're not trying to block access to data or stop people from using AI in their research; we're asking for some common-sense controls," said Tulsi Suchak, a postgraduate researcher at the University of Surrey and lead author of the study. "This includes things like being transparent about how data is used, ensuring reviewers with the right expertise are involved, and flagging when a study only looks at one part of the problem."
This is not the first time the issue has come to light. Last year, US publisher Wiley discontinued 19 scientific journals overseen by its subsidiary Hindawi that were publishing reports produced by AI paper mills.
This is also part of a wider problem of AI-generated content appearing online and in web searches that can be difficult to distinguish from the real thing. Dubbed "AI junk," this includes fake photos and entire video sequences of celebrities and world leaders, but also fake historical photographs and AI-generated portraits of historical figures appearing in search results as if they were genuine.