Preprint server arXiv will ban submitters of AI-generated hallucinations

May 15, 2026

One of the site’s moderators described the new policy on social media.

AI-generated slop has shown up everywhere, including in the peer-reviewed literature. Fake citations, unedited prompt responses, and nonsensical diagrams have all slipped past editors and peer reviewers, and it’s not always clear if there are any consequences for the people responsible.

Now, it appears that a number of scientific fields will be enforcing rules against AI-generated problems even before peer review or journals get involved. One of the people involved in the physics and astronomy preprint server arXiv used a social media thread to announce that any inappropriate AI-produced content submitted to the server will result in a one-year ban and a permanent requirement that future publications undergo peer review before the arXiv will host them.

Thomas Dietterich, in addition to being an emeritus professor at Oregon State University, is heavily involved with arXiv, serving on its editorial advisory council and on its moderation team. So he’s in a good position to understand the organization’s policies. We have also reached out to arXiv leadership for confirmation but have not yet received a response.

In a thread on X (also screenshotted on Bluesky, for those without X accounts), Dietterich described the new policy as arising directly from the arXiv’s moderation standards. “Submissions to arXiv must comply with appropriate standards of scholarly communication in form, including appropriate and carefully prepared sections, figures, tables, references, etc.,” those standards read. “General scrupulousness and care of preparation are required.”

Dietterich also notes that all authors of a manuscript are responsible for its content. So, if they carelessly submit material generated by an AI that violates these guidelines—Dietterich cites “inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content”—then they’re responsible, not the AI. Should violations be discovered, all of the manuscript’s listed authors will now receive a one-year submission ban, and any future manuscripts will only be accepted after they’ve been through peer review by a journal.

For fields that rely heavily on the arXiv, those are severe sanctions. Posting preprints in areas like astrophysics is widely considered part of the normal publication process, and scientists will often get feedback on preprints that helps them improve what they submit for peer review. The problem is that, like most other things, the system can be gamed: people could submit flawed content that lists as authors people who were never involved. Fortunately, arXiv’s moderation system includes an appeal process.

One obvious question that arises when these problems are found in publications is why nobody caught them sooner. Now, we can at least know that someone is trying to.

Aurich

Here’s an OCR-converted version of the text from that Bluesky screenshot for convenience:

Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated.

If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s).

We have recently clarified our penalties for this. If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper.

The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue.

Examples of incontrovertible evidence: hallucinated references, meta-comments from the LLM (“here is a 200 word summary; would you like me to make any changes?”; “the data in this table is illustrative, fill it in with the real numbers from your experiments”) end/

(I wasn’t about to dig up my X login to capture the original, but should be accurate.)