Generative AI and Wikipedia editing: What we learned in 2025

2026-01-31 21:14 · wikiedu.org

Like many organizations, Wiki Education has grappled with generative AI, its impacts, opportunities, and threats, for several years. As an organization that runs large-scale programs to bring new editors to Wikipedia (we’re responsible for about 19% of all new active editors on English Wikipedia), we have a deep understanding of the challenges facing new content contributors to Wikipedia — and how to support them in editing successfully. As many people have begun using generative AI chatbots like ChatGPT, Gemini, or Claude in their daily lives, it’s unsurprising that people will also consider using them to help draft contributions to Wikipedia. Since Wiki Education’s programs provide a cohort of content contributors whose work we can evaluate, we’ve looked into how our participants are using GenAI tools.

We are choosing to share our perspective through this blog post because we hope it will help inform discussions of GenAI-created content on Wikipedia. In an open environment like the Wikimedia movement, it’s important to share what you’ve learned. In this case, we believe our learnings can help Wikipedia editors who are trying to protect the integrity of content on the encyclopedia, Wikipedians who may be interested in using generative AI tools themselves, other program leaders globally who are trying to onboard new contributors who may be interested in using these tools, and the Wikimedia Foundation, whose product and technology team builds software to help support the development of high-quality content on Wikipedia.

Our fundamental conclusion about generative AI is: Wikipedia editors should never copy and paste the output from generative AI chatbots like ChatGPT into Wikipedia articles.

Let me explain more.

AI detection and investigation

Since the launch of ChatGPT in November 2022, we’ve been paying close attention to GenAI-created content, and how it relates to Wikipedia. We’ve spot-checked work of new editors from our programs, primarily focusing on citations to ensure they were real and not hallucinated. We experimented with tools ourselves, we led video sessions about GenAI for our program participants, and we closely tracked on-wiki policy discussions around GenAI. Currently, English Wikipedia prohibits the use of generative AI to create images or in talk page discussions, and recently adopted a guideline against using large language models to generate new articles.

As our Wiki Experts Brianda Felix and Ian Ramjohn worked with program participants throughout the first half of 2025, they found more and more text bearing the hallmarks of generative AI in article content, like bolded words or bulleted lists in odd places. But the use of generative AI wasn’t necessarily problematic, as long as the content was accurate. Wikipedia’s open editing process encourages stylistic revisions to factual text to better fit Wikipedia’s style.

But was the text factually accurate? This fundamental question led our Chief Technology Officer, Sage Ross, to investigate different generative AI detectors. He landed on a tool called Pangram, which we have found to be highly accurate for Wikipedia text. Sage generated a list of all the new articles created through our work since 2022, and ran them all through Pangram. A total of 178 out of the 3,078 articles came back as flagged for AI — none before the launch of ChatGPT in late 2022, with increasing percentages term over term since then. About half of our staff spent a month during summer 2025 painstakingly reviewing the text from these 178 articles.

Pangram’s detection results showed no signs of AI usage before the launch of ChatGPT, and then a steady rise in usage in the terms following. Courtesy of Manoel Horta Ribeiro and Francesco Salvi.

Based on the discourse around AI hallucinations, we were expecting these articles to contain citations to sources that didn’t exist, but this wasn’t true: only 7% of the articles had fake sources. The rest had information cited to real, relevant sources.

Far more insidious, however, was something else we discovered: More than two-thirds of these articles failed verification. That means the article contained a plausible-sounding sentence, cited to a real, relevant-sounding source. But when you read the source it’s cited to, the information on Wikipedia does not exist in that specific source. When a claim fails verification, it’s impossible to tell whether the information is true or not. For most of the articles Pangram flagged as written by GenAI, nearly every cited sentence in the article failed verification.

This finding led us to invest significant staff time into cleaning up these articles — far more than these editors had likely spent creating them. Wiki Education’s core mission is to improve Wikipedia, and when we discover our program has unknowingly contributed to misinformation on Wikipedia, we are committed to cleaning it up. In the clean-up process, Wiki Education staff moved more recent work back to sandboxes, stub-ified articles that passed notability but mostly failed verification, and PRODed some articles that in our judgment weren’t salvageable. All these are ways of addressing Wikipedia articles with flaws in their content. (While there are many grumblings about Wikipedia’s deletion processes, we found several of the articles we PRODed due to their fully hallucinated GenAI content were then de-PRODed by other editors, showing the diversity of opinion about generative AI among the Wikipedia community.)

Revising our guidance

Given what we found through our investigation into the work from prior terms, and given the increasing usage of generative AI, we wanted to proactively address generative AI usage within our programs. Thanks to in-kind support from our friends at Pangram, we began running our participants’ Wikipedia edits, including in their sandboxes, through Pangram nearly in real time. This is possible because of the Dashboard course management platform Sage built, which tracks edits and generates tickets for our Wiki Experts based on on-wiki edits.
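
To make the shape of that pipeline concrete, here is a minimal sketch in Python: fetch the wikitext of a tracked revision from the MediaWiki API and pass it to an AI-text detection service. The detector URL, request format, and response field below are hypothetical placeholders, not Pangram's actual API, and the Dashboard's real implementation differs in its details.

```python
# Minimal sketch of an edit-monitoring step: pull a revision's wikitext
# and ask a detection service whether it looks AI-generated.
import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"
DETECTOR_URL = "https://detector.example.com/classify"  # hypothetical endpoint


def fetch_revision_text(rev_id: int) -> str:
    """Return the full wikitext of a single revision via the MediaWiki API."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": rev_id,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    data = requests.get(WIKI_API, params=params, timeout=30).json()
    return data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]


def flag_if_ai(rev_id: int, threshold: float = 0.5) -> bool:
    """Send revision text to the (hypothetical) detector; True means 'take a closer look'."""
    text = fetch_revision_text(rev_id)
    resp = requests.post(DETECTOR_URL, json={"text": text}, timeout=30)
    score = resp.json().get("ai_likelihood", 0.0)  # hypothetical response field
    return score >= threshold
```

In the real system, a detection like this feeds the workflow described below: automated emails to participants and tickets for our Wiki Expert staff to review.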

We created a brand-new training module on Using generative AI tools with Wikipedia. This training emphasizes where participants could use generative AI tools in their work, and where they should not. The core message of these trainings is, do not copy and paste anything from a GenAI chatbot into Wikipedia.

We crafted a variety of automated emails to participants who Pangram detected were adding text created by generative AI chatbots. Sage also recorded some videos, since many young people are accustomed to learning via video rather than reading text. We also provided opportunities for engagement and conversation with program participants.

Our findings from the second half of 2025

In total, we had 1,406 AI edit alerts in the second half of 2025, although only 314 of these (or 22%) were in the article namespace on Wikipedia (meaning edits to live articles). In most cases, Pangram detected participants using GenAI in their sandboxes during early exercises, when we ask them to do things like choose an article, evaluate an article, create a bibliography, and outline their contribution.

This graph shows the daily total of generative AI text that Pangram detected our participants adding to Wikipedia. Early in the term, the hits were primarily on exercises, with more sandbox and mainspace alerts later in the term. CC BY-SA 4.0 — Wiki Education.

Pangram struggled with false positives in a few sandbox scenarios:

  • Bibliographies, which are often a combination of human-written prose (describing a source and its relevance) and non-prose text (the citation for a source, in some standard format)
  • Outlines with a high portion of non-prose content (such as bullet lists, section headers, text fragments, and so on)

We also had a handful of cases where sandboxes were flagged for AI after a participant copied an AI-written section from an existing article to use as a starting point to edit or to expand. (This isn’t a flaw of Pangram, but a reminder of how much AI-generated content editors outside our programs are adding to Wikipedia!)

In broad strokes, we found that Pangram is great at analyzing plain prose — the kind of sentences and paragraphs you’ll find in the body of a Wikipedia article — but it sometimes gets tripped up by formatting, markup, and non-prose text. Early on, we disabled alert emails for participants’ bibliography and outline exercises, and through the end of 2025, we refined the Dashboard’s preprocessing steps to extract the prose portions of revisions and convert them to plain text before sending them to Pangram.
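
As a rough illustration of that preprocessing step, here is a minimal sketch that strips wiki markup and keeps only paragraph-like prose before anything is sent to a detector. It assumes the mwparserfromhell library and uses an arbitrary word-count cutoff for illustration; the Dashboard's actual preprocessing is its own implementation and may differ.

```python
# Minimal sketch: reduce a revision's wikitext to plain prose so that
# headings, bullet lists, and citation fragments don't reach the detector.
import mwparserfromhell


def extract_prose(wikitext: str, min_words: int = 8) -> str:
    """Strip templates, links, and markup, then keep only prose-like lines."""
    plain = mwparserfromhell.parse(wikitext).strip_code()
    prose_lines = []
    for line in plain.splitlines():
        line = line.strip()
        # Drop list items, headings, and short fragments; these non-prose
        # snippets are what tended to trigger false positives.
        if line.startswith(("*", "#", "=", ";", "|")) or len(line.split()) < min_words:
            continue
        prose_lines.append(line)
    return "\n".join(prose_lines)
```

The word-count cutoff is a crude stand-in for "prose only," but it captures the idea: plain sentences and paragraphs go to the detector, while bullet lists, headings, and citation fragments do not.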

Many participants also reported “just using Grammarly to copy edit.” In our experience, however, the smallest fixes done with Grammarly never trigger Pangram’s detection, but if you use its more advanced content creation features, the resulting text registers as being AI generated.

But overwhelmingly, we were pleased with Pangram’s results. Our early interventions with participants who were flagged as using generative AI for exercises that would not enter mainspace seemed to head off their future use of generative AI. We supported 6,357 new editors in fall 2025, and only 217 of them (or 3%) had multiple AI alerts. Only 5% of the participants we supported had mainspace AI alerts. That means thousands of participants successfully edited Wikipedia without using generative AI to draft their content.

For those who did add GenAI-drafted text, we ensured that the content was reverted. In fact, participants sometimes self-reverted once they received our email letting them know Pangram had detected their contributions as being AI created. Instructors also jumped in to revert, as did some Wikipedians who found the content on their own. Our ticketing system also alerted our Wiki Expert staff, who reverted the text as soon as they could.

While some instructors in our Wikipedia Student Program had concerns about AI detection, we had a lot of success focusing the conversation on the concept of verifiability. If the instructor as subject matter expert could attest the information was accurate, and they could find the specific facts in the sources they were cited to, we permitted text to come back to Wikipedia. However, the process of attempting to verify student-created work (which in many cases the students swore they’d written themselves) led many instructors to realize what we had found in our own assessment: In their current states, GenAI-powered chatbots cannot write factually accurate text for Wikipedia that is verifiable.

We believe our Pangram-based detection interventions led to fewer participants adding GenAI-created content to Wikipedia. Following the trend lines, we had anticipated that about 25% of participants would add GenAI content to Wikipedia articles; instead, it was only 5%, and our staff were able to revert all problematic content.

I’m deeply appreciative of everyone who made this success possible this term: participants who followed our recommendations, the team at Pangram, who gave us access to their detection service, Wiki Education staff who did the heavy lifting of working through all of the positive detections, and the Wikipedia community, some of whom got to the problematic work from our program participants before we did.

How can generative AI help?

So far, I’ve focused on the problems with generative AI-created content. But that’s not all these tools can do, and we did find some ways they were useful. Our training module encourages editors — if their institution’s policies permit it — to consider using generative AI tools for:

  • Identifying gaps in articles
  • Finding access to sources
  • Finding relevant sources

To evaluate the success of these use scenarios, we worked directly with 7 of the classes we supported in fall 2025 in our Wikipedia Student Program. We asked students to anonymously fill out a survey every time they used generative AI tools in their Wikipedia work. We asked what tool they used, what prompt they used, how they used the output, and whether they found it helpful. While some students filled the survey out multiple times, others filled it out once. We had 102 responses reporting usage at various stages in the project. Overwhelmingly, 87% of responses reported that the generative AI tool was helpful for the task at hand. The most popular tool by far was ChatGPT, with Grammarly a distant second and the others in single digits.

Students reported AI tools very helpful in:

  • Identifying articles to work on that were relevant to the course they were taking
  • Highlighting gaps within existing articles, including missing sections or more recent information that had not yet been added
  • Finding reliable sources that they hadn’t already located
  • Pointing to the database in which a certain journal article could be found
  • Evaluating their drafted text against the checklist of requirements, when prompted with both the draft and the checklist
  • Identifying categories they could add to the article they’d edited
  • Correcting grammar and spelling mistakes

Critically, no participants reported using AI tools to draft text for their assignments. One student reported: “I pasted all of my writing from my sandbox and said ‘Put this in a casual, less academic tone’ … I figured I’d try this but it didn’t sound like what I normally write and I didn’t feel that it captured what I was trying to get across so I scrapped it.”

While this was an informal research project, we received enough positive feedback from it to believe using ChatGPT and other tools can be helpful in the research stage if editors then critically evaluate the output they get, instead of blindly accepting it. Even participants who found AI helpful reported that they didn’t use everything it gave them, as some was irrelevant. Undoubtedly, it’s crucial to maintain the human thinking component throughout the process.

What does this all mean for Wiki Education?

My conclusion is that, at least as of now, generative AI-powered chatbots like ChatGPT should never be used to generate text for Wikipedia; too much of it will simply be unverifiable. Our staff would spend far more time attempting to verify facts in AI-generated articles than if we’d simply done the research and writing ourselves.

That being said, AI tools can be helpful in the research process, especially to help identify content gaps or sources, when used in conjunction with a human brain that carefully evaluates the information. Editors should never simply take a chatbot’s suggestion; instead, if they want to use a chatbot, they should use it as a brainstorm partner to help them think through their plans for an article.

To date, Wiki Education’s interventions as our program participants edit Wikipedia show promise for keeping unverifiable, GenAI-drafted content off Wikipedia. Based on our experiences in the fall term, we have high confidence in Pangram as a detector of AI content, at least in Wikipedia articles. We will continue our current strategy in 2026 (with more small adjustments to make the system as reliable as we can).

More generally, we found participants had less AI literacy than popular discourse might suggest. Because of this, we created a supplemental large language models training that we’ve offered as an optional module for all participants. Many participants indicated that they found our guidance regarding AI to be welcome and helpful as they attempt to navigate the new complexities created by AI tools.

We are also looking forward to more research on our work. A team of researchers — Francesco Salvi and Manoel Horta Ribeiro at Princeton University, Robert Cummings at the University of Mississippi, and Wiki Education’s Sage Ross — have been looking into Wiki Education’s Wikipedia Student Program editors’ use of generative AI over time. Preliminary results have backed up our anecdotal understanding, while also revealing nuances of how text produced by our students over time has changed with the introduction of GenAI chatbots. They also confirmed our belief in Pangram: After running student edits from 2015 up until the launch of ChatGPT through Pangram, without any date information involved, the team found Pangram correctly identified that it was all 100% human written. This research will continue into the spring, as the team explores ways of unpacking the effects of AI on different aspects of article quality.

And, of course, generative AI is a rapidly changing field. Just because these were our findings in 2025 doesn’t mean they will hold true throughout 2026. Wiki Education remains committed to monitoring, evaluating, iterating, and adapting as needed. Fundamentally, we are committed to ensuring we add high quality content to Wikipedia through our programs. And when we miss the mark, we are committed to cleaning up any damage.

What does this all mean for Wikipedia?

While I’ve focused this post on what Wiki Education has learned from working with our program participants, the lessons are extendable to others who are editing Wikipedia. Already, 10% of adults worldwide are using ChatGPT, and drafting text is one of the top use cases. As generative AI usage proliferates, its usage by well-meaning people to draft content for Wikipedia will as well. It’s unlikely that longtime, daily Wikipedia editors would add content copied and pasted from a GenAI chatbot without verifying all the information is in the sources it cites. But many casual Wikipedia contributors or new editors may unknowingly add bad content to Wikipedia when using a chatbot. After all, it provides what looks like accurate facts, cited to what are often real, relevant, reliable sources. Most edits we ended up reverting seemed acceptable with a cursory review; it was only after we attempted to verify the information that we understood the problems.

Because this unverifiable content often seems okay at first pass, it’s critical for Wikipedia editors to be equipped with tools like Pangram to more accurately detect when they should take a closer look at edits. Automating review of text for generative AI usage — as Wikipedians have done for copyright violation text for years — would help protect the integrity of Wikipedia content. In Wiki Education’s experience, Pangram is a tool that could provide accurate assessments of text for editors, and we would love to see a larger-scale version of the tool we built to evaluate edits from our programs deployed across all edits on Wikipedia. Currently, editors can add a warning banner that highlights that the text might be LLM generated, but this is based solely on the assessment of the person adding the banner. Our experience suggests that judging by tone alone isn’t enough; instead, tools like Pangram can flag highly problematic information that should be reverted immediately but that might sound okay.

We’ve also found success in the training modules and support we’ve created for our program participants. Providing clear guidance — and the reason why that guidance exists — has been key in helping us head off poor usage of generative AI text. We encourage Wikipedians to consider revising guidance to new contributors in the welcome messages to emphasize the pitfalls of adding GenAI-drafted text. Software aimed at new contributors created by the Wikimedia Foundation should center starting with a list of sources and drawing information from them, using human intellect, instead of generative AI, to summarize information. Providing guidance upfront can help well-meaning contributors steer clear of bad GenAI-created text.

Wikipedia recently celebrated its 25th birthday. For it to survive into the future, it will need to adapt as technology around it changes. Wikipedia would be nothing without its corps of volunteer editors. The consensus-based decision-making model of Wikipedia means change doesn’t come quickly, but we hope this deep-dive will help spark a conversation about changes that are needed to protect Wikipedia into the future.


Comments

  • By crazygringo 2026-01-31 23:08 (8 replies)

    > That means the article contained a plausible-sounding sentence, cited to a real, relevant-sounding source. But when you read the source it’s cited to, the information on Wikipedia does not exist in that specific source. When a claim fails verification, it’s impossible to tell whether the information is true or not.

    This has been a rampant problem on Wikipedia always. I can't seem to find any indicator that this has increased recently? Because they're only even investigating articles flagged as potentially AI. So what's the control baseline rate here?

    Applying correct citations is actually really hard work, even when you know the material thoroughly. I just assume people write stuff they know from their field, then mostly look to add the minimum number of plausible citations after the fact, and then most people never check them, and everyone seems to just accept it's better than nothing. But I also suppose it depends on how niche the page is, and which field it's in.

    • By crabmusket 2026-01-31 23:54 (1 reply)

      There was a fun example of this that happened live during a recent episode of the Changelog[1]. The hosts noted that they were incorrectly described as being "from GitHub" with a link to an episode of their podcast which didn't substantiate that claim. Their guest fixed the citation as they recorded[2].

      [1]: https://changelog.com/podcast/668#transcript-265

      [2]: https://en.wikipedia.org/w/index.php?title=Eugen_Rochko&diff...

    • By gonzobonzo 2026-02-01 1:11 (3 replies)

      The problems I've run into are both people giving fake citations (the citations don't actually justify the claim that's being made in the article), and people giving real citations, but if you dig into the source you realize it's coming from a crank.

      It's a big blind spot among the editors as well. When this problem was brought up here in the past, with people saying that claims on Wikipedia shouldn't be believed unless people verify the sources themselves, several Wikipedia editors came in and said this wasn't a problem and Wikipedia was trustworthy.

      It's hard to see it getting fixed when so many don't see it as an issue. And framing it as a non-issue misleads users about the accuracy of the site.

      • By mikkupikku 2026-02-01 12:51 (1 reply)

        A common source of error is plot summaries in articles for movies. The plot summaries are very often written by people who didn't watch the movie but are trying to reassemble the plot like a jigsaw puzzle from little bits they glean from written reviews, or worse, just writing down whatever they assume to be the plot. Very often it seems like the fuck ups came from people who either weren't watching the movie carefully, or were just listening to the dialogue while not watching the screen, or simply lacked media literacy.

        Example [SPOILERS]: the page for the movie Sorcerer claims that rough terrain caused a tire to pop. The movie never says that; the movie shows the tire popping (which results in the truck's cargo detonating). The next scene reveals the cause, but only to those paying attention; the bloody corpse of a bandito lying next to a submachine gun is shown in the rubble beside the road, and more banditos are there, very upset and quite nervous, to hijack the second truck. The obvious inference is that the first truck's tire was shot by the bandito to hijack/rob the truck. The tire didn't pop from rough terrain; the movie never says it did; it's just a conclusion you could get from not paying attention to the movie.

        • By shmeeed 2026-02-01 15:26

          To me that sounds a bit like summaries made on the base of written movie scripts. A long time ago, I read a few scripts to movies I had never watched, and that's exactly the outcome: You get a rough idea what it's about and even get to recognise some memorable quotes, but there's little cohesion to it, for lack of all the important visual aspects and clues that tie it all together.

      • By Aurornis 2026-02-01 16:33

        > The problems I've run into is both people giving fake citations (the citations don't actually justify the claim that's being made in the article), and people giving real citations, but if you dig into the source you realize it's coming from a crank.

        Citations have become heavily weaponized across a lot of spaces on the internet. There was a period of time where we all learned that citations were correlated with higher quality arguments and Wikipedia’s [Citation Needed] even became a meme.

        But the quacks and the agenda pushers realized that during casual internet browsing readers won’t actually read, let alone scrutinize the citation links, so it didn’t matter what you linked to. As long as the domain and title looked relevant it would be assumed correct. Anyone who did read the links might take so much time that the comment section would be saturated with competing comments by the time someone can respond with a real critique.

        This has become a real problem on HN, too. Often when I see a comment with a dozen footnoted citations from PubMed, they're either misunderstandings of what the study says or sometimes they even say the opposite of what the commenter claims.

        The strategy is to just quickly search PubMed or other sources for keywords and then copy those into the post with the HN footnote citation format, knowing that most people won’t read or question it.

      • By 6510 2026-02-01 7:45 (1 reply)

        > but if you dig into the source you realize it's coming from a crank.

        It is a dark Sunday afternoon, Bob Park is sitting on his sofa as usual, drunk as usual, suddenly the TV reveals to him there to be something called the Paranormal (Twilight Zone music) ..instantly Bob knows there are no such things and adds a note to the incomprehensible mess of notes that one day will become his book. He downs one more Budweiser. In the distance lightning strikes a tree, Bob shouts You don't scare me! and shakes his fist. After a few more beers a miracle of inspiration descends and as if channeling, in the time span of 10 minutes he writes notes about Cold Fusion, Alternative Medicine, Faith Healing, Telepathy, Homeopathy, Parapsychology, Zener cards, the tooth fairy and father xmas. With much confidence he writes that none of them are real. It's been a really productive afternoon. It reminds him of times long gone back when he actually published many serious papers. He counts the remaining beers in his cooler and says to himself, in the next book I will need to take on god himself. The world needs to know, god is not real. I too will be the authority on that subject.

        https://en.wikipedia.org/w/index.php?title=Special:WhatLinks...

        • By CPLX 2026-02-01 15:19 (1 reply)

          Curious what the point you're making here is. I don't know anything at all about Bob Park and whether he is a crank. But if you make your career doing the admirable work of debunking pseudo-science and nonsense theories, you would necessarily be linked to in discussions of those theories very, very frequently.

          So maybe that's not a good description of him. But the link you posted is hardly dispositive.

          • By 6510 2026-02-02 20:45

            The pseudo science of corporate values is a contradiction in terms invented by HR ladies who drink tea for a living. People who believe such things also believe in aliens and have theories about vegetarian tigers.

            You are now debunked.

            (This comment is intentionally stupid, useless and the author knows nothing about the topic)

    • By chr15m 2026-02-01 2:19 (2 replies)

      LLMs can add unsubstantiated conclusions at a far higher rate than humans working without LLMs.

      • By EA-3167 2026-02-01 2:26 (1 reply)

        At some point you're forced to either believe that people have never heard of the concept of a force multiplier, or to return to Upton Sinclair's observation about getting people to believe in things that hurt their bottom line.

        • By DrewADesign 2026-02-01 2:39 (1 reply)

          I don’t see why people keep blaming cars for road safety problems; people got into buggy crashes for centuries before automobiles even existed

          • By nullsanity 2026-02-01 2:57 (1 reply)

            Because a difference in scale can become a difference in category. A handful of buggy crashes can be reduced to operator error, but as the car becomes widely adopted and analysis matures, it becomes clear that the fundamental design of the machine and its available use cases has fundamental flaws that cause a higher rate of operator error than desired. Therefore, cars are redesigned to be safer, laws and regulations are put in place, license systems are issued, and traffic calming and road design is considered.

            Hope that helps you understand.

            • By DrewADesign 2026-02-01 3:57 (2 replies)

              Is the sarcasm really that opaque? Who would unironically equate buggy accidents and automobile accidents?

              • By obidee2 2026-02-01 5:25 (1 reply)

                I’d like to introduce you to the internet.

                There’s a reason /s was a big thing; one person’s obvious sarcasm is (almost tautologically) another person’s true statement of opinion.

                • By DrewADesign 2026-02-02 1:25 (1 reply)

                  Thanks. I wasn’t aware of that.

                  • By chr15m 2026-02-02 1:50

                    It took me a minute to realise you were joking too! :)

              • By forgetfreeman 2026-02-01 10:32 (1 reply)

                How much time have you spent around developers?

                • By DrewADesign 2026-02-02 1:21

                  I got my first tech job in 1998. Some of the most sarcastic people I’ve ever met.

      • By mikkupikku 2026-02-01 12:40 (2 replies)

        True, but humans got a 20-year head start and I am willing to wager the overwhelming majority of extant flagrant errors are due to humans making shit up and no other human noticing and correcting it.

        My go-to example was the SDI page saying that brilliant pebble interceptors were to be made out of tungsten (completely illogical hogwash that doesn't even pass a basic sniff test). This claim was added to the page in February of 2012 by a new Wikipedia user, with no edit note accompanying the change nor any change to the sources and references. It stayed in the article until October 29th, 2025. And of course this misinformation was copied by other people and you can still find it being quoted, uncited, in other online publications. With an established track record of fact checking this poor, I honestly think LLMs are just pissing into the ocean.

        • By asadotzler 2026-02-01 13:53 (1 reply)

          If LLMs 10X it, as the advocates keep insisting, that means it would only take 2 years to do as much or more damage as humans alone have done in 20.

          • By mikkupikku 2026-02-01 14:04 (1 reply)

            Perhaps so. On the other hand, there's probably a lot of low hanging fruit they can pick just by reading the article, reading the cited sources, and making corrections. Humans can do this, but rarely do because it's so tedious.

            I don't know how it will turn out. I don't have very high hopes, but I'm not certain it will all get worse either.

            • By SiempreViernes 2026-02-01 20:05

              The entire point of the article is that LLMs cannot make accurate text, but ironically your claim that LLMs can produce accurate text illustrates your point about human reliability perfectly.

              I guess the conclusion is that there simply are no avenues to gain knowledge.

        • By busyant 2026-02-02 0:45

          > I am willing to wager the overwhelming majority of extant flagrant errors are due to humans making shit up

          In general, I agree, but I wouldn't want to ascribe malfeasance ("making shit up") as the dominant problem.

          I've seen two types of problems with references.

          1. The reference is dead, which means I can't verify or refute the statement in the Wikipedia article. If I see that, I simply remove both the assertion and the reference from the wiki article.

          2. The reference is live and almost confirms the statement in the Wikipedia article, but whoever put it there over-interpreted the information in the reference. In that case, I correct the statement in the article, but I keep the ref.

          Those are the two types of reference errors that I've come across.

          And, yes, I've come across these types of errors long before LLMs.

    • By mmooss 2026-02-01 0:01 (3 replies)

      When I've checked Wikipedia citations I've found so much brazen deception - citations that obviously don't support the claim - that I don't have confidence in Wikipedia.

      > Applying correct citations is actually really hard work, even when you know the material thoroughly.

      Why do you find it hard? Scholarly references can be sources for fundamental claims, review articles are a big help too.

      Also, I tend to add things to Wikipedia or other wikis when I come across something valuable rather than writing something and then trying to find a source (which also is problematic for other reasons). A good thing about crowd-sourcing is that you don't have to write the article all yourself or all at once; it can be very iterative and therefore efficient.

      • By crazygringo 2026-02-01 1:26 (1 reply)

        It's not that I personally find it hard.

        It's more like, a lot of stuff in Wikipedia articles is somewhat "general" knowledge in a given field, where it's not always exactly obvious how to cite it, because it's not something any specific person gets credit for "inventing". Like, if there's a particular theorem then sure you cite who came up with it, or the main graduate-level textbook it's taught in. But often it's just a particular technique or fact that just kind of "exists" in tons of places but there's no obvious single place to cite it from.

        So it actually takes some work to find a good reference. Like you say, review articles can be a good source, survey articles or books. But it can take a surprising amount of effort to track down a place that actually says the exact thing. I literally just last week was helping a professor (leader in their field!) try to find a citation during peer review for their paper for an "obvious fact" in the field, that was in their introduction section. It was actually really challenging, like trying to produce a citation for "the sky is blue".

        I remember, years ago, creating a Wikipedia article for a particular type of food in a particular country. You can buy it at literally every supermarket there. How the heck do you cite the food and facts about it? It just... is. Like... websites for manufacturers of the food aren't really citations. But nobody's describing the food in academic survey articles either. You're not going to link to Allrecipes. What do you do? It's not always obvious.

        • By Jepacor 2026-02-01 15:36 (1 reply)

          If you can buy the food at a supermarket, can't you cite a product page? Presumably that would include a description of the product. Or is that not good enough of a citation?

          • By crazygringo 2026-02-01 20:21

            Retail product listing URLs change constantly. They're not great.

            And then you usually want to describe how the food is used. E.g. suppose it's a dessert that's mainly popular at children's birthday parties. Everybody in the country knows that. But where are you going to find something written that says that? Something that's not just a random personal blog, but an actual published valid source?

            Ideally you can find some kind of travel guide or book for expats or something with a food section that happens to list it, but if it's not a "top" food highly visible to tourists, then good luck.

      • By efilife 2026-02-01 20:04

        I found several that were contradicting the claim they were supposed to support (in popular articles). I will never regain faith in Wikipedia. Being an editor, or just verifying information from Wikipedia, makes you hate it.

      • By FranklinJabar 2026-02-01 2:01

        [dead]

    • By jacquesm 2026-02-01 11:41 (1 reply)

      Link rot is one problem, and edited articles are another. You can cite all you want, but if the underlying resource changes, your foundation just melted away.

      • By jayflux 2026-02-01 13:06

        Pretty much every citation added to wikipedia is passed on to web archive now, either by the editor or automatically later on.

        For news articles especially the recommendation now is to use the archive snapshot and not the url of the page.

        It’s not a perfect solution, but it tries to solve the link rot issue.

    • By shevy-java 2026-02-01 14:04

      > Applying correct citations is actually really hard work

      Not disagreeing - many existing articles on Wikipedia have barely any references or citations at all, and in some cases the wrong citations or wrong conclusions. Like when a cited source says water molecules behave oddly and the Wikipedia article then concludes that water molecules behave properly.

    • By Wowfunhappy 2026-02-01 17:16 (1 reply)

      > This has been a rampant problem on Wikipedia always. I can't seem to find any indicator that this has increased recently? Because they're only even investigating articles flagged as potentially AI. So what's the control baseline rate here?

      ...y'know, I don't want to be that guy, but this actually seems like something AI could check for, and then flag for human review.

  • By ColinWright 2026-01-31 21:15 (3 replies)

    The title I've chosen here is carefully selected to highlight one of the main points. It comes (lightly edited for length) from this paragraph:

    Far more insidious, however, was something else we discovered:

    More than two-thirds of these articles failed verification.

    That means the article contained a plausible-sounding sentence, cited to a real, relevant-sounding source. But when you read the source it’s cited to, the information on Wikipedia does not exist in that specific source. When a claim fails verification, it’s impossible to tell whether the information is true or not. For most of the articles Pangram flagged as written by GenAI, nearly every cited sentence in the article failed verification.

    • By the_fall 2026-01-31 23:04 (1 reply)

      FWIW, this is a fairly common problem on Wikipedia in political articles, predating AI. I encourage you to give it a try and verify some citations. A lot of them turn out to be more or less bogus.

      I'm not saying that AI isn't making it worse, but bad-faith editing is commonplace when it comes to hot-button topics.

      • By mjburgess 2026-01-31 23:51 (3 replies)

        Any articles where newspapers are the main source are basically just propaganda. An encyclopaedia should not be in the business of laundering yellow journalism into what is supposed to be a tertiary resource. If they banned this practice, that would immediately deal with this issue.

        • By mmooss 2026-02-01 0:10 (1 reply)

          A blanket dismissal is a simple way to avoid dealing with complexity, here both in understanding the problem and in forming solutions. Obviously not all newspapers are propaganda, and at the same time not all can be trusted; not everything in the same newspaper or any other news source is of the same accuracy; nothing is completely trustworthy or completely untrustworthy.

          I think accepting that gets us to the starting line. Then we need to apply a lot of critical thought to sometimes difficult judgments.

          IMHO quality newspapers do an excellent job - generally better than any other category of source on current affairs, but far from perfect. I remember a recent article for which they interviewed over 100 people, got ahold of secret documents, read thousands of pages, consulted experts .... That's not a blog post or Twitter take, or even a HN comment :), but we still need to examine it critically to find the value and the flaws.

          • By abacadaba 2026-02-01 1:28 (2 replies)

            > Obviously not all newspapers are propaganda

            citation needed

            • By tbossanova 2026-02-01 2:21

              There is literally no source without bias. You just need to consider whether you think a source's biases are reasonable or not.

            • By troyvit 2026-02-01 5:22

              See you should work for a newspaper. You have the gumption.

        • By the_fall 2026-02-01 0:16

          That's not what I'm saying. I mean citations that aren't citations: a "source" that doesn't discuss the topic at all or makes a different claim.

        • By snigsnog 2026-02-01 0:53

          That is probably 95% of wikipedia articles. Their goal is to create a record of what journalists consider to be true.

    • By dang 2026-01-31 22:37

      Submitted title was "For most flagged articles, nearly every cited sentence failed verification".

      I agree, that's interesting, and you've aptly expressed it in your comment here.

    • By chr15m 2026-02-01 2:23 (1 reply)

      People here are claiming that this is true of humans as well. Apart from the fact that bad content can be generated much faster with LLMs, what's your feeling about that criticism? Is there any measure of how many submissions made unsubstantiated claims before LLMs?

      Thank you for publishing this work. Very useful reminder to verify sources ourselves!

      • By alok-g 2026-02-02 23:04

        I have indeed seen that with humans as well, including in conference papers and medical journals. The reference citations in papers are seen by many authors as another section they need to fill to get their articles accepted, not as a natural byproduct of writing an article.

  • By wry_durian 2026-01-31 23:28 (3 replies)

    Note that this article is only about edits made through the Wiki Edu program, which partners with universities and academics to have students edit Wikipedia on course-related topics. It's not about Wikipedia writ large!

    • By Jepacor 2026-02-01 16:07

      Ah, so when you force students to edit Wikipedia for their courses, you get worse results than someone editing something voluntarily because they're passionate about it. That's... Hardly surprising.

      So it's more about how generative AI is a problem in college right now because lazy students are using it to do the work than about Wikipedia itself, I think.

    • By tovej 2026-02-01 8:04

      I've found Wiki Edu-edited pages with pages of creative writing exercises. When I read their sources, they were clumsily paraphrasing and misunderstanding the source.

      LLMs definitely fit the use-case of Wiki Edu students, who are just looking to pass a grade, not to look into a topic because of their interest.

    • By ketzu 2026-02-01 8:40

      That's interesting as my first thought reading the comments was "this problem seems very similar to many students writing papers just finding citations that sound correct".

      Sometimes it is really sad to read from (even PhD level) students on social media about their paper writing practices.

HackerNews