AI uBlock Blacklist

2026-02-21 8:10 295130 github.com

Websites I personally found that are completely generated by AI. Pull requests welcome. - alvi-se/ai-ublock-blacklist

A personal list for uBlock Origin blocking AI content farms. Pull requests welcome.

You can click here to subscribe to this list automatically. This link works only if you have uBlock Origin installed.

Alternatively, import the following URL as a 3rd party list in uBlock Origin.

  • https://raw.githubusercontent.com/alvi-se/ai-ublock-blacklist/master/list.txt
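Before subscribing, you may want to check what a list like this would block. The following is a minimal Python sketch, not part of the project. It assumes the list consists of `||scope^$doc` rules (the shape shown further down in this README) plus comment lines starting with `!`, which is how uBlock Origin filter lists mark comments. The function name `blocked_scopes` and the sample text are hypothetical.

```python
def blocked_scopes(list_text: str) -> list[str]:
    """Return the scopes (domains or domain/path pairs) named by ||...^ rules."""
    scopes = []
    for raw in list_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments; filter-list comments start with "!".
        if not line or line.startswith("!"):
            continue
        if line.startswith("||"):
            # Drop the "||" anchor, then everything from the "^" separator onward.
            scopes.append(line[2:].split("^", 1)[0])
    return scopes


# Hypothetical excerpt for illustration; not copied from the real list.txt.
sample = """! hypothetical excerpt
||example.com^$doc
||example.org/@slopUser^$doc
"""
print(blocked_scopes(sample))
```

To inspect the real list, you could download the URL above and pass its text to the same function.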

While browsing, I very often come across websites whose text is written by generative AI. These websites provide no useful information, have mediocre content, and are filled with ads and referral links to earn money. So when I find this kind of website, I put it here.

The key idea is simple: if I wanted my question to be answered by AI, I would ask AI. If I'm searching online, it means that I want an answer written by a person. A person has experience, opinions, ideas, creativity and a lot more information that they might want to share with the web. AI doesn't.

What's more, AI content can also be dangerous: articles in content farms are not checked by anyone before being published, since they are mass-generated. An AI might hallucinate when writing about dangerous topics: it might suggest that you short-circuit your circuit board, or execute dangerous commands on your PC such as rm -rf /, or mix bleach and ammonia (DON'T DO THESE THINGS). AI-generated content is not reliable, and if no one is checking what is being published, it needs to be blocked.

As I said, I'm adding pages as I browse, so each entry is added manually. I'm not considering automated tools, simply because it's hard for an algorithm to tell whether a page is AI generated, especially with the guidelines I give below. One could argue that this list is useless because it's too short. However, since these websites use SEO to appear first on search engines, you will meet the same website more than once, especially across related searches. This list has been blocking websites for me since the very first day I began writing it, even with very few entries.

However, there is indeed some bias in my entries. For example, as I am an Italian citizen, you will find a lot of Italian websites. This is another reason why pull requests are welcome.

If you're not a technical user and don't know how GitHub works, simply report suspected sites by creating an issue: click here.

If you want to create a pull request, here's how to add a website to the list. First, try to find the scope of the AI spammer. Usually it will be a domain, but I've also found a lot of Medium or dev.to blogs. These platforms should not be blocked as a whole; block only the blog that is spamming.

Say that you want to add the entry example.com/@slopUser. Simply add a line to the file list.txt as follows:

||example.com/@slopUser^$doc

The whole example.com domain hosts AI garbage, you say? Add only the domain:

||example.com^$doc

If you really hate AI and have a lot of time to spend, you are welcome to do some research about the website you have found. Most of these content farms are built by people or organizations who sell SEO and digital marketing. If you find the source, you might also find other content farms they have created. If you do, add them at the bottom of the file.
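The two rule shapes described above (a whole domain, or a single blog on a shared platform) can be sketched in Python. This is only an illustration under assumptions: `make_rule` is a hypothetical helper, not something the project ships, and it only reproduces the `||scope^$doc` shape shown in this README.

```python
from urllib.parse import urlsplit


def make_rule(target: str) -> str:
    """Build a document-blocking rule for a domain or a domain/path scope."""
    # Accept both bare scopes ("example.com/@slopUser") and full URLs.
    parts = urlsplit(target if "://" in target else "//" + target)
    host = parts.hostname or ""
    if not host:
        raise ValueError(f"no hostname in {target!r}")
    # Keep the path only when it narrows the scope (one blog on a shared platform).
    path = parts.path.rstrip("/")
    return f"||{host}{path}^$doc"


print(make_rule("example.com/@slopUser"))  # a single blog on a platform
print(make_rule("https://example.com/"))   # the whole domain
```

Pasting the printed lines into list.txt would give the same entries as writing them by hand; the helper only saves you from typos in the `||` and `^$doc` parts.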

Content farms have some patterns that make them recognizable. Here are the ones I've noticed. Of course, these are not strict rules.

  • Unnecessary introduction and conclusion: this is probably the easiest way to spot content farms. Often, the intro is also annoyingly baroque. For example, you click on a guide on how to use a specific feature in Flutter. The article begins by introducing Flutter itself, like the following:

    In today’s fast-moving digital landscape, users expect apps to be fast, beautiful, and consistent across every device they touch.

    Probably no human would write such an introduction to explain a specific feature. It could fit an article about Flutter in general (still too baroque though), but not one about a very specific piece of code that only experienced developers can understand. If I'm writing such an article, I'm assuming that the reader already knows what Flutter is!

  • [topic]: A Comprehensive Guide/A Step-by-Step Guide/Ultimate Guide: LLMs love these catchphrases for titles of tutorials and guides.

  • No/few links to external content

  • No sources and references: same as above, but for sources about facts. This is very important to check, especially for articles about (pseudo)science, politics and content that could spread misinformation.

  • Referral links everywhere: content farms are there just to make money. I've personally seen sites shamelessly putting purchase advice in navbars or footers.

  • References to the company's own product: the website is owned by a company that sells some product or service. The page will probably be like "How to solve [problem]" -> buy our product.

  • Blog with hundreds of thousands of articles: especially when published in a very short time span. A lot of AI slop blogs post tens or hundreds of articles a day, mostly by the same author.

  • Hallucinations: the content is plain wrong.

  • Date after November 2022: the AI hype began with the release of ChatGPT in November 2022. A weak guideline for sure, but it adds to all the others. Dates can be easily faked though.

  • No/few images, videos, non-text media: these pages are automatically generated and published, and it's hard to generate other kinds of content to put into the page.

  • AI generated images, logos: usually it's the banner of the article or the blog logo.

  • Poor text formatting

  • Unrendered Markdown characters: the text has no formatting and shows raw Markdown syntax.

  • Long post, with unnecessary or out-of-context content: at some point the article can start talking about another topic, which can be related to the original one but irrelevant.

  • Always at the top of search engine results: they are of course abusing SEO.

  • Know-it-all blog: the same blog appears in the search engine for completely different topics.

  • Vague content: lots of headings, each with a short content that provides no useful information.

  • Unprofessional or missing contact information: they have bought a domain, why are they using Gmail for contact?

  • Vague or missing about page

  • AI enthusiastic content: if someone loves ChatGPT they will use it for everything. So if you stumble across an AI-devoted blog, it's for sure 100% AI generated.
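Two of the patterns above (catchphrase titles and unrendered Markdown) are mechanical enough to sketch in Python. This is not a detector and does not replace the manual review the list relies on. It only produces hints for a human reviewer, and the regular expressions and the name `review_hints` are illustrative assumptions.

```python
import re

# Title catchphrases named in the guidelines above (case-insensitive).
TITLE_PATTERN = re.compile(
    r"\b(a comprehensive guide|a step-(?:to|by)-step guide|ultimate guide)\b",
    re.IGNORECASE,
)
# Raw Markdown leaking into visible text: "**bold**" markers or "## heading" lines.
RAW_MARKDOWN = re.compile(r"\*\*[^*\n]+\*\*|^#{1,6} \S", re.MULTILINE)


def review_hints(title: str, visible_text: str) -> list[str]:
    """Return hints for a human reviewer; an empty list proves nothing either way."""
    hints = []
    if TITLE_PATTERN.search(title):
        hints.append("catchphrase title")
    if RAW_MARKDOWN.search(visible_text):
        hints.append("unrendered Markdown")
    return hints


print(review_hints("Flutter Widgets: A Comprehensive Guide",
                   "## Introduction\n**Flutter** is..."))
```

Plenty of human-written articles will trigger these hints too, so they are only a prompt to look at the other signals.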

Because AI users are dumb enough to copy-paste LLM responses without reading them, the LLM will sometimes reveal itself, for example in the intro of the content. The following are some Google Dorks to find 100% AI generated pages. Such pages will have their whole domain put into the uBlock list. Dorks can be easily generated by asking an LLM to generate naive content, for example by prompting "Generate an article for my blog about dogs". The LLM will answer with an intro like "Sure! Here's an article about". Take this phrase and search for it surrounded by quotes.
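Turning a tell-tale phrase into an exact-match query can be sketched in Python. The helper names are hypothetical. The optional `site:` restriction is a standard search operator that this README does not itself use, and the search URL shape (query in the `q` parameter) is an assumption about Google's usual search endpoint.

```python
from typing import Optional
from urllib.parse import urlencode


def make_dork(phrase: str, site: Optional[str] = None) -> str:
    """Wrap a phrase in quotes for exact-match search, optionally limited to one site."""
    query = f'"{phrase}"'
    if site:
        # Restrict results to a single domain, e.g. to check one suspect site.
        query += f" site:{site}"
    return query


def search_url(query: str) -> str:
    # Assumption: the usual search URL with the query in the "q" parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(make_dork("Sure! Here's an article about"))
print(search_url(make_dork("Certo! Ecco un articolo", site="example.com")))
```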

# English
"Sure! Here's an article about"

# Italian
"Certo! Ecco un articolo"

My website is on your list!

Cry about it.

But I'm just using AI to correct grammar and spelling

All I hear is skill issue. Imagine needing an AI to write stuff.

I bought a domain that previously was a content farm, can you remove it?

No.

  • uBlockOrigin & uBlacklist Huge AI Blocklist: this project hides every AI-related result from search engines, including websites that I believe are legit tools (e.g. ChatGPT). What I want instead is to block access only to garbage AI content farms.


Comments

  • By quiet35 2026-02-21 16:45 10 replies

    I like the idea and even considered contributing to the list, but this stopped me:

    > NAQ (Never Asked Questions)

    > My website is on your list!

    > Cry about it.

    That's quite a suspicious attitude. Clearly the maintainer believes he is infallible. I understand the emotions behind this, but this is not how a public blacklist should be maintained.

    • By TonyTrapp 2026-02-21 16:50 3 replies

      Yuuup. My personal website has been inaccessible to a few friends, they thought my server was down. It turned out they had some blocklist (not related to AI) installed on their PiHole, and for whatever reason my website was on that list. It is, in fact, to this day, because my request to unblock it went completely unanswered. I still don't know why the website is on the list.

      • By jorvi 2026-02-21 17:45 1 reply

        Go to the Adguard GitHub (or use the extension) and report it. And get all your friends to switch to Adguard extension and Adguard Home (Pi Hole alternative) as blockers.

        Easylist and its sublists are notorious for being poorly maintained and ignoring issues opened against them. Adguard is much more active in maintaining its lists. Adguard's language blocklists in particular have much, much less breakage and far fewer missed ads than Easylist.

        • By skeeter2020 2026-02-21 22:29 1 reply

          >> And get all your friends to switch to Adguard extension and Adguard Home (Pi Hole alternative) as blockers.

          Nice of you to slip this "easy" step into your advice. Give me a break!

          • By jorvi 2026-02-22 16:41 1 reply

            ..?

            If you know how to run a Pi Hole, you know how to run Adguard Home. And installing Chromium / Firefox / Safari extensions isn't exactly rocket science.

            • By KellyCriterion 2026-02-23 8:51

              The crux is in this sentence of yours:

              >...all your friends to switch to ...<

              :-))

      • By VladVladikoff 2026-02-21 17:09 1 reply

        Perhaps it got hacked and was hosting malware without you being aware? They are pretty good at hiding it from the site owner (showing the original website to you, but not to others).

        • By TonyTrapp 2026-02-21 17:12

          The server is and has been clean the whole time. I don't even run WordPress or anything similar on that server that would be a common hacking target. If it was hacked, I'm pretty sure Google Safe Browsing would be the first to flag the site, not some random PiHole list.

      • By zadikian 2026-02-22 21:28

        PiHole should err on the side of false negatives, uBO on false positives. Difference being uBO only takes a click to disarm for a site.

        Personally I don't want to introduce any chance of my DNS being a problem.

    • By Drupon 2026-02-21 18:56 1 reply

      Probably because there's about the same chance of them being innocent as the "Help I was wrongfully banned by VAC :(((" posts in the Counterstrike community.

      • By matheusmoreira 2026-02-21 19:37 1 reply

        Reminder that false positives are not only possible but likely. I remember one instance where you could get people banned by sending them a specific string of characters over chat. Anticheat was scanning the entire contents of RAM looking for it.

        These days anticheat software is likely to snap at anything. Who knows what they think of the development tools Hacker News users are likely to have on their computers? They really hate virtual machines for example. There's no telling how they'd react to a debugger or profiler.

        • By Drupon 2026-02-21 21:36 2 replies

          Yeah that's what the people love to say on the Steam forums when they've gotten busted in one of its many ban waves.

          • By s0ss 2026-02-22 2:08

            Both of you can be right.

          • By matheusmoreira 2026-02-23 22:54 1 reply

            Never claimed otherwise. Just saying it's a fact of life that every test has false positives and false negatives. The "is this player cheating" test is no exception.

            Check out this amazing episode:

            https://news.ycombinator.com/item?id=26296339

            Dude got so fed up with long loading times he debugged the game and not only discovered the cause but actually fixed it. Billion dollar corporation couldn't be assed to do it.

            Gotta wonder if this guy wouldn't have gotten banned by the anticheat for having the audacity to hook into the game with a debugger or something. Only cheaters do that sort of thing right?

            • By Drupon 2026-02-24 7:39 1 reply

              Yeah that would be a reasonable thing to ban for. Companies can't afford to audit every single unauthorized tampering with their software to ensure that it's benign. If it results in better cheat detection, far better to have a policy that's unobtrusive and non-applicable to 99.99% of users and something the marginal outliers will understand is a risk.

              • By matheusmoreira 2026-02-26 3:55

                It's a false positive. He would get banned even though he was not cheating.

                Whether it's "reasonable" or not comes down to politics. Optimizing for either false positives or false negatives is a policy decision. Do you punish innocents to ensure you catch every single cheater? Do you let cheaters go to ensure you don't punish innocents?

                I don't really intend to discuss the above questions. I'm just pointing out the fact one of those so called complainers in the forums could very well turn out to be one of these false positives. That's what the system is optimized for.

    • By the_biot 2026-02-21 18:14 1 reply

      I would add that with this attitude and how new this initiative is, there's very little chance it will still be updated 5 years from now. Really this sort of thing needs to come from Easylist or similar, who have a track record of maintaining these for years.

      • By Larrikin 2026-02-21 19:35 2 replies

        I don't understand the need for the author to commit the rest of his life to this or start a foundation. It is a good list for now and if it's never updated again, that seems fine.

        • By jjcob 2026-02-22 8:06

          If a blocklist doesn't get updated it is outdated in a week.

          Some tools are useful without updates. A blocklist for AI content farms that are sprouting like crazy is not helpful if it isn't updated.

        • By skeeter2020 2026-02-21 22:30 1 reply

          in that case they should just contribute to one of the existing, more established lists. We don't need n+1 standards...

          • By Larrikin 2026-02-21 23:44

            Which lists are open to this kind of contribution?

    • By DrammBA 2026-02-21 20:00

      You forgot:

      > A personal list for uBlock Origin

    • By wasmainiac 2026-02-22 6:41

      Fork it then!

    • By ycombinatrix 2026-02-21 21:24

      If the website is not AI slop, presumably they would remove it from the list.

    • By GaryBluto 2026-02-22 1:46

      [flagged]

    • By well_ackshually 2026-02-21 19:56 2 replies

      [flagged]

      • By JamesLeonis 2026-02-21 22:25

        I agree.

        I find it a bit ironic that this site regularly talks about banning whole countries and IP ranges on our servers, then acts shocked when users do the same. The fact that somebody went to the effort to create and share this shows how poorly the public sees the web.

        The reality we face is "Check your AdBlocker" is the new "Check your spam folder" and we should adjust accordingly.

      • By tokenless 2026-02-21 23:28

        Problem is if this becomes popular and people being lazy assume blocked site means slop without checking, then the repo has a lot of power to break innocent sites.

        I don't have an answer because as you say with power comes people wrangling over power. And claw sloperators can be way more persistent!

    • By NeutralCrane 2026-02-21 17:48

      Also seems a bit hypocritical given the screed about how such a list is necessary because the AI content might output hallucinations or damaging content without review.

      But if it’s the author’s blocklist that is wrong, unverified, and causing harm to others? Cry about it.

  • By rdmuser 2026-02-21 8:10 4 replies

    A new more grounded list focused on specifically blocking content farms and similar low quality sites.

    A nice alternative to this very broad anti ai list: https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist

    Edit: Oh I should mention I found it through reddit and there is some good discussion there where they describe how they find stuff etc: https://www.reddit.com/r/uBlockOrigin/comments/1r9uo3j/autom...

    • By Dwedit 2026-02-21 14:58 3 replies

      The broad list seems to just be a hater list. It's not trying to cover cases of deception (passing off AI material as if it's something else), as it includes sites which are very open about what kind of content is on there.

      • By malfist 2026-02-21 17:53 1 reply

        Would you say the same about a block list that blocks anything else? I don't care how obvious an ad is, I don't want to see it. Same with social widgets or cookie consent banners, or newsletter sign-ups.

        But I wouldn't call the person who maintains the newsletter popup block list a "newsletter hater".

        • By gruez 2026-02-21 21:45 1 reply

          >Would you say the same about a block list that blocks anything else? I don't care how obvious an ad is, I don't want to see it. Same with social widgets or cookie consent banners, or newsletter sign-ups.

          He's not complaining that widgets for his favorite social network site are getting blocked, he's complaining that anything vaguely related to social networks is getting banned. Some of the sites on that list are stuff like chatgpt.com, which might be AI related, but clearly doesn't fit the criteria of "AI generated content, for the purposes of cleaning image search engines".

          • By malfist 2026-02-23 13:50

            It's an AI block list. Not an "AI generated content, for the purposes of cleaning image search engines" block list.

      • By hogwasher 2026-02-21 18:05

        The purpose of the broad list is removing AI-generated content from search results, so that the user doesn't have to wade through (as much) slop to find the human-created content they're looking for.

        While I applaud the honesty of sites that are open about their content being AI generated, that type of content is never what I'm looking for when I search, so if they're in my search results it's just more distraction/clutter drowning out whatever I'm actually looking for. Blocking them improves my search experience slightly, even though there are of course still lots of other unwanted results remaining.

        Granted, I definitely count as an AI hater (speaking of LLMs specifically). But even if I weren't, I don't think I'd be seeking it out specifically using a search engine; why would I do that when I could just go straight to chatgpt or whatever myself? Search is usually where people go to find real human answers (which is why appending "reddit" to one's searches became so common). So I see this as a utility thing, more than a "I am blocking all this just because I hate it" thing. Although it can be both, certainly.

        Edit: removed an off-topic tangent

      • By lawtalkinghuman 2026-02-22 11:50

        If my goal is not seeing AI slop, I don't particularly care whether it is honestly labelled or not.

    • By smusamashah 2026-02-21 16:28

      So there is a spreadsheet of websites. That is very interesting. There was an article here sometime ago about a media group who have so many super SEOd websites. They all have common footer text. I searched and added as many as I could find in uBlacklist. I have a gist listing them and how I searched for them. You might find that useful.

      Edit: https://gist.github.com/SMUsamaShah/6573b27441d99a0a0c792431...

    • By xnx 2026-02-21 13:14 1 reply

      Hasn't been updated in 5 months

      • By rdmuser 2026-02-21 13:27

        Oh good point I also overlooked that with the anti ai list.

        The big anti ai list also seems to be focused on hiding links from ddg/bing/google, whereas this new, more focused list just blocks sites. I tend to prefer blocking over hiding because blocking pops up a nice warning no matter where I came from, and I can still decide to ignore it if I want, so there is more user agency instead of just quietly hiding an unclear chunk of the net from search engines.

    • By tkel 2026-02-22 4:09

      Thanks, I added both lists

  • By throwatdem12311 2026-02-21 16:10 1 reply

    uBlock Origin also already has an “AI widget” blocklist you can enable. Literally the only extension that keeps me on Firefox because of how useless it is on Chromium.

HackerNews