Ask HN: How to stop an AWS bot sending 2B requests/month?

2025-10-17 5:28 · 288 points · 181 comments

I have been struggling with a bot ('Mozilla/5.0 (compatible; crawler)', coming from AWS Singapore) sending an absurd number of requests to a domain of mine, averaging over 700 requests/second for several months now. Thankfully, CloudFlare is able to handle the traffic with a simple WAF rule and 444 response to reduce the outbound traffic.

I've submitted several complaints to AWS to get this traffic to stop; their typical follow-up is: "We have engaged with our customer, and based on this engagement have determined that the reported activity does not require further action from AWS at this time."

I've tried various 4XX responses to see if the bot will back off, and I've tried 30X redirects (which it follows), all to no avail.

The traffic is hitting numbers that require me to re-negotiate my contract with CloudFlare and is otherwise a nuisance when reviewing analytics/logs.

I've considered redirecting the entirety of the traffic to the AWS abuse report page, but at this scale it's essentially a small DDoS network, and sending it anywhere could be considered abuse in itself.

Have others had a similar experience?


Comments

  • By AdamJacobMuller 2025-10-17 16:38 · 5 replies

    > I've tried 30X redirects (which it follows)

    301 response to a selection of very large files hosted by companies you don't like.

    When their AWS instances start downloading 70,000 Windows ISOs in parallel, they might notice.

    Hard to do with cloudflare but you can also tar pit them. Accept the request and send a response, one character at a time (make sure you uncork and flush buffers/etc), with a 30 second delay between characters.

    700 requests/second with, say, 10 KB of headers per response. Sure is a shame your server is so slow.
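    A rough sketch of such a tarpit (my own illustration in Python - a bare socket server with the port, payload, and per-byte delay all arbitrary):

```python
import socket
import threading
import time

def tarpit(conn, delay=30.0):
    """Drip an HTTP response one byte at a time to tie up the client."""
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # "uncork": push each byte immediately
    response = (b"HTTP/1.1 200 OK\r\n"
                b"Content-Length: 10000\r\n"
                b"\r\n" + b"." * 10000)
    try:
        for i in range(len(response)):
            conn.sendall(response[i:i + 1])  # one character at a time
            time.sleep(delay)                # 30 s between characters by default
    except OSError:
        pass  # client finally gave up
    finally:
        conn.close()

if __name__ == "__main__":
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 8080))
    srv.listen(1024)
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=tarpit, args=(conn,), daemon=True).start()
```

    At 700 requests/second, even a few seconds of stalling per connection pins thousands of the bot's sockets at once.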

    • By notatoad 2025-10-17 17:05 · 2 replies

      >301 response to a selection of very large files hosted by companies you don't like.

      i suggest amazon

      • By lgats 2025-10-17 19:46

        unfortunately, it seems AWS even has firewalls that will quickly start failing these requests after a few thousand, then they're back up to their high-concurrency rate

      • By knowitnone3 2025-10-18 0:49

        Microsoft

    • By gitgud 2025-10-17 21:46 · 1 reply

      > Accept the request and send a response, one character at a time

      Sounds like the opposite of the [1] Slowloris DDoS attack. Instead of attacking with slow connections, you're defending with slow connections.

      [1] https://www.cloudflare.com/en-au/learning/ddos/ddos-attack-t...

      • By tliltocatl 2025-10-18 1:35 · 1 reply

        That's why it is actually sometimes called inverse slow loris.

        • By amy_petrik 2025-10-18 3:18

          it's called the slow sirol in my circles

    • By tremon 2025-10-18 0:05 · 1 reply

      As an alternative: 301 redirect to an official .sg government site, let local law enforcement deal with it.

      • By integralid 2025-10-18 12:50 · 1 reply

        Don't actually do this, unless you fancy meeting AWS lawyers in court and love explaining intricate details of HTTP to judges.

        • By more_corn 2025-10-18 17:21 · 1 reply

          I like this idea. Here’s how it plays out: Singapore law enforcement gets involved. They send a nasty-gram to AWS. Lawyers get involved. AWS lawyers collect facts. Find that the culprit is not you, find that you’ve asked for help, find that they (AWS) failed to remediate, properly fix responsibility on the culprit and secondary responsibility on themselves, punch themselves in the crotch for a minute, and then solve the problem by canceling the account of the offending party.

          • By kadoban 2025-10-18 18:19 · 1 reply

            > Find that the culprit is not you, find that you’ve asked for help, find that they (AWS) failed to remediate, properly fix responsibility on the culprit and secondary responsibility on themselves, punch themselves in the crotch for a minute, and then solve the problem by canceling the account of the offending party.

            Yeah, lawyers are notorious for blaming themselves and taking responsibility. You definitely won't just get blamed.

            • By anakaine 2025-10-20 10:42

              A lawyer who can see an easy defence to a path they wish to pursue is going to consider that in their response. If that defence looks like their own client's vulnerability would be exposed because of the client's action or inaction, their first response will almost certainly be to get the client to fix that action or inaction.

    • By more_corn 2025-10-18 17:13

      ^ I love you

    • By gruez 2025-10-17 20:09 · 2 replies

      >When their AWS instances start downloading 70,000 Windows ISOs in parallel, they might notice.

      Inbound traffic is free for AWS

      • By jacquesm 2025-10-17 22:18

        It's free, but it's not infinite.

      • By kadoban 2025-10-18 21:44

        Free just means you get in trouble when you abuse it.

  • By swiftcoder 2025-10-17 6:59 · 7 replies

    Making the obviously-abusive bot prohibitively expensive is one way to go, if you control the terminating server.

    gzip bomb is good if the bot happens to be vulnerable, but even just slowing down their connection rate is often sufficient - waiting just 10 seconds before responding with your 404 is going to consume ~7,000 ports on their box, which should be enough to crash most Linux processes (nginx + mod-http-echo is a really easy way to set this up)
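    The delayed-404 idea can also be sketched without nginx; a toy asyncio server along those lines (illustrative only - the port and exact headers are made up, the 10-second figure mirrors the comment):

```python
import asyncio

DELAY = 10  # seconds to sit on each connection before answering

async def handle(reader, writer):
    await reader.read(4096)      # consume (part of) the request
    await asyncio.sleep(DELAY)   # each waiting client pins one ephemeral port
    writer.write(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```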

    • By gildas 2025-10-17 8:51 · 1 reply

      Great idea, some people have already implemented it for the same type of need, it would seem (see the list of user agents in the source code). Implementation seems simple.

      https://github.com/0x48piraj/gz-bomb/blob/master/gz-bomb-ser...
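      For a sense of the asymmetry these rely on (an illustrative snippet, separate from the linked implementation):

```python
import gzip

# Highly repetitive data compresses at roughly 1000:1 with gzip,
# so ~10 KB on the wire can expand to 10 MB in the client's memory.
payload = b"\x00" * (10 * 1024 * 1024)          # what the client ends up holding
bomb = gzip.compress(payload, compresslevel=9)  # what you actually send

print(f"sent {len(bomb)} bytes; client inflates to {len(payload)} bytes")
# Served with 'Content-Encoding: gzip', most HTTP clients
# decompress this automatically.
```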

      • By kijin 2025-10-17 23:44

        Be careful using this if you're behind cloudflare. You might inadvertently bomb your closest ally in the battle.

    • By mkj 2025-10-17 8:15 · 2 replies

      AWS customers have to pay for outbound traffic. Is there a way to get them to send you (or cloudflare) huge volumes of traffic?

      • By horseradish7k 2025-10-17 8:27 · 1 reply

        yeah, could use a free worker

        • By compootr 2025-10-17 23:12

          free workers only get 100k reqs per day or something

      • By _pdp_ 2025-10-17 8:29 · 1 reply

        A KB-sized zip file can expand to gigabytes or petabytes through recursive nesting - though it depends on their implementation.

        • By sim7c00 2025-10-17 9:59 · 1 reply

          thats traffic in the other direction

          • By swiftcoder 2025-10-17 13:27 · 1 reply

            The main joy of a zip bomb is that it doesn't consume much bandwidth - the transferred compressed file is relatively small, and it only becomes huge when the client tries to decompress it in memory afterwards

            • By crazygringo 2025-10-17 17:32 · 1 reply

              It's still going in the wrong direction.

              • By dns_snek 2025-10-17 17:45 · 2 replies

                It doesn't matter either way. OP was thinking about ways to consume someone's bandwidth. A zip bomb doesn't consume bandwidth, it consumes computing resources of its recipient when they try to unpack it.

                • By sim7c00 2025-10-18 8:57 · 2 replies

                  i wouldn't assume someone sending 700 req per second or so to a single domain repeatedly (likely to the same resources) will bother opening zip files.

                  the bot in the article is likely being tested (as author noted), or its a very bad 'stresser'.

                  if it was looking for content grabbing it will access differently. (grab resources once and be on its way).

                  its not bad to host zip bombs tho, for the content grabbers :D nomnom.

                  saw an article about a guy on here who generated arbitrary pngs or so. also classy haha.

                  if u have a friendly vps provider who gives unlimited bandwidth these options can be fun. u can make a dashboard which bot has consumed the most junk.

                  • By mjmas 2025-10-18 12:07

                    This is using the builtin compression in http:

                      Content-Encoding: gzip

                  • By ruined 2025-10-18 9:56 · 1 reply

                    nearly every http response is gzipped. unpacking automatically is a default feature of every http client.

                    • By sim7c00 2025-10-19 12:06

                      Accept-Encoding i think would be logical on scrapers these days but maybe its not helpful idk. server should adhere to what client requests afaik.

                • By crazygringo 2025-10-17 18:20

                  I know. I was pointing out that it doesn't matter what it consumes if it's going the wrong way to begin with.

    • By CWuestefeld 2025-10-17 17:32 · 1 reply

      We've been in a similar situation. One thing we considered doing was to give them bad data.

      It was pretty clear in our case that they were scraping our site to get our pricing data. Our master catalog had several million SKUs, priced dynamically based on availability, customer contracts, and other factors. And we tried to add some value to the product pages, with relevant recommendations for cross-sells, alternate choices, etc. This was pretty compute-intensive, and the volume of the scraping could amount to a DoS at times. Like, they could bury us in bursts of requests so quickly that our infrastructure couldn't spin up new virtual servers, and once we were buried, it was difficult to dig back out from under the load. We learned a lot during this period, including some very counterintuitive stuff about how some approaches to queuing and prioritizing that sounded great on paper could actually have unintended effects that made such situations worse.

      One strategy we talked about was that, rather than blocking the bad guys, we'd tag the incoming traffic. We couldn't do this with perfect accuracy, but the inaccuracy was such that we could at least ensure that it wasn't affecting real customers (because we could always know when it was a real, logged-in user). We realized that we could at least cache the data in the borderline cases so we wouldn't have to recalculate (it was a particularly stupid bot that was attacking us, re-requesting the same stuff many times over); from that it was a small step to see that we could at the same time add a random fudge factor into any numbers, hoping to get to a state where the data did our attacker more harm than good.

      We wound up doing what the OP is now doing, working with CloudFlare to identify and mitigate "attacks" as rapidly as possible. But there's no doubt that it cost us a LOT, in terms of developer time, payments to CF, and customer dissatisfaction.

      By the way, this was all the more frustrating because we had circumstantial evidence that the attacker was a service contracted by one of our competitors. And if they'd come straight to us to talk about it, we'd have been much happier (and I think they would have been as well) to offer an API through which they could get the catalog data easily and in a way where we don't have to spend all the compute on the value-added stuff we were doing for humans. But of course they'd never come to us, or even admit it if asked, so we were stuck. And while this was going, there was also a case in the courts that was discussed many times here on HN. It was a question about blocking access to public sites, and the consensus here was something like "if you're going to have a site on the web, then it's up to you to ensure that you can support any requests, and if you can't find a way to withstand DoS-level traffic, it's your own fault for having a bad design". So it's interesting today to see that attitudes have changed.

      • By gwbas1c 2025-10-17 18:41 · 1 reply

        > rather than blocking the bad guys, we'd tag the incoming traffic

        > had circumstantial evidence that the attacker was a service contracted by one of our competitors

        > we'd have been much happier ... to offer an API through which they could get the catalog data easily

        Why not feed them bad data?

        • By CWuestefeld 2025-10-18 13:50 · 2 replies

          We didn't like the ethics of it, especially since we couldn't guarantee that the bogus data was going only to the attacker (rather than to innocent but not-yet-authenticated "general public").

          • By IshKebab 2025-10-18 17:35

            I guess you could have required login to show prices to suspicious requests. Then it shouldn't affect most people and if it accidentally does the worst outcome is they need to log in.

          • By miga 2025-10-20 17:46 · 1 reply

            Do they change IP addresses that often?

            • By CWuestefeld 2025-10-20 23:42

              Oh, lord yes! Frequently they're scraping us from multiple distinct CIDR blocks simultaneously. But we can tell it's the same organization doing it not just because the requests look similar, but it's even possible occasionally to see a request for a search from one CIDR that's followed up immediately by requests for details for the products that had been returned by the search.

              While at the same time, because our site is B2B ecommerce, where our typical customer is a decent-sized corporation, it's not uncommon for a single legit user to have consecutive requests originate from different IPs, as their internal proxies use different egress points.

    • By kristianp 2025-10-17 23:00 · 3 replies

      Stupid question, won't that consume 7000 ports on your own box as well?

      • By kijin 2025-10-17 23:40

        Each TCP connection requires a unique combination of (server port, client port). Your server port is fixed: 80 or 443. They need to use a new ephemeral port for each connection.

        You will have 7000 sockets (file descriptors), but that's much more manageable than 7000 ports.

      • By Neywiny 2025-10-17 23:28 · 1 reply

        I think it'll eat 7000 connection objects, maybe threads, but they'll all be on port 80 or 443? So if you can keep the overhead of each connection down, presumably easy because you don't need it to be fast, it'll be fine

      • By swiftcoder 2025-10-18 14:58

        7000 sockets, at any rate, but provided you've anticipated the need, this isn't challenging to support (and nginx is very good at handling large numbers of open sockets)

    • By Orochikaku 2025-10-17 7:30 · 2 replies

      Thinking along the same lines, a PoW check like anubis [1] may work for OP as well.

      [1] https://github.com/TecharoHQ/anubis
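      The core of such a proof-of-work scheme (a generic sketch - Anubis's actual challenge differs in detail) is forcing the client to burn CPU finding a nonce the server can verify with a single hash:

```python
import hashlib
import itertools

def solve(challenge: bytes, difficulty: int) -> int:
    """Brute-force a nonce so sha256(challenge + nonce) starts with
    `difficulty` zero hex digits; each extra digit is ~16x more work."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a submitted nonce costs the server one hash."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# The client pays ~16^4 hashes up front; the server checks in one.
nonce = solve(b"per-visitor-token", 4)
assert verify(b"per-visitor-token", nonce, 4)
```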

      • By hshdhdhehd 2025-10-17 9:02 · 2 replies

        Avoid it if you don't have to. It is not friendly to legitimate traffic. Especially if current blocking works.

        • By CaptainOfCoit 2025-10-17 17:56 · 1 reply

          > Especially if current blocking works.

          The submission and the context is when current blocking doesn't work...

          • By hshdhdhehd 2025-10-18 5:20 · 1 reply

            > Thankfully, CloudFlare is able to handle the traffic with a simple WAF rule and 444 response to reduce the outbound traffic.

            That is strictly less resource intensive than serving 200 and some challenge.

            • By CaptainOfCoit 2025-10-18 11:38

              Right, but if you re-read the submission, OP already tried that, found the costs could still be too high, and is looking for alternatives...

      • By winnie_ua 2025-10-18 12:58

        It was blocking me from accessing GNOME's gitlab instance from my cell phone.

        So it mistakenly flagged me as a bot. IDK. And it forces legitimate users to wait a while. Not great UX.

    • By lagosfractal42 2025-10-17 7:49 · 4 replies

      This kind of reasoning assumes the bot continues to be non-stealthy

      • By lucastech 2025-10-17 21:06

        Yeah, there are some botnets I've been seeing that are much more stealthy, using 900-3,000 IPs with rotating user agents to send enormous amounts of traffic.

        I've resorted to blocking entire AS routes to prevent it (fortunately I am mostly hosting US sites with US only residential audiences). I'm not sure who's behind it, but one of the later data centers is oxylabs, so they're probably involved somehow.

        https://wxp.io/blog/the-bots-that-keep-on-giving

      • By swiftcoder 2025-10-17 8:00 · 1 reply

        I mean, forcing them to spend engineering effort to make their bot stealthy (or to maintain tens of thousands of open ports) is still driving up their costs, so I'd count it as a win. The OP doesn't say why the bot is hitting their endpoints, but I doubt the bot is a profit centre for the operator.

        • By lagosfractal42 2025-10-17 12:07 · 1 reply

          You risk flagging real users as bots, which drives down your profits and reputation

          • By swiftcoder 2025-10-17 12:26 · 1 reply

            In this case I don't think they do - unless the legitimate users are also hitting your site at 700 RPS (in which case, the added load from the bot is going to be negligible)

            • By hansvm 2025-10-18 21:35

              Once the bot is stealthy (the current sub-thread if I haven't misread) they absolutely do. A couple examples where I've been flagged as a bot for normal traffic:

              1. Discord's telemetry was broken on my browser, and on failure they immediately retried. It didn't take many actions queued up on the site before my browser was initiating over 100RPS, on their behalf.

              2. Target and eBay still flag my sessions as bot traffic (presumably because they don't recognize the user agent or because I use Linux or something). Target allows browsing their site for a few items before heavily rate-limiting me for a day or so, and eBay just resets my password a day or two after I log in, every single bloody time.

              The problem is that from time to time normal users will generate large traffic volumes, and if the bot owner uses many IPs then you're forced to use less reliable signals for that ban hammer (i.e., no single user will be near 700 RPS).

      • By somat 2025-10-17 17:23 · 1 reply

        xkcd 810 comes to mind. https://xkcd.com/810/

        "what if we make the bots go stealthy and indistinguishable from actual human requests?"

        "Mission Accomplished"

        • By HPsquared 2025-10-17 20:21 · 1 reply

          This has pretty much happened now in the internet at large, and it's kinda sad.

          • By lotsofpulp 2025-10-18 1:24

            “Constructive” and “Helpful” are unfortunately now outweighed by garbage.

      • By heavyset_go 2025-10-17 16:47

        If going stealth means not blatantly DDoS'ing the OP then that's a better outcome than what's currently happening

    • By SergeAx 2025-10-18 17:14

      Wouldn't it consume the same number of connections on my server?

  • By neya 2025-10-17 7:51 · 1 reply

    I had this issue on one of my personal sites. It was a blog I used to write maybe 7-8 years ago. All of a sudden, I see insane traffic spikes in analytics. I thought some article went viral, but realized it was too robotic to be true. And so I narrowed it down to some developer trying to test their bot/crawler on my site. I tried asking nicely, several times, over several months.

    I was so pissed off that I setup a redirect rule for it to send them over to random porn sites. That actually stopped it.

    • By sim7c00 2025-10-17 10:00 · 2 replies

      this is the best approach honestly. redirect them to some place that undermines their efforts. either back to themselves, their own provider, or nasty crap that no one wants to find in their crawler logs.

      • By specialist 2025-10-17 18:37 · 1 reply

        Maybe someone will publish a "nastylist" for redirecting bots.

        Decades later, I'm still traumatized by goatse, so it'll have to be someone with more fortitude than me.

        • By sim7c00 2025-10-18 8:39

          goatse, lemonparty, meatspin. take ur pick of the gross but clearnetable things.

          mind you before google and the likes and the great purge of internet, these things were mild and humorous...

      • By throwaway422432 2025-10-17 10:34 · 3 replies

        Goatse?

        Wouldn't recommend Googling it. You either know or just take a guess.

        • By Rendello 2025-10-17 17:14 · 2 replies

          I googled a lot of shock sites after seeing them referenced and not knowing what they were. Luckily Google and Wikipedia tended to shield my innocent eyes while explaining what I should be seeing.

          The first goatse I actually saw was in ASCII form, funnily enough.

          • By antonymoose 2025-10-17 21:06

            I use the ASCII form to reply to spammers, since it usually won't trip an attachment filter or anything. I get mixed results, but the results are usually funny.

          • By sph 2025-10-20 8:39

            I've never seen it in ASCII form, and I don't want to search for it as google will inevitably disregard my instructions and show me the 4K version in full color.

        • By nosrepa 2025-10-17 20:07

          The Jason Scott method.

HackerNews