Galanwe

Karma: 3357
Created: 2014-01-09

Recent Activity

  • Okay, I found the trick, buried in the benchmark setup of s5cmd.

    The claimed numbers are _not_ reached against S3, but against a custom server emulating the S3 API, hosted on the client machine.

    I think this is very misleading, since these benchmark numbers are not reachable in any real-life scenario. It also shows that there is very little point in using s5cmd over other tools: beyond 1.6 GB/s the throttling comes from S3, not from the client, so any tool able to saturate 1.6 GB/s will be enough.
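
    Back-of-envelope, using the per-connection and per-object caps I describe in the comments below (~80 MB/s and ~1.6 GB/s, my observed numbers, not documented AWS limits), roughly 20 parallel connections are already enough to hit that ceiling:

        # Rough arithmetic only; both caps are observed values, not AWS-documented limits.
        per_connection_gbps = 0.08   # ~80 MB/s per S3 connection
        per_object_gbps = 1.6        # ~1.6 GB/s per object, per EC2 instance

        print(per_object_gbps / per_connection_gbps)  # ~20 connections saturate one object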

  • > They still use Czech Republic for Czechia

    Naive question, but is Czechia a new name? The UN lists "Czech Republic" as the official name and "Czechia" as the short name.

    > Republic of Ireland for Ireland

    To be fair, I don't think this one is partisan, but rather just a way to differentiate the state from the island.

  • I know that already... and it is exactly what I tested and confirmed here: https://news.ycombinator.com/item?id=44249137

    You can spawn multiple connections to S3 to retrieve chunks of a file in parallel, but each of those connections is capped at ~80 MB/s, and the aggregate of them, when operating on a single file from a single EC2 instance, is capped at 1.6 GB/s.
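
    For the record, this is roughly what I mean by parallel chunked downloads, as a minimal boto3 sketch (bucket, key, chunk size and output path are placeholders; retries and error handling are omitted):

        # Download one S3 object as parallel byte-range GETs.
        # Bucket/key are placeholders; no retries or error handling.
        from concurrent.futures import ThreadPoolExecutor

        import boto3

        BUCKET = "my-bucket"      # placeholder
        KEY = "10GB.bin"          # placeholder
        CHUNK = 64 * 1024 * 1024  # 64 MiB per ranged GET

        s3 = boto3.client("s3")
        size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

        def fetch(offset):
            # Each call is its own HTTP connection, so each one is subject to
            # the per-connection cap; only the aggregate approaches the
            # per-object limit.
            end = min(offset + CHUNK, size) - 1
            resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={offset}-{end}")
            return offset, resp["Body"].read()

        # ~20 connections at ~80 MB/s each is already enough to reach ~1.6 GB/s.
        with open("/dev/shm/10GB.bin", "wb") as out, ThreadPoolExecutor(max_workers=20) as pool:
            for offset, data in pool.map(fetch, range(0, size, CHUNK)):
                out.seek(offset)
                out.write(data)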

  • I just spawned an r6a.16xlarge with a 25 Gbps NIC, created a 10 GB file, and uploaded it to an S3 bucket in the same region through a local S3 VPC endpoint.

    Downloading that 10 GB file to /dev/shm with s5cmd took 24s, all while spawning 20 or so threads, which were all idling on IO.

    The same test using a Python tool (http://github.com/NewbiZ/s3pd) with the same number of workers took 10s.

    Cranking up the worker count of the latter until there is no more speedup, I can reach 6s with 80 workers. That is 10/6 ≈ 1.7 GB/s, which seems to confirm my previous comment.

    What am I doing wrong?
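
    For anyone who wants to reproduce something similar without s5cmd or s3pd, here is a rough equivalent using boto3's managed transfer (bucket and key are placeholders, and the absolute numbers will of course depend on instance type, region and endpoint):

        # Time a single-object download at several concurrency levels and
        # print the resulting throughput. Bucket/key are placeholders.
        import time

        import boto3
        from boto3.s3.transfer import TransferConfig

        BUCKET = "my-bucket"  # placeholder
        KEY = "10GB.bin"      # placeholder

        s3 = boto3.client("s3")
        size_gb = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"] / 1e9

        for workers in (20, 40, 80):
            cfg = TransferConfig(max_concurrency=workers,
                                 multipart_chunksize=64 * 1024 * 1024)
            start = time.monotonic()
            s3.download_file(BUCKET, KEY, "/dev/shm/10GB.bin", Config=cfg)
            elapsed = time.monotonic() - start
            print(f"{workers} workers: {elapsed:.1f}s, {size_gb / elapsed:.2f} GB/s")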

  • > For downloads, s5cmd can saturate a 40Gbps link (~4.3 GB/s)

    I'm surprised by these claims. I have worked pretty intimately with S3 for almost 10 years now, developed high-performance tools to retrieve data from it, and used dedicated third-party tools tailored for fast downloads from S3.

    My experience is that individual S3 connections are capped across the board at ~80 MB/s, and the throughput for a single file is capped at 1.6 GB/s (at least per EC2 instance). At least, I have never managed to go beyond that myself, nor seen any tool capable of it.

    My understanding is then that this benchmark's claim of 4.3 GB/s is across multiple files, but then it would be rather meaningless, as that is basically free concurrency.
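
    Back-of-envelope with those caps (my observed numbers, not documented AWS limits):

        # If one file tops out around 1.6 GB/s, 4.3 GB/s can only be an
        # aggregate across files. Both caps are observed, not documented.
        per_connection = 0.08  # GB/s per S3 connection
        per_file = 1.6         # GB/s per file, per EC2 instance
        claimed = 4.3          # GB/s claimed by the benchmark

        print(claimed / per_file)        # ~2.7 -> at least 3 files in parallel
        print(claimed / per_connection)  # ~54  -> and ~54 connections overall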
