Galanwe

Karma: 3357
Created: 2014-01-09

Recent Activity

  • Okay, I found the trick, buried in the benchmark setup of s5cmd.

    The claimed numbers are _not_ reached against S3, but against a custom server emulating the S3 API, hosted on the client machine.

    I think this is very misleading, since these benchmark numbers are not reachable in any real-life scenario. It also shows that there is very little point in using s5cmd over other tools: beyond 1.6 GB/s the throttling comes from S3, not from the client, so any tool able to saturate 1.6 GB/s will be enough.
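
    Back-of-envelope, using the per-connection and per-object caps I describe in the comments below (~80 MB/s and ~1.6 GB/s, my observed numbers, not documented AWS limits), roughly 20 parallel connections are already enough to hit that ceiling:

        # Rough arithmetic only; both caps are observed values, not AWS-documented limits.
        per_connection_gbps = 0.08   # ~80 MB/s per S3 connection
        per_object_gbps = 1.6        # ~1.6 GB/s per object, per EC2 instance

        print(per_object_gbps / per_connection_gbps)  # ~20 connections saturate one object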

  • > They still use Czech Republic for Czechia

    Naive question, but is Czechia a new name? The UN lists "Czech Republic" as the official name and "Czechia" as the short name.

    > Republic of Ireland for Ireland

    To be fair, I don't think this one is partisan, but rather just a way to differentiate the state from the island.

  • I know that already... and it is exactly what I tested and confirmed here: https://news.ycombinator.com/item?id=44249137

    You can spawn multiple connections to S3 to retrieve chunks of a file in parallel, but each of those connections is capped at ~80 MB/s, and the aggregate of them, when operating on a single file from a single EC2 instance, is capped at 1.6 GB/s.
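
    For the record, this is roughly what I mean by parallel chunked downloads, as a minimal boto3 sketch (bucket, key, chunk size and output path are placeholders; retries and error handling are omitted):

        # Download one S3 object as parallel byte-range GETs.
        # Bucket/key are placeholders; no retries or error handling.
        from concurrent.futures import ThreadPoolExecutor

        import boto3

        BUCKET = "my-bucket"      # placeholder
        KEY = "10GB.bin"          # placeholder
        CHUNK = 64 * 1024 * 1024  # 64 MiB per ranged GET

        s3 = boto3.client("s3")
        size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

        def fetch(offset):
            # Each call is its own HTTP connection, so each one is subject to
            # the per-connection cap; only the aggregate approaches the
            # per-object limit.
            end = min(offset + CHUNK, size) - 1
            resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={offset}-{end}")
            return offset, resp["Body"].read()

        # ~20 connections at ~80 MB/s each is already enough to reach ~1.6 GB/s.
        with open("/dev/shm/10GB.bin", "wb") as out, ThreadPoolExecutor(max_workers=20) as pool:
            for offset, data in pool.map(fetch, range(0, size, CHUNK)):
                out.seek(offset)
                out.write(data)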

  • I just spawned an r6a.16xlarge with a 25 Gbps NIC, created a 10 GB file, and uploaded it to an S3 bucket in the same region through a local S3 VPC endpoint.

    Downloading that 10 GB file to /dev/shm with s5cmd took 24s, all while spawning 20 or so threads, which were all idling on IO.

    The same test using a Python tool (http://github.com/NewbiZ/s3pd) with the same number of workers took 10s.

    Cranking up the worker count of the latter until there is no more speedup, I can reach 6s with 80 workers. That is 10/6 ≈ 1.7 GB/s, which seems to confirm my previous comment.

    What am I doing wrong?
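
    For anyone who wants to reproduce something similar without s5cmd or s3pd, here is a rough equivalent using boto3's managed transfer (bucket and key are placeholders, and the absolute numbers will of course depend on instance type, region and endpoint):

        # Time a single-object download at several concurrency levels and
        # print the resulting throughput. Bucket/key are placeholders.
        import time

        import boto3
        from boto3.s3.transfer import TransferConfig

        BUCKET = "my-bucket"  # placeholder
        KEY = "10GB.bin"      # placeholder

        s3 = boto3.client("s3")
        size_gb = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"] / 1e9

        for workers in (20, 40, 80):
            cfg = TransferConfig(max_concurrency=workers,
                                 multipart_chunksize=64 * 1024 * 1024)
            start = time.monotonic()
            s3.download_file(BUCKET, KEY, "/dev/shm/10GB.bin", Config=cfg)
            elapsed = time.monotonic() - start
            print(f"{workers} workers: {elapsed:.1f}s, {size_gb / elapsed:.2f} GB/s")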

  • > For downloads, s5cmd can saturate a 40Gbps link (~4.3 GB/s)

    I'm surprised by these claims. I have worked pretty intimately with S3 for almost 10 years now, developed high-performance tools to retrieve data from it, and used dedicated third-party tools tailored for fast downloads from S3.

    My experience is that individual S3 connections are capped across the board at ~80 MB/s, and the throughput for a single file is capped at 1.6 GB/s (at least per EC2 instance). At least, I have never managed to go beyond that myself, nor seen any tool capable of it.

    My understanding is then that this benchmark's claim of 4.3 GB/s is across multiple files, but then it would be rather meaningless, as that is basically free concurrency.
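
    Back-of-envelope with those caps (my observed numbers, not documented AWS limits):

        # If one file tops out around 1.6 GB/s, 4.3 GB/s can only be an
        # aggregate across files. Both caps are observed, not documented.
        per_connection = 0.08  # GB/s per S3 connection
        per_file = 1.6         # GB/s per file, per EC2 instance
        claimed = 4.3          # GB/s claimed by the benchmark

        print(claimed / per_file)        # ~2.7 -> at least 3 files in parallel
        print(claimed / per_connection)  # ~54  -> and ~54 connections overall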
