When imperfect systems are good: Bluesky's lossy timelines

2025-02-1917:48785304jazco.dev

By examining the limits of reasonable user behavior and embracing imperfection for users who go beyond it, we can continue to provide service that meets the expectations of users without sacrificing…

Often when designing systems, we aim for perfection in things like consistency of data, availability, latency, and more.

The hardest part of system design is that it’s difficult (if not impossible) to design systems that have perfect consistency, perfect availability, incredibly low latency, and incredibly high throughput, all at the same time.

Instead, when we approach system design, it’s best to treat each of these properties as points on different axes that we balance to find the “right fit” for the application we’re supporting.

I recently made some major tradeoffs in the design of Bluesky’s Following Feed/Timeline to improve the performance of writes at the cost of consistency in a way that doesn’t negatively affect users but reduced P99s by over 96%.

Timeline Fanout

When you make a post on Bluesky, your post is indexed by our systems and persisted to a database where we can fetch it to hydrate and serve in API responses.

Additionally, a reference to your post is “fanned out” to your followers so they can see it in their Timelines.
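As a rough illustration of the fan-out-on-write flow described above, here is a minimal in-memory sketch. It is not Bluesky's implementation: the `posts`, `followers`, and `timelines` stores and the `create_post` and `read_timeline` helpers are hypothetical names chosen for this example. The point is only that the post body is stored once while each follower's timeline receives a reference to it.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the post database, follow graph and timelines.
posts = {}                     # post_id -> post text (persisted once)
followers = defaultdict(set)   # author -> set of follower names
timelines = defaultdict(list)  # user -> list of post_ids, newest first


def create_post(author, post_id, text):
    """Persist the post once, then fan out a reference to every follower."""
    posts[post_id] = text
    for follower in followers[author]:
        # Only the reference is copied; the write cost grows with follower count.
        timelines[follower].insert(0, post_id)


def read_timeline(user, limit=50):
    """Hydrate the stored references into post bodies at read time."""
    return [posts[post_id] for post_id in timelines[user][:limit]]


followers["alice"] = {"bob", "carol"}
create_post("alice", "p1", "hello")
create_post("alice", "p2", "world")
print(read_timeline("bob"))
```

Reads stay cheap because each timeline is already assembled; the cost moves to the write path, which is where accounts with very many followers become expensive.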

[Figure: fanout job P99 latency graph, after the change]

Knowing where it’s okay to be imperfect lets you trade consistency for other desirable aspects of your systems and scale ever higher.

There are plenty of other places for improvement in our Timelines architecture, but this step was a big one towards improving throughput and scalability of Bluesky’s Timelines.

If you’re interested in these sorts of problems and would like to help us build the core data services that power Bluesky, check out this job listing.

If you’re interested in other open positions at Bluesky, you can find them here.



Comments

  • By pornel 2025-02-1922:255 reply

    I wonder why timelines aren't implemented as a hybrid gather-scatter choosing strategy depending on account popularity (a combination of fan-out to followers and a lazy fetch of popular followed accounts when follower's timeline is served).

    When you have a celebrity account, instead of fanning out every message to millions of followers' timelines, it would be cheaper to do nothing when the celebrity posts, and later when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline. When millions of followers do that, it will be cheap read-only fetch from a hot cache.
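The hybrid strategy proposed in this comment could be sketched roughly as follows. Everything here is an assumption made for illustration: the `CELEBRITY_THRESHOLD` cutoff, the in-memory dictionaries, and the `publish` and `read_timeline` helpers are hypothetical, and `author_posts` merely stands in for the "hot cache" the comment mentions.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; a real one would be far larger

followers = defaultdict(set)      # author -> users following them
following = defaultdict(set)      # user -> authors they follow
timelines = defaultdict(list)     # user -> [(timestamp, post_id)] pushed at write time
author_posts = defaultdict(list)  # author -> [(timestamp, post_id)], stand-in for a hot cache


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author, timestamp, post_id):
    author_posts[author].append((timestamp, post_id))
    if is_celebrity(author):
        return  # do nothing at write time; followers pull at read time
    for user in followers[author]:
        timelines[user].append((timestamp, post_id))


def read_timeline(user, limit=10):
    """Merge the pushed timeline with lazily fetched celebrity posts, newest first."""
    pushed = sorted(timelines[user], reverse=True)
    pulled = [sorted(author_posts[a], reverse=True)
              for a in following[user] if is_celebrity(a)]
    result, seen = [], set()
    for _, post_id in heapq.merge(pushed, *pulled, reverse=True):
        if post_id not in seen:  # guards against an author crossing the cutoff mid-stream
            seen.add(post_id)
            result.append(post_id)
        if len(result) == limit:
            break
    return result


for fan in ("bob", "carol", "dave"):
    follow(fan, "celeb")
follow("bob", "friend")
publish("friend", 1, "f1")
publish("celeb", 2, "c1")
publish("friend", 3, "f2")
print(read_timeline("bob"))
```

The celebrity's post costs one write regardless of follower count, while each reader pays for one extra lookup per followed celebrity.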

    • By ericvolp12 2025-02-1922:373 reply

      This is probably what we'll end up with in the long-run. Things have been fast enough without it (aside from this issue) but there's a lot of low-hanging fruit for Timelines architecture updates. We're spread pretty thin from an engineering-hours standpoint atm so there's a lot of intense prioritization going on.

      • By Xunjin 2025-02-2013:251 reply

        Just to be clear, you are a Bluesky engineer, right?

        off-topic: how have you been dealing with the influx of new users in the aftermath of X's political/legal problems? Did you see an increase in toxicity around the network? And how have you (Bluesky moderation) been dealing with it?

        • By ToucanLoucan 2025-02-2013:513 reply

          [flagged]

          • By breakyerself 2025-02-2015:434 reply

            There's nothing wrong with being partisan if you're partisan against fascists who want to destroy democracy and the rule of law.

            • By tabony 2025-02-2018:102 reply

              I understand why some people vote for some parties and why they’re “voting on inflation” or “right to abortion” but I guess, for me, keeping checks and balances and democracy is the one value above ALL for me.

              In the span of human history, not a lot of countries and civilizations have lasted long, marked by constant instability and uncertainty for the future. We have a boring and imperfect political system created by our founding fathers but at least it’s been stable for nearly 250 years. A lot of people have tried standing up their own political system… most fail and everyone suffers. Even the founding fathers completely failed once first.

              I know times are tough now but, in the context of history, they can be much worse and I'd rather not lose what good we currently do have.

              • By dragonwriter 2025-02-2018:161 reply

                > we got 250 years so far without imploding

                We may have arguably recovered from it, but we rather famously did not get 250 years without the union violently fragmenting. (Our best record on that is right around 160, currently.)

                • By tabony 2025-02-2021:29

                  While it’s true we came close during the Civil War, we still decided to keep the same system of government. In the end, while the Civil War did result in some constitutional crises, the root of the problem was more that one half of the country completely disagreed with the other half… I don’t think any political system can really work with that level of division and yet we kept the same one. Obviously the Civil War did very much bring into question states’ rights but, for better or worse, the founders were a little vague on that so we can still keep most of the same system and squabble over the details for the rest of eternity…

              • By meowface 2025-02-2019:061 reply

                Trump refusing to accept the 2020 election results should've been the line for many voters, but sadly it wasn't. And the potential crimes he and some of his allies may have committed while trying to overturn it will now never be prosecuted.

                • By jasonvorhe 2025-02-2021:192 reply

                  2024: > More than 155 million people cast ballots in the 2024 presidential election. It's second only in U.S. history to the 2020 election. Turnout in 2024 represented 63.9% of eligible voters, the second-highest percentage in the last 100 years, according to the University of Florida Election Lab. The only year that beat it – again – was 2020 when universal mail-in voting was more widely available.

                  2020: > More than 158 million votes were cast in the election

                  So 3 million Democrats suddenly decided not to go out to vote "to save democracy" against "fascism"?

                  • By cton 2025-02-2022:46

                    > The only year that beat it – again – was 2020 when universal mail-in voting was more widely available.

                    You answered your own question. Voting was made more difficult in 2024, so fewer votes were cast.

                  • By weakfish 2025-02-2021:441 reply

                    The simpler and much more likely answer, my friend, is that people didn’t vote from a combination of disillusionment, assuming Kamala would win, and likewise factors.

                    I saw many people close to me not bother voting because they didn’t enjoy Biden’s presidency, despite voting for him in 2020.

                    So, I find that FAR more likely as a reason than supposed election fraud.

                    • By jasonvorhe 2025-02-2118:581 reply

                      I'm really confused how tech people shifted from "voting machines are inherently insecure" to simply ignoring the issue despite many political connections between Democrats and voting machine vendors. I'll stick with the results of my research into the matter. If you think you're well enough informed and that your sources actually care about the truth, let's agree to disagree.

                      • By meowface 2025-02-2122:31

                        This is one of the most investigated issues in American legal history. There was absolutely no indication of fraud. You've fallen for a conspiracy theory. It's now Pizzagate-tier.

                        (I still argue with Pizzagate adherents on a monthly basis. They think it's perfectly logical.)

            • By ToucanLoucan 2025-02-2016:58

              Oh fully agreed. But there's a large contingent of folks that are well represented here who think that it's inherently more intelligent to act like/be a centrist, that "both sides have something to offer," which isn't strictly untrue, but in practice especially with American politics just results in mealy-mouthed acceptance of pretty brutal status quos.

              Like even left and right in terms of the mainstream here is nonsense. We don't have a left party at all, we have a conservative party, and we have an authoritarian fascist party. As a lefty none of my values are represented at all, I just get to vote each election for the conservative party that doesn't want my friends dead.

            • By zamalek 2025-02-2016:401 reply

              Yup. This is a well-trodden philosophical problem: the Paradox of Tolerance. Greater minds have concluded "to protect tolerance, one has to be intolerant of intolerance."

              And, as always, bsky is a place of business - it is not a public venue. They can decide not to admit individuals who would threaten their business.

              • By devmor 2025-02-2017:182 reply

                I have heard it much more aptly described as “enforcing the social contract”.

                You agree to uphold the contract of tolerance with everyone that participates. If someone refuses to uphold the contract with others who do, then you have no obligation to uphold the contract with that individual.

                • By moate 2025-02-2021:06

                  Exactly. Tolerance is an opt-in protection. If you don't opt-in by exercising it yourself, you don't get the benefits.

                  Or, as a meme: YA_GOTTA_GIVE!.gif

                • By zamalek 2025-02-2020:04

                  I like that, it's less paradoxical, and likely easier to explain to people with less developed critical thinking skills.

            • By Imustaskforhelp 2025-02-2016:361 reply

              Funny how you call the Trump administration fascist (theoretically it's anti-fascist, but it's still bad).

              Taking from the description of the video, since this was what immediately came to mind when you said trump===fascism:

              The liberal theory of the rise of Trumpism and its supposed fascistic features is inadequate in both effectively analysing and offering solutions to the present situation. Liberals often personalise or individualise people like Donald Trump and Elon Musk, casting them as deviations, as opposed to manifestations of class society. Class analysis suggests that fascism was a unique response to growing anti-capitalist organisations, socialist and/or anarchist, gaining prominence and posing threats to the economic base. The owning class required a mass movement which enveloped otherwise disillusioned people into a political project which had the collectivist, anti-free market appeal that socialist and anarchist organisations had, but nonetheless committed to solidifying and strengthening the economic base and profit motive. In modern America, no such anti-capitalist threat exists. Neoliberalism has created significant disillusionment with mainstream social and political institutions and systems, but this disillusionment hasn’t been captured by anti-capitalist forces, but rather by the populist right. As such, the populist right doesn’t need to give up the economic game, i.e. free markets, deregulation, privatisation, austerity, etc (with the exception of tariffs), but can purely rely on minorities as scapegoats in a constructed culture war, such as immigrants, ‘wokeness’, transgender people, etc. Therefore, capital doesn’t need to be subordinated to the nation-state, like pursued by contemporary fascist governments. Rather, in this ‘inverted’ fascism, capital takes over and exploits the state in a rather oligarchic manner.

              https://www.youtube.com/watch?v=pqdLwkyfLdM

              This video is really great; I spent 10 minutes looking for this.

              I am not a Trump supporter. The title might be a little clickbaity (basically the opposite of what it really is). You might find it really great.

              It is one of the best videos I have ever watched on politics.

              • By meowface 2025-02-2019:17

                I find communist analysis tiresome, especially when in this case the populist right under Trump seems to be motivated in part by anti-free market ideas. The communist kneejerk reaction to every single situation is "this can be explained by class analysis". It's them trying to shoehorn their pet theory into everything.

          • By Sloowms 2025-02-2015:05

            You're not less partisan if you prefer a slimmer range of political leanings.

      • By petra 2025-02-2019:41

      • By curious_cat_163 2025-02-1923:24

        That's insightful. Keep up the good work!

    • By VWWHFSfQ 2025-02-201:594 reply

      At some point they'll end up just doing the Bieber rack [1]. It's when a shard becomes so hot that it just has to be its own thing entirely.

      [1] - https://www.themarysue.com/twitter-justin-bieber-servers/

      @bluesky devs, don't feel ashamed for doing this. It's exactly how to scale these kinds of extreme cases.

      • By genewitch 2025-02-2012:101 reply

        I've stood up machines for this before; I did not know they had a name. I worked at the mouse company, and my parking spot was two over from J. Bieber's spot.

        So now we have Slashdot effect, HN hug, and it's not Clarkson, it's... Stephen Fry effect? Maybe it can be cross-discipline: there's a term for when lots of the UK turns their kettles on at the same time.

        I should make a blog post to record all the ones I can remember.

      • By bitbckt 2025-02-2012:28

        We never actually had a literal “Bieber Box”, but the joke took off.

        Hot shards were definitely an issue, though.

      • By stavros 2025-02-2010:05

        Given that BlueSky is funded by Twitter, I'm assuming they know a lot more than us on how Twitter architects systems.

      • By Imustaskforhelp 2025-02-2016:38

        It's so crazy.

        Thanks a lot for sharing this link.

    • By rubslopes 2025-02-1923:151 reply

      This problem is discussed in the beginning of the Designing Data-Intensive Applications book. It's worth a read!

      • By Brystephor 2025-02-206:011 reply

        Do you know the name of the problem or strategy used for solving the problem? I'd be interested in looking it up!

        I own DDIA but after a few chapters of how databases work behind the scenes, I begin to fall asleep. I have trouble understanding how to apply the knowledge to my work but this seems like a useful thing with a more clear application.

        • By bitbckt 2025-02-2012:26

          Yes, we used the Yahoo! “Feeding Frenzy” paper as the basis for the design of Haplocheirus (the timeline service).

    • By rsynnott 2025-02-2012:152 reply

      > and later when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline

      I think then you still have the 'weird user who follows hundreds of thousands of people' problem, just at read time instead of write time. It's unclear that this is _better_, though, yeah, caching might help. But if you follow every celeb on Bluesky (and I guarantee you this user exists) you'd be looking at fetching and merging _thousands_ of timelines (again, I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users).

      Given the nature of the service, making read predictably cheap and writes potentially expensive (which seems to be the way they've gone) seems like a defensible practice.

      • By fc417fc802 2025-02-2018:15

        > I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users

        Random sampling? It's not as though the user needs thousands of posts returned for a single fetch. Scrolling down and seeing some stuff that's not in chronological order seems like an acceptable tradeoff.

      • By christkv 2025-02-2012:26

        You might mix the approaches based on some cut off point

    • By locusofself 2025-02-1922:552 reply

      Why do they "insert" even non-celebrity posts into each follower's timeline? That is not intuitive to me.

      • By giovannibonetti 2025-02-1923:18

        To serve a user timeline in single-digit milliseconds, it is not practical for a data store to load each item in a different place. Even with an index, the index itself can be contiguous in disk, but the payload is scattered all over the place if you keep it in a single large table.

        Instead, you can drastically speed up performance if you are able to store data for each timeline somewhat contiguously on disk.

      • By wlonkly 2025-02-204:211 reply

        Think of it as pre-rendering. Of pre-rendering and JIT collecting, pre-rendering means more work but it's async, and it means the timeline is ready whenever a user requests it, to give a fast user experience.

        (Although I don't understand the "non-celebrity" part of your comment -- the timeline contains (pointers to) posts from whoever someone follows, and doesn't care who those people are.)

        • By locusofself 2025-02-211:331 reply

          Perhaps I misunderstood; I thought the actual content of each tweet was being duplicated to every single timeline that followed the author, which sounded extremely wasteful, especially in the case of someone who has 200 million followers.

          • By TimK65 2025-02-218:48

            From the linked article: "Additionally, a reference to your post is 'fanned out' to your followers so they can see it in their Timelines."

            So not the content, just a sort of link to it.

  • By ChuckMcM 2025-02-1919:117 reply

    As a systems enthusiast I enjoy articles like this. It is really easy to get into the mindset of "this must be perfect".

    In the Blekko search engine back end we built an index that was 'eventually consistent' which allowed updates to the index to be propagated to the user facing index more quickly, at the expense that two users doing the exact same query would get slightly different results. If they kept doing those same queries they would eventually get the exact same results.

    Systems like this bring in a lot of control systems theory because they have the potential to oscillate if there is positive feedback (and in search engines that positive feedback comes from the ranker which is looking at which link you clicked and giving it a higher weight) and it is important that they not go crazy. Some of the most interesting, and most subtle, algorithm work was done keeping that system "critically damped" so that it would converge quickly.

    Reading this description of how user's timelines are sharded and the same sorts of feedback loops (in this case 'likes' or 'reposts') sounds like a pretty interesting problem space to explore.

    • By snailmailman 2025-02-1920:154 reply

      I guess I hadn’t considered that search engines could be reranking pages on the fly as I click them. I’ve been seeing my DuckDuckGo results shuffle around for a while now thinking it’s an awful bug.

      Like I click one page, don’t find what I want, and go back thinking “no, I want that other result that was below” and it’s an entirely different page with shuffled results, missing the one that I think might have been good.

      • By PaulHoule 2025-02-1920:381 reply

        That's connected with a basic usability complaint about current web interfaces, that ads and recommended content aren't stable. You very well might want to engage with an ad after you are done engaging what you wanted to engage with but you might never see it again. Similarly, you might see two or three videos that you want to click on on the side of a YouTube video you're watching but you can only click on one (though if you are thinking ahead you can open these in another tab.)

        On top of that immediate frustration, the YouTube style interface here

        https://marvelpresentssalo.com/wp-content/uploads/2015/09/id...

        collects terrible data for recommendations because, even though it gives them information that you liked the thumbnail for a video, they can't come to any conclusion about whether or not you liked any of the other videos. TikTok, by focusing on one video at a time, collects much better information.

        • By 4ggr0 2025-02-208:36

          > though if you are thinking ahead you can open these in another tab

          or add it to the "Watch Later" playlist :) so you can watch it...later.

      • By cgriswald 2025-02-1920:451 reply

        I don't use DDG, but in my (very limited, just now) testing it doesn't seem to shuffle results unless you reload the page in some way. Is it possible your browser is reloading the page when you go back? If so, setting DDG to open links in new tabs might fix this problem.

        • By snailmailman 2025-02-1923:27

          Interesting. Maybe something in my configuration is affecting it. I’ll have to look into it

      • By numeri 2025-02-2013:22

        This behavior started happening for me in the last few months. If I click on a result, then go back, I have different search results.

        I've found a workaround, though – click back into the DDG search box at the top of the page and hit enter. This then returns the original search results.

      • By gtfiorentino 2025-02-2015:19

        Hi - I work on search at DuckDuckGo. Do you mind sharing a bit more detail about this issue? What steps would allow us to reproduce what you're seeing?

    • By gopher_space 2025-02-2016:00

      > Some of the most interesting, and most subtle, algorithm work was done keeping that system "critically damped" so that it would converge quickly.

      Looking back at my early work with microservices I'm wondering how much time I would have saved by just manually setting a tongue weight.

    • By dwedge 2025-02-1921:23

      Similar to how Google images loads lower quality blurred thumbnails towards the bottom of the window at first so that the user thinks they loaded faster

    • By aqueueaqueue 2025-02-202:49

      This is less a question of perfection and more one of trade-offs. Laws of physics put a limit on how efficiently you can keep data in NYC and London in perfect sync, so you choose CAP-style trade-offs. There are also $/SLO trade-offs. Each 9 costs more money.

      I like your example; it is very interesting. If I get to work on (or even hear someone on my team is working on) such interesting problems and I can hear about it, I get happy.

      Interesting problems are rare because, like a house, you might talk about brick vs. timber frame once, but you'll talk about cleaning the house every week!

    • By gregw134 2025-02-1920:13

      Would you be willing to share more about how you guys did click ranking at Blekko? It's an interesting problem.

    • By culi 2025-02-1919:281 reply

      What became of Blekko?

      • By an_ko 2025-02-1919:571 reply

        > It was acquired by IBM in March 2015, and the service was discontinued.

        https://en.wikipedia.org/wiki/Blekko

        Perhaps GP has a more interesting answer though.

        • By ChuckMcM 2025-02-1921:482 reply

          That's the correct answer, IBM wanted the crawler mostly to feed Watson. Building a full search engine (crawler, indexer, ranker, API, web application) for the English language was a hell of an accomplishment but by the time Blekko was acquired Google was paying out tens of billions of dollars to people to send them and only them their search queries. For a service that nominally has to live on advertising revenue getting humans to use it was the only way to be net profitable, and you can't spend billions buying traffic and hope to make it back on advertising as the #3 search engine in the English speaking markets.

          There are other ways to monetize search (look at Kagi for example) than advertising. Blekko missed that window though. (too early, Google needed to get as crappy as it is today to make the value of a spam free search engine desirable)

          • By NetOpWibby 2025-02-204:252 reply

            Blekko was gone by the time I learned about it. Recently (past few years) I emailed someone who worked on Blekko to get his opinion on a search engine concept I still have yet to start. His advice was to not bother competing with Google (obviously) LOL!

            I don’t know if anyone’s embarked on a P2P search engine but that’s essentially my concept. Anyhoo, thanks for the inspiration!

            • By ChuckMcM 2025-02-204:451 reply

              Peer to peer would be tough, you really need a 10G network connection to some tier 1 provider, and about 2500 machines to distribute the crawling/serving load. (that is if you want to do a full stack search engine). And while you can run that infrastructure for on the order of $100K/month (not counting depreciation) that means you need roughly $5K/day in revenue from that cluster. At $10 RPM ($10 revenue per thousand queries) you're looking at a minimum of 500,000 'real' search queries during 'English time' (roughly 7AM to 11PM GMT). That's 31,250 queries per hour or ~9 queries per second (average).

              And that just pays to keep the lights on at the colocation center. If you're paying off the development costs (30 - 50 developers over 2 - 3 years) and the cost of an office somewhere, you'll want at least double that revenue or you'll go broke before you break even.

              Ideally you are the 'go to' place for people looking to buy something as those queries make money. People researching Douglas Fairbanks for a high school essay consume queries but don't generate ad revenue.

              It isn't for the faint of heart.
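The back-of-the-envelope numbers in the comment above can be reproduced with a few lines of arithmetic. This is only a restatement of the commenter's figures, with hypothetical variable names; note that $100K/month divided by 30 days is closer to $3.3K/day, so the $5K/day target is taken as the comment's own rounded figure rather than derived here.

```python
# Figures quoted in the comment (all approximate).
monthly_cost_usd = 100_000          # infrastructure, not counting depreciation
daily_revenue_target_usd = 5_000    # the comment's rounded daily target
revenue_per_thousand_queries = 10   # "$10 RPM"
active_hours_per_day = 23 - 7       # "English time", roughly 7AM to 11PM GMT

# Queries needed per day to hit the revenue target at the given RPM.
queries_per_day = daily_revenue_target_usd / revenue_per_thousand_queries * 1000
queries_per_hour = queries_per_day / active_hours_per_day
queries_per_second = queries_per_hour / 3600

print(f"{queries_per_day:,.0f} queries/day")
print(f"{queries_per_hour:,.0f} queries/hour")
print(f"{queries_per_second:.2f} queries/second")
```

Doubling the revenue target, as the comment suggests for paying off development costs, doubles each of these figures.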

              • By NetOpWibby 2025-02-206:36

                When you don't know what you don't know...wow.

                I know "search is hard" in the general sense but context is lacking (not a lot of details online from ex-search teams). It's always been apparent to me that you must have some other high-grossing product if you want to get into search or video, if only to pay for the servers.

                Thank you for providing your context!

            • By immibis 2025-02-2012:182 reply

              Darknet Lantern is a decentralized searchable directory. It's probably not going to take off, but it could inspire something else. Servers spider other servers with the same software, and synchronize their data.

              • By ChuckMcM 2025-02-2019:03

                Yup, directory services are a lot easier to do peer-to-peer. Pinboard.in is a good shared directory (sort of Yahoo! without the editorial). They can yield excellent quality when you're searching for something that someone has 'indexed' with them, but poor recall when it comes to the set of all possible answers.

                Doing it peer to peer without editorial allows sites to 'get into' the index easily which has its own plusses and minuses.

              • By NetOpWibby 2025-02-2014:56

                I’ve never heard of this before but it looks interesting. Thanks for the tip!

          • By chrisweekly 2025-02-1922:181 reply

            Not my Q but thanks for the interesting history.

            Also, (for other readers), I'm a huge fan of Kagi. Highly recommended.

            • By NetOpWibby 2025-02-206:49

              I really thought Neeva was gonna make it. I'm glad Kagi swooped in when they exited.

    • By genewitch 2025-02-2014:00

      PID techniques useful?

  • By rakoo 2025-02-1920:04

    Ok I'm curious: since this strategy sacrifices consistency, has anyone thought about something that is not full fan-out on reads or on writes?

    Let's imagine something like this: instead of writing to every user's timeline, a post is written once for each shard containing at least one follower. This caps the fan-out at write time to hundreds of shards. At read time, getting the content for a given user reads that hot slice and filters it down to the accounts that user actually follows. It definitely has more load, but:

    - the read is still colocated inside the shard, so latency remains low

    - for mega-followers the page will not see older entries anyway

    There are of course other considerations, but I'm curious about what the load for something like that would look like (and I don't have the data nor infrastructure to test it)
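One way to picture the per-shard idea from this comment is the sketch below. It is a toy model under stated assumptions, not a tested design: `NUM_SHARDS`, the CRC32-based `shard_of` placement, and the in-memory `shard_logs` are all hypothetical, and no real load characteristics can be inferred from it.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical; the comment imagines hundreds

followers = defaultdict(set)    # author -> users following them
following = defaultdict(set)    # user -> authors they follow
shard_logs = defaultdict(list)  # shard -> [(timestamp, author, post_id)]


def shard_of(user):
    """Stable placement of a user on a shard (assumed scheme: CRC32 modulo)."""
    return zlib.crc32(user.encode("utf-8")) % NUM_SHARDS


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def publish(author, timestamp, post_id):
    """Write once per shard holding at least one follower; return the write count."""
    target_shards = {shard_of(user) for user in followers[author]}
    for shard in target_shards:
        shard_logs[shard].append((timestamp, author, post_id))
    return len(target_shards)


def read_timeline(user, limit=10):
    """Scan the user's own shard and keep only posts from followed authors."""
    log = shard_logs[shard_of(user)]
    mine = [(ts, post_id) for ts, author, post_id in log if author in following[user]]
    mine.sort(reverse=True)
    return [post_id for _, post_id in mine[:limit]]


fans = [f"user{i}" for i in range(100)]
for fan in fans:
    follow(fan, "star")
writes = publish("star", 1, "p1")
print(writes, "shard writes for", len(fans), "followers")
```

The write count is bounded by the number of shards rather than the number of followers; the price is that every read filters a shared log, which is the extra load the comment acknowledges.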
