PostgreSQL is enough





Comments

  • By superb-owl 2024-02-0613:5512 reply

    I often go down rabbit holes like this, trying to collapse and simplify the application stack.

    But inevitably, as an application grows in complexity, you start to realize _why_ there's a stack, rather than just a single technology to rule them all. Trying to cram everything into Postgres (or lambdas, or S3, or firebase, or whatever other tech you're trying to consolidate on) starts to get really uncomfortable.

    That said, sometimes stretching your existing tech is better than adding another layer to the stack. E.g. using postgres as a message queue has worked very well for me, and is much easier to maintain than having a totally separate message queue.
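
    For reference, the queue pattern described above usually boils down to a single claiming query. This is a minimal sketch with illustrative table and column names, not any particular library's schema:

    ```sql
    -- A bare-bones job queue in Postgres (names are illustrative).
    CREATE TABLE jobs (
        id          bigserial PRIMARY KEY,
        payload     jsonb NOT NULL,
        enqueued_at timestamptz NOT NULL DEFAULT now()
    );

    -- Each worker atomically claims and removes one job.
    -- SKIP LOCKED keeps concurrent workers from blocking on
    -- rows another worker has already claimed.
    DELETE FROM jobs
    WHERE id = (
        SELECT id FROM jobs
        ORDER BY id
        FOR UPDATE SKIP LOCKED
        LIMIT 1
    )
    RETURNING payload;
    ```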

    I think the main takeaway here is that postgres is wildly extensible as databases go, which makes it a really fun technology to build on.

    • By deathanatos 2024-02-0614:543 reply

      I have certain experience with some technologies, e.g., SQS and Postgres.

      Say I'm on your team, and you're an application developer, and you need a queue. If you're taking the "we're small, this queue is small, just do it in PG for now and see if we ever grow out of that" — that's fine. "Let's use SQS, it's a well-established thing for this and we're already in AWS" — that's fine, I know SQS too. I've seen both of these decisions get made. (And both worked: the PG queue was never grown out of, and generally SQS was easy to work with & reliable.)

      But what I've also seen is "Let's introduce bespoke tech that nobody on the team, including the person introducing it, has experience in, for a queue that isn't even the main focus of what we're building" — this I'm less fine with. There needs to be a solid reason why we're doing that, and that we're going to get some real benefit, vs. something that the team does have experience in, like SQS or PG. Instead, this … thing … crashes on the regular, uses its own bespoke terminology, and you find out the documentation is … very empty. This does not make for a happy SRE.

      • By cnity 2024-02-0616:463 reply

        This desire can sometimes be so strong that people insist on truly wacky decisions. I have demonstrated before that Postgres performs perfectly well compared with a niche graph database (in fact, it exceeds it), and heard some very strange reasons for why this approach should be avoided. A lot of the time you hear that it's engineers who chase shiny technology, but I've seen first hand what can happen when it's leadership.

        • By cbreezyyall 2024-02-0621:481 reply

          Often referred to as resume driven development.

          • By smitty1e 2024-02-070:41

            RDD leaves serious wreckage in its wake.

        • By tracker1 2024-02-0617:35

          I've been on both sides of this..

          RabbitMQ and Elasticsearch for a public-facing site: a dedicated queue for workers to denormalize data and push updates to Elastic. Why? Because the $10k/month RDBMS servers couldn't handle the search load and were overly normalized. Definitely a hard sell.

          I've also seen literally hundreds of lambda functions connecting to dozens of dynamo databases.

          I'm firmly in the camp of use an RDBMS (PostgreSQL my first choice) for most things in most apps. A lot of times you can simply apply the lessons from other databases at scale in pg rather than something completely different.

          I'm also more than okay leveraging a cloud's own MQ option, it's usually easy enough to swap out as/if needed.

        • By spothedog1 2024-02-0618:382 reply

          Can you expand on Postgres vs Graph Databases?

          • By cnity 2024-02-079:36

            It is easy to represent a graph in Postgres using edge and node tables. For the use case we have, it is more performant to query such a setup for many millions of relationships vs using the big names in graph databases.

            You just need a little bit of appropriate index selection and ability to read the output of EXPLAIN ANALYZE to do so.

            There are probably use cases where this doesn't hold, but I found in general that it is beneficial to stick to Postgres for this, especially if you want some ability to query using relations.
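
            A sketch of that edge/node-table setup, with a recursive CTE for traversal (all names here are illustrative, not from the poster's actual schema):

            ```sql
            -- Nodes and edges as plain tables.
            CREATE TABLE nodes (id bigint PRIMARY KEY, label text);
            CREATE TABLE edges (
                src bigint REFERENCES nodes(id),
                dst bigint REFERENCES nodes(id),
                PRIMARY KEY (src, dst)
            );
            -- The extra index that makes reverse traversal cheap.
            CREATE INDEX ON edges (dst, src);

            -- All nodes reachable from node 1, up to 5 hops out.
            WITH RECURSIVE reachable(id, depth) AS (
                SELECT dst, 1 FROM edges WHERE src = 1
              UNION
                SELECT e.dst, r.depth + 1
                FROM edges e
                JOIN reachable r ON e.src = r.id
                WHERE r.depth < 5
            )
            SELECT DISTINCT id FROM reachable;
            ```

            Running EXPLAIN ANALYZE on the traversal query shows whether the planner is actually using the `(src, dst)` and `(dst, src)` indexes.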

          • By mharig 2024-02-0711:50

            IIRC, the biggest (AFAIK) graph DB in the world, TAO from Facebook, is based on an underlying MySQL. There must be a good reason why FB prefers a SQL DB over a dedicated graph DB.

      • By sfn42 2024-02-0711:55

        I worked in a team that did this. It was mostly staffed by juniors, and the team leader wasn't very interested in the technical aspects, they just went to a page, checked that the new feature worked alright and gave the green light.

        So over the years these juniors have repeatedly chosen different tech for their applications. Now the team maintains like 15-20 different apps and among them there's react, Vue, angular, svelte, jQuery, nextjs and more for frontends alone. Most use Episerver/Optimizely for backend but of course some genius wanted to try Sanity so now that's in the mix as well.

        And it all reads like juniors built it. One app has an integration with a public api, they built a fairly large integration app with an integration db. This app is like 20k lines of code, much of which is dead code, and it gets data from the public api twice a day whereas the actual app using the data updates once a day and saves the result in its own Episerver db. So the entire thing results in more api traffic rather than less, the app itself could have just queried the api directly.

        But they don't want me to do that, they just want me to fix the redundant integration thing when it breaks instead. Glad I'm not on that team any more.

      • By kabes 2024-02-0618:112 reply

        Ok. I get that. But to play devil's advocate: with that mentality we'd never learn a new technology and still be stuck on punch cards. And I don't have the time anymore for hobby projects. I'd say it's ok to introduce something new as long as it's one thing at a time and not an entire new stack in the "a rewrite will solve all problems" projects

        • By beagle3 2024-02-0618:552 reply

          To me this argument sounds like “I don’t have time for hobby projects, so I’m going to treat this professional one as a hobby”.

          I always start a professional project with technologies I am intimately familiar with - have used myself, or have theoretical knowledge of and access to someone with real experience.

          There has never been a new shiny library/technology that would have saved more than 10% of the project time, in retrospect. But there have been many that would have cost 100% more.

          • By jamwil 2024-02-0619:091 reply

            I take your point but you don’t explain how you came to be intimately familiar with those technologies in the first place. Applied consistently, this logic would seem to preclude becoming familiar with anything.

            • By beagle3 2024-02-0622:01

              For projects where I have a paying customer, this rule is absolute; I do not experiment on my client's time (and dime) unless they specifically request it.

              But I do have projects which I finance myself (with myself as customer), and which do not have a real deadline. I can experiment on those. Call them "hobby" projects if you insist.

              > Applied consistently, this logic would seem to preclude becoming familiar with anything.

              Well, project requirements always rank higher, and many projects require some piece I am unfamiliar with (a new DB - e.g. MSSQL; a new programming language; etc). That means one does get familiar on an as-needed basis, even applying this approach robotically.

              If a project requires building the whole thing around a new shiny technology with few users and no successful examples I can intimately learn from ... I usually decline taking it.

          • By nyrikki 2024-02-0620:021 reply

            This isn't a dichotomy.

            That is the point of the DDD, SoA, Clean, and Hexagonal patterns.

            Make a point to put structures and processes in place that encourage persistence ignorance in your business logic as the default and only violate that ideal where you have to.

            That way if you outgrow SQL as a message bus you can change.

            This mindset also works for adding functionality to legacy systems or breaking apart monoliths.

            Choosing a default product to optimize for delivery is fine, claiming that one product fits all needs is not.

            Postgres does have limits when being used as a message or event bus, but it can be low-risk if you prepare the system to change if/when you hit those limits.

            Letting ACID concepts leak into the code is what tends to back organisations into a corner that is hard to get out of.

            Obviously that isn't the Kool-Aid this site is selling, and its advice is particularly destructive unless you are intentionally building a monolith.

            "Simplify: move code into database functions"

            At least for any system that needs to grow.

            • By beagle3 2024-02-0622:03

              I was not saying "psql is all you'll ever need". I was just replying to

              >>> "Applied consistently, this logic would seem to preclude becoming familiar with anything."

              As a general principle.

        • By deathanatos 2024-02-0619:15

          I'm okay with new technology, actually, but the person introducing it has to be able to champion it & do the work of debugging issues and answering questions about its interactions with the rest of the system. I.e., they have to be responsible for it.

          The last part in my parent comment is more of a "it was chucked over the fence, and it is now crashing, and nobody, not even the devs that chose it, know why".

          I do have examples of what you describe, too: a dev I worked with introduced a geospatial DB to solve issues with geospatial queries being hard & slow in our then-database (RDS did not, at the time, support such queries) — so we went with the new thing. It used Redis's protocol, and was thus easy to get working with¹. But the dev that introduced it to the system was capable of explaining it, dealing with issues with it — to the extent of "upstream bugs that we encounter and produce workarounds", and otherwise being a lead for it. That new tech, managed in that way by a senior eng., was successful in what it sought to do.

          The problematic parts/components/new introductions of new tech … never seem to have that. That's probably partly the problem: it's such an inherently non-technical issue at its heart. The exact thing almost doesn't matter.

          > as long as it's one thing at a time

          IME it's not. When there are problems, it's never just one new thing at a time.

          > a rewrite will solve all problems

          And the particular system I had in my mind while writing the parent post was, in fact, in the category of "a rewrite will solve all problems".

          Some parts of the rewrite are doing alright, but especially compared to the prior system, there are just so. many. new. components. 2 new queue systems, new databases, etc. etc. So it's then hard to learn one, particularly without someone championing its success. It's another to self-learn and self-bootstrap on 6 or 8 new services.

          ¹(Tile38)

    • By diggan 2024-02-0614:121 reply

      I think a lot of the industry struggles with the idea that maybe there is no "one size fits all", and what makes sense when you're a one-person company with 100 customers probably doesn't make sense when you're a 1000-person company with millions of customers.

      If you use a stack meant for a huge userbase (with all the tradeoffs that comes with it) but you're still trying to find market fit, you're in for a disappointment

      Similarly, if you use a stack meant for smaller projects while having thousands of users relying on you, you're also in for a disappointment.

      It's OK to make a choice in the beginning based on the current context and environment, and then change when it no longer makes sense. Doesn't even have to be "technical debt", just "the right choice at that moment".

      • By davidw 2024-02-0614:491 reply

        > It's OK to make a choice in the beginning based on the current context and environment, and then change when it no longer makes sense.

        Yep. And Postgres is a really good choice to start with. Plenty of people won't outgrow it. Those who do find it's not meeting some need will, by the time they need to replace it, have a really good understanding of what that replacement looks like in detail, rather than just some hand-wavy "web scale".

        • By tracker1 2024-02-0617:40

          True enough, and with modern hardware that barrier is relatively high. IIRC Stack Overflow was handling several million users on a single database server over a decade ago... We've got over 8x the compute power and memory now.

          Still need to understand the data model and effects on queues though.

    • By wvh 2024-02-0615:18

      That's a nicely balanced view. I've been working on the intersection between dev, sec and ops for many, many years, and one of the most important lessons has been that every dependency is a liability. That liability shows up as complexity, availability risk, security exposure, wasted resources, or projects and key people disappearing. Do anything to avoid adding more service, library or technology dependencies; if necessary, let people have their side projects and technological playgrounds to distil future stacks out of.

      There are good reasons to go OLAP or graph for certain kinds of problems, but think carefully before adding more services and technologies because stuff has a tendency to go in easily but nothing ever leaves a project and you will inevitably end up with a bloated juggernaut that nobody can tame. And it's usually those people pushing the hardest for new technologies that are jumping into new projects when shit starts hitting the fan.

      If a company survives long enough (or cough government), a substantial and ever increasing amount of time, money and sec/ops effort will go into those dependencies and complexity cruft.

    • By brightball 2024-02-0614:342 reply

      PG works really well as a message queue and there's several excellent implementations on top of it.

      Most systems are still going to need Redis involved just as a coordinator for other pub/sub related work unless you're using a stack that can handle it some other way (looking at BEAM here).

      But there are always going to be scenarios as an application grows where you'll find a need to scale specific pieces. Otherwise though, PostgreSQL by itself can get you very, very far.

      • By stickfigure 2024-02-0614:591 reply

        > PG works really well as a message queue

        It's also worth noting that by using PG as a message queue, you can do something that's nearly impossible with other queues - transactionally enqueue tasks with your database operations. This can dramatically simplify failure logic.

        On the other hand, it also means replacing your message queue with something more scalable is no longer a simple drop-in solution. But that's work you might never have to do.
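
        Sketched out, the transactional-enqueue property looks like this, assuming a `jobs` queue table living alongside the business tables (both names illustrative):

        ```sql
        -- The business write and the task enqueue share one transaction,
        -- so there is no "order saved but receipt job lost" failure mode:
        -- either both commit or neither does.
        BEGIN;
        INSERT INTO orders (customer_id, total) VALUES (42, 99.50);
        INSERT INTO jobs (payload)
        VALUES (jsonb_build_object('type', 'send_receipt', 'order', 42));
        COMMIT;
        ```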

      • By dfee 2024-02-0614:502 reply

        Worth noting that Postgres has a pubsub implementation built in: listen/notify.

        https://www.postgresql.org/docs/current/sql-notify.html
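
        The mechanics are simple; per the linked docs, the payload must be shorter than 8000 bytes in the default configuration, and a NOTIFY issued inside a transaction is only delivered at commit:

        ```sql
        -- Session A subscribes to a channel:
        LISTEN order_events;

        -- Session B publishes to it:
        NOTIFY order_events, '{"order_id": 42, "status": "paid"}';

        -- Equivalent function form, usable from triggers and functions:
        SELECT pg_notify('order_events', '{"order_id": 42}');
        ```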

        • By cpursley 2024-02-0616:472 reply

          Yep, I should add that. One of the libraries in my list (that I maintain) is WalEx: https://github.com/cpursley/walex/issues

          It subscribes to the Postgres WAL and lets you do the same sort of thing you can do with listen/notify, but without drawbacks like the need for triggers or character limits.

          • By maxbond 2024-02-0619:121 reply

            What's the drawback to a trigger? I would think that any overhead you recouped by avoiding a trigger would be offset by the overhead of sending the entire WAL to your listener, rather than the minimized subset of events that listener is interested in.

            (To be clear I do see other downsides to listen/notify and I think WalEx makes a lot of sense, I just don't understand this particular example.)

          • By brightball 2024-02-0617:471 reply

            Going to add this to my research list.

            • By cpursley 2024-02-0618:061 reply

              Ping me if you have any questions. Long time fan of your blog.

              • By brightball 2024-02-0618:29

                That is really cool to hear, thank you.

                And I should have mentioned it before, but we have an open call for speakers for the Carolina Code Conference. This would make for an interesting talk I think.

        • By brightball 2024-02-0616:25

          Oh yea, definitely aware of it. I believe many of the queuing solutions utilize it as well.

          I've read a lot of reports (on here) that it comes with several unexpected footguns if you really lean on it, though.

    • By agumonkey 2024-02-0617:461 reply

      The more I do fullstack work the more I see an obesity crisis. I understand the need to modularize (I truly think I do), but god: you have the relational model, reimplemented in your framework, re-encoded in middleware to handle URL parsing, then one more layer to help integrate things client-side. I find that insane. And PostgREST was a refreshing idea.

      • By cpursley 2024-02-0621:391 reply

        Seriously. There's like 7000 duplicates of the very same data layer in a single stack: database, back-end ORM/data mapper, front end, and various caching things in between. Things like PostgREST and Hasura are great paired with fluent clients.

        • By agumonkey 2024-02-072:41

          And then there's the failed microservice case.. what some people describe a distributed monolith where data has to be passed around through every layer, with domain logic replicated here and there.

    • By nextaccountic 2024-02-0614:152 reply

      > But inevitably, as an application grows in complexity,

      Some applications never grow that much

      • By LaGrange 2024-02-0614:261 reply

        But it surely will! It will!

        See, now that we're profitable, we're gonna become a _scale up_, go international, hire 20 developers, turn everything into microservices, rewrite the UI our customers love, _not_ hire more customer service, get more investors, get pressured by investors, hire extra c-levels, lay-off 25 developers and the remaining customer service, write a wonderful journey post.

        The future is so bright!

        • By zwnow 2024-02-0614:551 reply

          Honestly, that's one of the reasons I never want to monetize my work and stay miles away from the software industry. Modern world is all web apps that require you to subscribe to 20 different 3rd party services to even build your app. So you rack up bills before your product is even remotely lucrative...

          Building an app with no third party dependencies seems impossible nowadays. At least if you plan to compete.

          • By LaGrange 2024-02-0615:191 reply

            I mean, you _can_ host your staging environment on a Minisforum PC hidden in your closet and then deploy to Hetzner, and probably save a _ton_ unless your service benefits from things like, say, autoscaling or global low-latency access.

            Niches where you can get away with that are limited, not just by technical challenges but because large parts of the social ecosystem of IT won't like that. But they do exist. There's also still things that aren't webapps _at all_, there's software that has to run without internet access. It's all far apart and often requires specialized knowledge, but it exists.

            • By zwnow 2024-02-075:46

              Yea I mostly learned web dev so far but wanted to get into IoT stuff so I might find something cool to do in there.

      • By ejb999 2024-02-0614:18

        I would go further and even say 'most' applications never grow that much.

    • By legohead 2024-02-0614:532 reply

      My saying has always been: be nice to the DB

      Don't use it anymore than you have to for your application. Other than network IO it's the slowest part of your stack.

      • By cpursley 2024-02-0616:491 reply

        Handling business logic in the database is often going to be an order of magnitude faster than the application layer of some of the popular language stacks (looking at you, Rails, Node, etc). It will also outlive whatever web stack of the day, and the acquisition that often requires a rewrite of the application layer while keeping the general database structure (been there, done that).

        • By jgalentine007 2024-02-0617:091 reply

          Maybe faster... but I've met very few developers that are good DBAs (that understand procedures, cursors, permissions etc.) Database schema versioning / consistency is a whole other level of pain too.

          • By dventimi 2024-02-076:06

            That sounds like a social problem, not a technical problem.

      • By niels_bom 2024-02-0615:483 reply

        Would you say it's slower than file IO too?

        • By Too 2024-02-0617:081 reply

          It’s not slow by itself. It’s a single point of bottleneck that will inevitably become slow as you cram everything into it.

          • By dagss 2024-02-0619:261 reply

            ...but by trying to avoid the bottleneck and moving things to the backend, you make things 10x worse resource-wise for the DB. So it is not an easy tradeoff.

            Take any computation you can do in SQL like "select sum(..) ...". Should you do that in the database, or move each item over the network and sum them in the backend?

            Summing in the database uses a lot less resources FOR THE DB than the additional load the DB would get from "offloading" this to backend.

            More complex operations would typically also use 10x-100x less resources if you operate on sets and amortize the B-tree lookups over 1000 items.

            The answer is "it depends" and "understand what you are doing"; nothing about it is "inevitable".

            Trying to avoid computing in the DB is a nice way of thinking you maxed out the DB ...on 10% of what it should be capable of.
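
            Concretely, the two options for the sum example look like this (illustrative schema):

            ```sql
            -- Option 1: ship every matching row to the backend, sum it there.
            -- N rows cross the network; the DB still does the index scan.
            SELECT amount FROM orders WHERE customer_id = 42;

            -- Option 2: let the database aggregate. One row comes back,
            -- and the per-row cost is amortized over a single scan.
            SELECT sum(amount) FROM orders WHERE customer_id = 42;
            ```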

            • By Too 2024-02-0619:44

              Yes. Aggregations and search are often best done as close to the data as possible, in the DB.

              Rendering html, caching, parsing api responses, sending emails, background jobs: Nope.

              Basically, use the database for what it’s good at, no more.

        • By legohead 2024-02-0618:31

          Well, it is file IO, plus processing on top. But it's not that simple, since if your data is small enough it can all be loaded into memory, allowing you to sidestep any file IO. But you still have the processing part...

        • By alternatex 2024-02-0616:141 reply

          Kind of irrelevant since a DB provides some guarantees that a simple file does not by default.

          • By samtheprogram 2024-02-0617:18

            GP was responding to a comment comparing it to network IO in terms of bottlenecks in your application stack ...?

    • By brlewis 2024-02-0615:02

      > you start to realize _why_ there's a stack, rather than just a single technology to rule them all

      Architecturally, there are other cases besides message queues where there's no reason for introducing another layer in the stack, once you have a database, other than just because SQL isn't anybody's favorite programming language. And that's the real reason there's a stack.

    • By wg0 2024-02-0614:133 reply

      I think SQS is cheap enough to build on as a messaging queue even if you're not hosting within AWS.

      Among the widely underrated AWS services are SNS and SES, and they are not a bad choice even if you're not using AWS for compute and storage.

      • By williamdclt 2024-02-0614:331 reply

        The problem is rarely cost, it's operational overhead.

        Using SQS for a queue rather than my already-existing Postgres means that I have to:

        - Write a whole bunch of IaC, figuring out the correct access policies

        - Set up monitoring: figure out how to monitor, write some more IaC

        - Worry about access control: I just increased the attack surface of my application

        - Wire it up in my application so that I can connect to SQS

        - Understand how SQS works, how to use its API

        It's often worth it, but adding an additional moving piece into your infra is always a lot of added cognitive load.

        • By fuy 2024-02-0616:34

          +. And then you have to figure everything one more time when you decide to move to (or to add support for) Azure/GCP.

      • By chuckhend 2024-02-0614:191 reply

        You get exactly-once when you consume with pgmq and run your queue operations inside transactions in your Postgres database. I can't think of an easy way to get an equivalent on SQS without building something like an outbox.

        • By rcaught 2024-02-0614:261 reply

          SQS FIFO has exactly-once processing

          • By vp8989 2024-02-0711:32

            That's not what the parent post was referring to. If SQS (or your ability to talk to it) is down and your database isn't, what do you do?

      • By qaq 2024-02-0614:152 reply

        SQS is at-least-once; PG can give you exactly-once.

        • By rcaught 2024-02-0614:271 reply

          SQS FIFO has exactly-once processing

          • By qaq 2024-02-0614:441 reply

            Well, that's a stretch: it has a "5 minute window". You can hold a lock on a row in a PG queue for as long as you need.

            • By chuckhend 2024-02-0615:10

              pgmq (which is linked in this gist) provides an API for this functionality. It can be 0 seconds, or 10 years if you want. It's not a row lock, which can be expensive; in pgmq it's built into the design of the visibility timeout. FOR UPDATE SKIP LOCKED is there to ensure that only a single consumer gets any message, and then the visibility timeout lets the consumer determine how long the message should remain unavailable to other consumers.
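
              If I read pgmq's API right, the flow looks roughly like this; the `30` is the visibility timeout in seconds, and the exact signatures should be checked against the pgmq docs:

              ```sql
              SELECT pgmq.create('tasks');                      -- create the queue
              SELECT pgmq.send('tasks', '{"task": "resize"}');  -- enqueue
              -- Read one message; it stays invisible to other consumers
              -- for 30 seconds instead of holding a row lock that long.
              SELECT * FROM pgmq.read('tasks', 30, 1);
              -- On success, delete it by its msg_id (1 here, assuming that is
              -- what read returned); otherwise it reappears after the timeout.
              SELECT pgmq.delete('tasks', 1);
              ```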

        • By qaq 2024-02-0614:231 reply

          Not sure why this is making people upset.

          • By williamdclt 2024-02-0614:351 reply

            Because it's incorrect. If you have any non-postgres side-effect, you can't have exactly-once (unless you do 2PC or something like that). There isn't any technology that gives you "exactly once" in the general case.

            • By qaq 2024-02-0614:401 reply

              That's not how exactly-once is defined for a queue. We are talking about the semantics the queue system provides.

              • By silon42 2024-02-0614:481 reply

                Nobody will understand it like that.

                • By qaq 2024-02-0614:58

                  Anyone who has ever selected a queue service/product will understand it like that, because that's one of the most prominent features those products highlight:

                  SQS Standard queues support at-least-once message delivery.

                  NATS offers "at-most-once" delivery

                  etc.

    • By tetha 2024-02-0617:32

      This is very much the direction I'm pushing in our internal development platform: I want to offer as few middlewares as possible, but as many as necessary. And ideally these systems are boring, established tech covering a lot of use cases.

      From there, Postgres ended up being our relational storage for the platform. It is a wonderful combination of supporting teams by being somewhat strict (in a flexible way) as well as supporting a large variety of use cases. And after some grumbling (because some teams had to migrate off of SQL Server, or off of MariaDB, and data migrations were a bit spicy), agreement is growing that it's a good decision to commit on a DB like this.

      We as the DB-Operators are accumulating a lot of experience running this lady and supporting the more demanding teams. And a lot of other teams can benefit from this, because many of the smaller applications either don't cause enough load on the Postgres Clusters to be even noticeable or we and the trailblazer teams have seen many of their problems already and can offer internally proven and understood solutions.

      And like this, we offer relational storage, file storage, object storage and queues, and that seems to be enough for a lot of applications. Only now, after a few years, are we adding OpenSearch as a service for search, vector storage and similar use cases.

    • By TheCapeGreek 2024-02-0614:181 reply

      On top of that, a lot of discourse seems to happen with an assumption that you only make the tech/stack choice once.

      For the majority of apps, just doing basic CRUD with a handful of data types, is it that hard to just move to another DB? Especially if you're in framework land with an ORM that abstracts some of the differences, since your app code will largely stay the same.

      • By philihp 2024-02-0810:02

        The way you've trivialized a database vendor swap makes me curious how often you do this.

    • By twosdai 2024-02-0613:592 reply

      The same argument of UNIX design patterns (single responsibility, well-defined interfaces and communication protocols) vs monolithic design patterns comes up a lot. I think that it's mainly because both are effective at producing products; it's just that they both have downsides.

      • By macksd 2024-02-0614:09

        I read a meme yesterday about how you can just interject "it's all about finding that balance" into any meeting and people will just agree with you. I'm gonna say it here.

        Sometimes a flexible tool fits the bill well. Sometimes a specialized tool does. It's all about finding that balance.

        Thank you for coming to my TED talk.

      • By e12e 2024-02-0614:11

        Just noting that sometimes one can do both: separate Postgres DBs/clusters for different use-cases, separate instances of a web server for TLS termination, caching, routing/rewriting, static asset serving. The benefit is orderly architecture and fewer different dependencies.

  • By philippemnoel 2024-02-0615:013 reply

    I'm one of the makers of ParadeDB, a modern alternative to Elasticsearch. We build Postgres extensions to do fast search (pg_bm25) and analytics (pg_analytics). I love Postgres. If you have a small workload, like a startup, it certainly makes sense to stay within Postgres as long as you can.

    The problem is, at scale, Postgres isn't the answer to everything. Each of the workloads one can put in Postgres starts to grow very specific requirements; you need to isolate systems to get independent scaling and resilience, etc. At this point, you need a stack of specialized solutions for each requirement, and that's where Postgres starts to no longer be enough.

    There is a movement to build a Postgres version of most components on the stack (we are a part of it), and that might be a world where you can use Postgres at scale for everything. But really, each solution becomes quite a bit more than Postgres, and I doubt there will be a Postgres-based solution for every component of the stack.

    • By chasd00 2024-02-0615:253 reply

      what is "at scale"? Is there a specific metric or range of metrics that raises a flag to begin considering something else? For example, in the olden days when it was my problem, page load times were the metric. Once it got high enough you looked for the bottleneck, solved it, and waited. When the threshold was broken again you re-ran the same process.

      Is there an equivalent for postgres?

      • By ndriscoll 2024-02-0616:141 reply

        This bugs me every time performance comes up. No one is ever concrete, so they can never be wrong.

        If Michael Jackson rose from the dead to host the Olympics opening ceremony and there were 2B tweets/second about it, then postgres on a single server isn't going to scale.

        A crud app with 5-digit requests/second? It can do that. I'm sure it can do a lot more, but I've only ever played with performance tuning on weak hardware.

        Visa is apparently capable of a 5-digit transaction throughput ("more than 65,000")[0] for a sense of what kind of system reaches even that scale. Their average throughput is more like 9k transactions/second[1].

        [0] https://usa.visa.com/solutions/crypto/deep-dive-on-solana.ht...

        [1] PDF. 276.3B/year ~ 8.8k/s: https://usa.visa.com/dam/VCOM/global/about-visa/documents/ab...

        • By asah 2024-02-0619:391 reply

          minor nit: 9K TPS for Visa are business transactions - TBD how many database transactions are generated...

          (still, modern postgresql can easily scale to 10,000s (plural) of TPS on a single big server, especially if you setup read replicas for reporting)

          • By ndriscoll 2024-02-0621:211 reply

            Yeah, I don't mean to say Visa can run global payment processing on a single postgres install; I'm sure they do a ton of stuff with each transaction (e.g. for fraud detection). But for system design, it gives an order of magnitude for how many human actions a global system might need to deal with, which you can use to estimate how much a wildly successful system might need to handle based on what processing is needed for each human action.

            For similar scale comparisons, reddit gets ~200 comments/second peak. Wikimedia gets ~20 edits/second and 1-200k pageviews/second (their grafana is public, but I won't link it since it's probably rude to drive traffic to it).

            • By asah 2024-02-0713:44

              yyy we're in violent agreement!

              interesting re reddit, that's really tiny! but again, I'm even more curious about how many underlying TPS this turns into, net of rules firing, notifications and of course bots that read and analyze this comment, etc. Still, this isn't a scaling issue because all of this stuff can be done async on read replicas, which means approx unlimited scale in a single-database-under-management (e.g. here's this particular comment ID, wait for it)

              Slack experiences 300K write QPS: https://slack.engineering/scaling-datastores-at-slack-with-v...

      • By gen220 2024-02-0619:01

        The truth is that it really depends on your application work load. Is it read-heavy, or write-heavy? Are the reads more lookup-heavy (i.e. give me this one user's content), or OLAP heavy (i.e. `group by`'s aggregating millions of rows)? Is read-after-write an important problem in your application? Do you need to support real-time/"online" updates? Does your OLAP data need to be mutable, or can it be immutable (and therefore compressed, columnarized, etc.)? Is your schema static or dynamic, to what degree?

        I agree with others that a good simplification is "how far can you get with the biggest single AWS instance"? And the answer is really far, for many common values of the above variables.

        That being said, if your work load is more OLAP than OLTP, and especially if your workload needs to be real-time, Postgres will begin to give you suboptimal performance without maxing-out i/o and memory usage. Hence, "it really depends on your workload", and hence why you see it's common to "pair" Postgres with technologies like Clickhouse (OLAP, immutable, real-time), RabbitMQ/Kafka/Redis (real-time, write-heavy, persistence secondary to throughput).

      • By jimbokun 2024-02-0617:17

        For me with any kind of data persistence backend, it's when you go from scaling vertically to horizontally. In other words, when it's no longer feasible to scale by just buying a bigger box.

        I don't know that there is a canonical solution for scaling Postgres data for a single database across an arbitrary number of servers.

        I know there is CockroachDB which scales almost limitlessly, and supports Postgres client protocol, so you can call it from any language that has a Postgres client library.

    • By jimbokun 2024-02-0617:144 reply

      For scaling, has anyone here used hash based partitioning to scale horizontally?

      In principle, seems like it should work to allow large scale distribution across many servers. But the actual management of replicas and deciding which servers to place partitions, redistributing when new servers are added, etc. could lead to a massive amount of operational overhead.
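
      To make the resharding cost concrete, here's a small Python sketch (function and key names are invented for illustration) of naive hash-modulo placement. Changing the shard count remaps most keys, which is exactly why systems reach for consistent hashing or key-range placement instead:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Use a stable hash: Python's built-in hash() is salted per process
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

keys = [f"user-{i}" for i in range(1000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}

# With plain modulo placement, adding one shard moves ~80% of keys
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys move when growing from 4 to 5 shards")
```

      Consistent hashing (or range splitting) bounds that movement to roughly 1/N of the keys, which is the operational win the sibling comments allude to.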

      • By gen220 2024-02-0618:50

        As other sibling comments noted, Citus does this pretty well. Recommend reading through their docs and use-cases. There's some migration/setup costs, but once you have a good configuration, it mostly just-works.

        The main downside is that you have to either self-manage the deployment in AWS EC2 or use Azure's equivalent of AWS RDS (CitusData was acquired by MS years ago).

        FWIW, I've heard that people using Azure's solution are pretty satisfied with it, but if you're 100% on AWS going outside that fold at all might be a con for you.

      • By philippemnoel 2024-02-0617:59

        Citus is indeed an example of distributed Postgres. There are also serverless Postgres offerings (Neon, Nile, AWS Aurora) which do this.

        If you are interested in partitioning in an OLAP scenario, this will soon be coming to pg_analytics, and some other Postgres OLAP providers, like Timescale, already offer it.

      • By findjashua 2024-02-0618:05

        Hash-based partitioning makes repartitioning very expensive. Most distributed DBs now use key-range-based partitioning. IIRC, Dynamo, which introduced this concept, has also made the switch.

      • By hot_gril 2024-02-0617:52

        I think that's what Citus does.

    • By cpursley 2024-02-0616:531 reply

      This looks great, I'll add it to my list.

      I've gone far out of my way not to use Elasticsearch and to push Postgres as far as I can in my SaaS, because I don't want the operational overhead.

  • By emseetech 2024-02-0614:133 reply

    This makes a strong case, but I've decided to start every new project with sqlite and not switch until absolutely necessary. If Postgres is the 90% case, then sqlite is the 80% case and is also dead simple to get going and genuinely performant. So when vertical scaling finally fails me, I know I'll be at a wonderful place with what I'm building.

    • By Ensorceled 2024-02-0614:238 reply

      > [...] sqlite is the 80% case and is also dead simple to get going and genuinely performant.

      I don't understand this. PostgreSQL is ALSO dead simple to get going, either locally or in production. Why not just start off at 90%?

      I mean, I get there are a lot of use cases where sqlite is the better choice (and I've used sqlite multiple times over the years, including in my most recent gig), but why in general?

      • By tvink 2024-02-0614:341 reply

        I think "dead simple" is not doing anyone any favors when it is being used to try to equate the simplicity of things.

        It's obviously a lot simpler to just have a file, than to have a server that needs to be connected to, as long as we're still talking about running things on regular computers.

        • By Ensorceled 2024-02-0618:062 reply

          I guess that's really my point here.

          The difference in setup time is negligible, so I'm not sure why people keep bringing it up as a reason to choose sqlite over PostgreSQL.

          For instance, "deployable inside a customer application" is an actual requirement that would make me loath to pick PostgreSQL.

          "Needs to be accessible, with redundancy, across multiple AWS zones" would make me very reluctant to pick sqlite.

          Neither of these decisions involve how easy it is to set up.

          It's like choosing between a sportbike and dump truck and focusing on how easy the sportbike is to haul around in the back of a pickup truck.

          • By SJC_Hacker 2024-02-0618:381 reply

            I'm not sure it's negligible; I suppose it is once you know what you're doing.

            But Postgres setup, at least via the package managers on Linux, will by default create a user called postgres and lock out anyone who isn't this user from doing anything. Yeah, you can sudo to get psql etc. easily, but that doesn't help your programs, which are running as different users. You have to edit a config file to get it to work, and I never figured out how to get it working with domain sockets instead of TCP.

            • By saltcured 2024-02-0619:42

              That's interesting... my experience (almost all on RHEL/CentOS/Fedora) is that it is trivial to have a unix domain socket with local Postgres clients and a pain to set up any remote clients.

              You just have to call the basic "createuser" CLI (or equivalent CREATE ROLE SQL) out of the postgres superuser account to create database users that match local Linux usernames. Then the ident-based authentication matches the client process username to the database role of the same name.

          • By emseetech 2024-02-0618:342 reply

            > Needs to be accessible, with redundancy, across multiple AWS zones

            How many projects start with these requirements?

            • By Ensorceled 2024-02-0619:101 reply

              Anything with real customer data in the cloud? Certainly you need replication.

              • By emseetech 2024-02-0619:331 reply

                Litestream can handle realtime replication.

                But most projects don’t even have customers when they start, let alone large quantities of their data and legal requirements for guaranteed availability.
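
                For reference, a Litestream setup is a short config file plus a sidecar process. A minimal sketch (the database path and bucket name are placeholders) looks roughly like:

```yaml
# Continuously replicate one SQLite database file to S3.
dbs:
  - path: /var/lib/myapp/app.db           # placeholder path
    replicas:
      - url: s3://my-replica-bucket/app-db  # placeholder bucket
```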

                • By int_19h 2024-02-072:461 reply

                  I think it's reasonable for a business, even a freshly starting one, to expect to grow to the point where it does have enough customers to outgrow SQLite fairly soon. Between that and PG offering more features in general (which often simplifies app code, because you can do more with queries), it's still not clear why not start with PG in the first place.

                  PG, on the other hand, can scale enough to cover foreseeable future needs for most businesses, so aiming for something more complex than that is almost always premature.

                  • By emseetech 2024-02-072:56

                    > outgrow SQLite fairly soon

                    That would be the result of either vastly overestimating their business plan or vastly underestimating SQLite.

            • By rrr_oh_man 2024-02-076:301 reply

              > How many projects start with these requirements?

              In a world fueled by cheap money and expensive dreams, you'd be surprised.

              • By emseetech 2024-02-082:46

                I’m not counting on a cheap money future. Not near future at least.

      • By klibertp 2024-02-0615:011 reply

        > PostgreSQL is ALSO dead simple to get going

        I'm not saying it's hard to set up Postgres locally, but sqlite is a single binary with almost no dependencies and no config, easily buildable from source for every platform you can think of. You can grab a single file from sqlite.org, and you're all set. Setting up Postgres is much more complicated in comparison (while still pretty simple in absolute terms - but starting with a relatively simpler tool doesn't seem like a bad strategy.)

        • By scaryclam 2024-02-0617:141 reply

          Except for when your data is in it. Migrating data on a running app is one of the worst things to deal with. I can understand using something simple and cut down for other things, but the DB is not the place I'd want to do that. Postgres isn't exactly hard to get going with, and will grow with you easily, so why trade that off for saving an hour or two at the start of the project?

          • By klibertp 2024-02-0717:44

            > Except for when your data is in it.

            Replacing the DB before it gets any actual data inserted into it solves this problem. You just switch to Postgres before you go anywhere beyond staging, at the latest - in practice, you need Postgres-exclusive functionality sooner than that in many cases, anyway. Even when that happens, you might still prefer having SQLite around as an in-memory DB in unit tests. The Postgres-specific methods are pretty rare, and you can mock them out, enjoying 100x faster setup and teardown in tests that don't need those methods (with big test suites, this quickly becomes important).

            Unless you really want to use Postgres for everything like the Gist here suggests, the DB is just a normal component, and some degree of flexibility in which kind of component you use for specific purposes is convenient.
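
            A sketch of that testing pattern (the schema and helper functions are hypothetical): every test gets a throwaway in-memory database that is created and destroyed in microseconds.

```python
import sqlite3

def make_test_db():
    # Fresh in-memory database per test; schema is a made-up example
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn

def add_user(conn, name):
    # Hypothetical repository function under test
    return conn.execute("INSERT INTO users (name) VALUES (?)", (name,)).lastrowid

conn = make_test_db()
uid = add_user(conn, "alice")
name = conn.execute("SELECT name FROM users WHERE id = ?", (uid,)).fetchone()[0]
```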

      • By benlivengood 2024-02-0614:51

        Practically, because sqlite is good enough for one machine and compatible enough with PostgreSQL that you can use either pretty easily. One thing I wrote was an exactly-once stream processor that fetched events from a lot of remote systems for processing. A transaction-based queue in the DB achieves exactly-once with recovery (remote systems accepted time-stamp resyncing of the stream of events). It works fine at small scale on a single machine for design and testing (local integration tests with short startup time are very valuable) but trivially scales to hundreds of workers if pointed at a postgres instance. The work to allow sqlite vs postgres was a single factory that returned a DB connection in Go based on runtime configuration.

        It's also good practice for designing reasonably cross-database compatible schemas.
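
        The comment's factory was in Go; a Python sketch of the same idea (the config keys are invented, and the Postgres branch assumes psycopg2 is installed) might look like:

```python
import sqlite3

def connect(config: dict):
    """Return a DB-API connection chosen by runtime configuration."""
    if config["driver"] == "sqlite":
        return sqlite3.connect(config["path"])
    elif config["driver"] == "postgres":
        # Imported lazily so dev/test environments don't need the driver
        import psycopg2
        return psycopg2.connect(config["dsn"])
    raise ValueError(f"unknown driver: {config['driver']}")

# Local development and tests run against SQLite...
conn = connect({"driver": "sqlite", "path": ":memory:"})
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
# ...while production config would point at Postgres instead.
```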

      • By runningamok 2024-02-0615:591 reply

        One use case where SQLite is a good option is for embedding as a local database in an app. Starting local-only with SQLite allows you to defer a lot of the backend effort while testing an MVP.

      • By davidmurdoch 2024-02-0614:36

        But PostgreSQL is not dead simple when compared to SQLite.

      • By mrbonner 2024-02-0618:34

        If "dead simple" involves me babysitting a service process, then no, it is not. SQLite has an embedded version that requires no out-of-process service. That's my definition of dead simple.

      • By randomdata 2024-02-0615:142 reply

        Postgres complicates the application in several ways. In particular, Postgres suffers from the n+1 problem, while SQLite does not. That requires a significant amount of added complexity in the application to hack around. Why over engineer the application before it has proven itself as something anyone even wants to use? Let's face it, the large majority of software written gets thrown away soon after it is created.

        I already hear you saying that you know of a library that provides a perfect abstraction to hide all those details and complexities, making the choice between Postgres and SQLite just a flip of a switch away. Great! But then what does Postgres bring to the table for you to choose it over SQLite? If you truly prove a need for it in the future for whatever reason, all you need to do is update the configuration.

        • By sgarland 2024-02-0616:231 reply

          This is a misunderstanding of the n+1 problem, which is exacerbated by SQLite's deceptive phrasing of the issue:

          > In a client/server database, each SQL statement requires a message round-trip from the application to the database server and back to the application. Doing over 200 round-trip messages, sequentially, can be a serious performance drag.

          While the above is true on its own, this is _not_ the typical definition of n+1. The n+1 problem is caused by poor schema design, badly-written queries, ORM, or a combination of these. If you have two tables with N rows, and your queries consist of "SELECT id FROM foo; SELECT * FROM bar WHERE id = foo.id_1...", that is not the fault of the DB, that is the fault of you (or perhaps your ORM) for not writing a JOIN.
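
          A runnable illustration of that shape (SQLite used here so it runs anywhere; the schema is invented): the per-row loop issues n+1 queries, while a single JOIN returns the identical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY,
                    author_id INTEGER REFERENCES authors(id), title TEXT);
INSERT INTO authors VALUES (1, 'Le Guin'), (2, 'Borges');
INSERT INTO books VALUES (1, 1, 'The Dispossessed'),
                         (2, 1, 'The Lathe of Heaven'),
                         (3, 2, 'Ficciones');
""")

# The n+1 shape: one query for the parent rows, then one query per row
authors = conn.execute("SELECT id, name FROM authors").fetchall()
n_plus_1 = []
for author_id, name in authors:  # n extra round trips
    for (title,) in conn.execute(
            "SELECT title FROM books WHERE author_id = ?", (author_id,)):
        n_plus_1.append((name, title))

# The same result set in one query
joined = conn.execute("""
    SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id
""").fetchall()

assert sorted(n_plus_1) == sorted(joined)
```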

          • By randomdata 2024-02-0616:405 reply

            > that is the fault of you (or perhaps your ORM) for not writing a JOIN.

            It's your fault for not writing a join if you need a join. But that's not where the n+1 problem comes into play.

            Often in the real world you need tree-like structures, which are fundamentally not able to be represented by a table/relation. No amount of joining can produce anything other than a table/relation. The n+1 problem is introduced when you try to build those types of structures from tables/relations.

            A join is part of one possible hack to workaround to the problem, but not the mathematically ideal solution. Given an idealized database, many queries is the proper solution to the problem. Of course, an idealized database doesn't exist, so we have to deal with the constraints of reality. This, in the case of Postgres, means moving database logic into the application. But that complicates the application significantly, having to take on the role that the database should be playing.

            But as far as SQLite goes, for all practical purposes you can think of it as an ideal database as it pertains to this particular issue. This means you don't have to move that database logic into your application, simplifying things greatly.

            Of course, SQLite certainly isn't ideal in every way. Tradeoffs, as always. But as far as picking the tradeoffs you are willing to accept for the typical "MVP", SQLite chooses some pretty good defaults.

            • By sgarland 2024-02-0622:201 reply

              > Often in the real world you need tree-like structures, which are fundamentally not able to be represented by a table/relation. No amount of joining can produce anything other than a table/relation. The n+1 problem is introduced when you try to build those types of structures from tables/relations.

              I don't know how precisely strict you expect a tree to be in an RDBMS, but this [0] is as close as I can get. It has a hierarchy of product --> entity --> category --> item, with leaves along the way. In this example, I added two bands (Dream Theater [with their additional early name of Majesty] and Tool), along with their members (correctly assigning artists to the eras), and selected three albums: Tool's Undertow (with both CD and Vinyl releases), Dream Theater's Train of Thought, and A Dramatic Turn of Events.

              The included query in the gist returns all available information about the albums present in a single query. No n+1.

              The inserts could likely be improved (for example, if you were doing these from an application, you could save IDs and then immediately reuse them; technically you could do that in pl/pgsql, but ugh), but they do work.

              This is also set up to model books in much the same way, but I didn't add any.

              > A join is part of one possible hack to workaround to the problem, but not the mathematically ideal solution.

              Joins are not a "hack," they are an integral part of the relational model.

              [0]: https://gist.github.com/stephanGarland/ec2d0f0bb54161898df66...

              • By randomdata 2024-02-0622:581 reply

                > Joins are not a "hack," they are an integral part of the relational model.

                Yes, joins are an essential part of the relational model, but we're clearly not talking about the relational model. The n+1 problem rears its ugly head when you don't have a relational model – when you have a tree-like model instead.

                > The included query in the gist returns all available information about the albums present in a single query. No n+1.

                No n+1, but then you're stuck with tables/relations, which are decidedly not in a tree-like shape.

                You can move database logic into your application to turn tables into trees, but then you have a whole lot of extra complexity to contend with. Needlessly so in the typical case since you can just use SQLite instead... Unless you have a really strong case otherwise, it's best to leave database work for databases. After all, if you want your application to do the database work, what do you need SQLite or Postgres for?

                Of course, as always, tradeoffs have to be made. Sometimes it is better to put database logic in your application to make gains elsewhere. But for the typical greenfield application that hasn't even proven that users want to use it yet, added complexity in the application layer is probably not a good trade. At least not in the typical case.

                • By sgarland 2024-02-0623:541 reply

                  n+1 can show up any time you have poorly modeled schema or queries. It’s quite possible to have a relational model that is sub-optimal; reference the fact that there are 5 levels of normalization (plus a couple extra) before you get into absurdity.

                  I still would like to know how SQLite does not suffer from the same problems as any other RDBMS. Do you have an example schema?

                  • By randomdata 2024-02-074:571 reply

                    > I still would like to know how SQLite does not suffer from the same problems as any other RDBMS.

                    That's simple: not being an RDBMS, only an engine, is how it avoids the suffering.

                    The n+1 problem is the result of slow execution. Of course, an idealized database has no time constraints, but the real world is not so kind. While SQLite has not figured out how to defy the laws of physics, it is able to reduce the time to run a query to imperceptible levels under typical usage by embedding itself in the application. Each query is just a function call, and function calls are fast.

                    Postgres' engine can be just as fast, but because it hides the engine behind the system layer, you don't interact with the engine directly. That means you need to resort to hacks to try and poke at the engine where the system tries to stand in the way. The hacks work... but at the cost of more complexity in the application.

                    • By dventimi 2024-02-076:481 reply

                      Compare and contrast the query execution stages of PostgreSQL and SQLite. How exactly do they work? Please be as precise as possible. Try to avoid imprecise terms like "simple", "suffering", "system layer", and "hack."

                      • By randomdata 2024-02-076:592 reply

                        For what purpose? I can find no source of value in your request.

                        • By sgarland 2024-02-0711:542 reply

                          To prove that you have an inkling about the subject you have wandered into.

                          Engineers and Scientists do that; they don’t hide behind airy and vague language – that is the realm of conmen.

                          You stated that you cannot implement a tree-like structure with tables, so I set about proving you wrong, and posted a gist that does so.

                          Until you can back up your claims with data, your words are meaningless, and no one here is going to take you seriously.

                          • By randomdata 2024-02-0713:55

                            Prove to who? You?

                            Engineers and scientists are paid professionals. They are financially incentivized to help other people. Maybe you somehow managed to not notice, but I am but a person on an Internet forum. There is no incentive offered for me to do anything for anyone else. To have the gall to even ask for me to work for you without any compensation in kind is astounding.

                            Of what difference does it make if anyone takes me seriously or not? That's the most meaningless attribute imaginable. Oh noes, a random nobody on the internet doesn't believe me! It's the end of the world as we know it... How droll.

                            Perhaps it is that you do not understand what value means? Or how did you manage to write all those words and not come up with any suggestion of value whatsoever?

                        • By dventimi 2024-02-077:031 reply

                          To educate your adoring fans

                          • By randomdata 2024-02-077:16

                            But for what purpose? There is no value in educating adoring fans.

            • By mkleczek 2024-02-0619:17

              > Often in the real world you need tree-like structures, which are fundamentally not able to be represented by a table/relation. No amount of joining can produce anything other than a table/relation. The n+1 problem is introduced when you try to build those types of structures from tables/relations.

              You can easily get hierarchical output format from Postgres with its JSON or XML aggregate functions.

              You can have almost all benefits of an embedded database by embedding your application in the database.

              Just change perspective and stop treating Postgres (or any other advanced RDBMS) as a dumb data store — start using it as a computing platform instead.
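
              A runnable sketch of that pattern: Postgres would use json_agg/json_build_object, and SQLite's json_group_array/json_object are close analogues, so SQLite is used here to keep it self-contained (the schema is invented for illustration).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE albums (id INTEGER PRIMARY KEY, artist_id INTEGER, title TEXT);
INSERT INTO artists VALUES (1, 'Tool'), (2, 'Dream Theater');
INSERT INTO albums VALUES (1, 1, 'Undertow'), (2, 2, 'Train of Thought');
""")

# One row per artist, with albums nested as a JSON array: a tree-shaped
# result from a single query, no client-side assembly loop needed.
rows = conn.execute("""
    SELECT a.name,
           (SELECT json_group_array(json_object('title', b.title))
              FROM albums b WHERE b.artist_id = a.id) AS albums
      FROM artists a ORDER BY a.id
""").fetchall()

tree = {name: json.loads(albums) for name, albums in rows}
print(tree)  # {'Tool': [{'title': 'Undertow'}], ...}
```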

            • By Ensorceled 2024-02-0618:101 reply

              We are still dealing with the fact that SQLite HAS the n+1 "problem"; it's just fast enough that it doesn't suffer from it.

              It's a very important distinction because, as you say, there are problem domains where you can't just "join the problem away".

              • By randomdata 2024-02-0618:141 reply

                Are we? "Suffer" was the word used right from the beginning for good reason. Computers aren't magic. I find no importance in pointing out that fact. Presumably everyone here already knows that. And if it turns out that they don't, who cares? That's their problem. Makes no difference to me.

                • By Ensorceled 2024-02-0619:081 reply

                  I agree that suffer is the right word, but it's unclear. You are getting downvoted because a lot of people are interpreting it to mean you are saying applications using sqlite don't have n+1 queries.

                  • By randomdata 2024-02-0619:261 reply

                    1. At time of writing, there has been one downvote in the first comment, followed by one upvote in the subsequent comment. Not only does that not translate to "a lot of people", it was quite likely just one person. And unless that person was you, it is impossible to know what their intent was. I'm not sure what are you trying to add here.

                    2. Who gives a shit? If the "computers truly are magic" camp don't understand what I wrote, great! It wasn't written for them in the first place. If that compels them to use their time pressing a button instead of learning about how computers work, great! Not my problem. I'm not sure what you are trying to add here.

                    • By Ensorceled 2024-02-0622:531 reply

                      > Who gives a shit? [...] I'm not sure what you are trying to add here.

                      I guess nothing. You must be fun at design reviews.

                      • By randomdata 2024-02-075:24

                        Well, let's hope the "computers are magic" riff-raff never show up at the design reviews. Especially if they expect someone to explain to them the basics of computing without any reasonable offer of compensation in return.

                        If those people show up here and put on a tantrum by pressing random buttons or whatever it was that you were trying to point out... Oh well? I wouldn't have even noticed if you didn't bring it up.

                        What value you found in calling attention to their pointless tantrums is an interesting mystery, though!

            • By int_19h 2024-02-072:50

              If you need to query over a tree data structure, then that's what WITH RECURSIVE is for, and it's present in both DBMS.

              If you additionally need the result of that query to be hierarchical itself, then you can easily have PG generate JSON for you.
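
              A minimal runnable sketch of that (shown with SQLite; the same SQL works unchanged in Postgres; the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
INSERT INTO categories VALUES
  (1, NULL, 'Music'), (2, 1, 'Rock'), (3, 2, 'Progressive'), (4, 1, 'Jazz');
""")

# Walk the whole subtree under 'Music' in one query
descendants = conn.execute("""
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM categories WHERE id = 1
        UNION ALL
        SELECT c.id, c.name, s.depth + 1
          FROM categories c JOIN subtree s ON c.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth, id
""").fetchall()
print(descendants)  # [('Music', 0), ('Rock', 1), ('Jazz', 1), ('Progressive', 2)]
```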

            • By dventimi 2024-02-076:391 reply

              > No amount of joining can produce anything other than a table/relation. The n+1 problem is introduced when you try to build those types of structures from tables/relations.

              A trivial amount of lateral joins plus JSON aggregates will give you a relation with one record, containing a nested JSON value with a perfectly adequate tree structure, with perfectly adequate performance, in databases that support these operations.

              There are solutions to these problems. One only needs the willingness to accept them.

              • By randomdata 2024-02-0714:23

                Of course. I know reading is hard, but nobody has suggested there aren't solutions. Obviously there are, else Postgres would be pretty much unusable. But the question remains: Why resort to hacks that require extra complexity in the application when you can just as easily use the database as it was designed to be used? What are you getting in return?

        • By tangjurine 2024-02-0615:283 reply

          > Postgres suffers from the n+1 problem, while SQLite does not.

          ?

      • By EasyMark 2024-02-0614:261 reply

        It's much bigger and requires running a server; that's why I use sqlite3, and my needs are modest 99% of the time.

        • By zilti 2024-02-0616:08

          It requires running a separate process.

    • By neovim_btw 2024-02-0619:251 reply

      Not with concurrent writes, you're not!

      Even with a WAL or some kind of homegrown spooling, you're going to be limited by the rate at which one thread can ingest that data into the database.

      One could always shard across multiple SQLite databases, but are you going to scale the number of shards with the number of concurrent write requests? If not, SQLite won't work. And if you do plan on this, you're in for a world of headaches instead of using a database that does concurrency on its own.

      Don't get me wrong; SQLite is great for a lot of things. And I know it's nice to not have to deal with the "state" of an actual database application that needs to be running, especially if you're not an "infrastructure" team, but there's good reasons they're ubiquitous and so highly regarded.

      • By emseetech 2024-02-0619:30

        It’s effortless to get 2-4K writes per second with SQLite on cheap commodity hardware.

        That will carry most early stage applications really far.
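
        Those numbers are easy to sanity-check yourself. A rough benchmark sketch (figures depend heavily on hardware, durability settings, and batching, so measure rather than trust any quoted rate):

```python
import os
import sqlite3
import tempfile
import time

# Use a real file so the journal is exercised (:memory: skips it entirely)
path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
conn.execute("PRAGMA synchronous=NORMAL")  # common durability tradeoff in WAL mode
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")

n = 5000
start = time.perf_counter()
for i in range(n):
    # Worst case: one transaction per row; batching multiplies throughput
    with conn:
        conn.execute("INSERT INTO events (body) VALUES (?)", (f"event-{i}",))
elapsed = time.perf_counter() - start
print(f"{n / elapsed:,.0f} single-row transactions/second")
```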

    • By turnsout 2024-02-0614:236 reply

      I’m with you in general, but what about vector search?

      It really feels like the DB industry has taken a huge step backward from the promise of SQL. Switching from Postgres to SQLite is easy because the underlying queries are at least similar. But as soon as you introduce embeddings, every system is totally different (and often changing rapidly).

      • By randomdata 2024-02-0615:291 reply

        Just use SQLite?

        Specialized vector indexes become important when you have a large number of vectors, but the reality of software is that it is unlikely that your application will ever be used at all, let alone reach a scale where you start to hurt. Computers are really fast. You can go a long way with not-perfectly-optimized solutions.

        Once you have proven that users actually want to use your product and see growth on the horizon to where optimization becomes necessary, then you can swap in a dedicated vector solution as needed, which may include using a vector plugin for SQLite. The vector databases you want to use may or may not use SQL, but the APIs are never that much different. Instead of one line of SQL to support a different implementation you might have to update 5 lines of code to use their API, but we're not exactly climbing mountains here.

        Know your problem inside and out before making any technical choices, of course.

        • By turnsout 2024-02-0617:111 reply

          You can of course use a vanilla database, read every row, and roll your own vector distance function, but it's frustrating that there isn't a standardized pattern for this.

          There are plenty of proprietary databases and APIs, but now you're taking on a dependency and assuming a certain amount of risk.

          • By randomdata 2024-02-0617:221 reply

            > it's just frustrating that there isn't a standardized pattern for this.

            Be the change you want to see, I suppose. No doubt convergence will come, but it is still early days. Six months ago, most developers didn't even know what a vector database is, let alone consider it something to add to their stack.

            It took SQL well into the 1990s to fully solidify itself as "the standard" for relational querying. Even PostgreSQL itself was started under the name POSTGRES and was designed to use QUEL, only moving over to SQL much later in life when it was clear that was the way things were going. These things can take time.

            • By dventimi 2024-02-077:011 reply

              > It took SQL well into the 1990s to fully solidify itself as "the standard" for relational querying.

              IBM had SQL in their database product in 1981, Oracle had it by v4 in 1984, ANSI picked SQL as its standard that same year, and completed the first version by 1986.

              Some time scientists say that the 1980s occurred before "well into the 1990s" but I mean, who can really say, right?

      • By emseetech 2024-02-0614:39

        There are vector search solutions for sqlite that basically work, so if my project doesn't rely on that 100% then I'm willing to use those as stop-gaps until it does.

        Of course, if there's a shortcoming of sqlite that I know I need right out of the gate, that would be a situation where I start with postgres.

      • By drittich 2024-02-0614:39

        Vector similarity searches can be done with simple SQL statements, and while the performance will not match a vector db, it's often good enough. For a db like SQLite that can run in-memory, I suspect they would be reasonably fast.
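The plain-SQL approach the comment describes can be sketched with the stdlib `sqlite3` module: store embeddings as float32 blobs and register a scalar cosine-similarity function on the connection, then rank with `ORDER BY`. Table names and vectors below are illustrative; this is a brute-force scan, fine for small tables but no substitute for a real vector index at scale.

```python
# Brute-force vector similarity in SQLite: blobs + a registered scalar function.
import sqlite3
import struct
import math

def cosine_sim(a: bytes, b: bytes) -> float:
    # Unpack float32 blobs and compute cosine similarity.
    x = struct.unpack(f"{len(a) // 4}f", a)
    y = struct.unpack(f"{len(b) // 4}f", b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm

def pack(v):
    # float vector -> float32 blob
    return struct.pack(f"{len(v)}f", *v)

con = sqlite3.connect(":memory:")
con.create_function("cosine_sim", 2, cosine_sim)
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, name TEXT, emb BLOB)")
con.execute("INSERT INTO docs VALUES (1, 'cats', ?)", (pack([1.0, 0.0, 0.0]),))
con.execute("INSERT INTO docs VALUES (2, 'dogs', ?)", (pack([0.9, 0.1, 0.0]),))
con.execute("INSERT INTO docs VALUES (3, 'cars', ?)", (pack([0.0, 0.0, 1.0]),))

query = pack([1.0, 0.05, 0.0])
rows = con.execute(
    "SELECT name FROM docs ORDER BY cosine_sim(emb, ?) DESC LIMIT 2", (query,)
).fetchall()
```

Every query scans all rows, so this is O(n) per search; for an in-memory database and thousands of rows that is often fast enough, matching the comment's "good enough" point.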

      • By codingjaguar 2024-02-0615:11

        SQL was introduced in the 1970s. Considering vector search has only been widely adopted in the last five years, I'm not surprised by the lack of standards for vector APIs. At Google, embedding-as-retrieval became popular around 2019-2020.

        This is the new kid in town, so you'll soon see all major SQL DBs support vectors. However, any serious user, O(10M) vectors or above, would still need a dedicated vector DB for performance reasons.
