YAGRI: You are gonna read it

2025-04-23 21:47 · www.scottantipa.com

April, 2025

YAGNI, or "You aren't gonna need it", is a standard piece of advice that warns against over-engineering and building too many features too early. I think it's great and saves you from wasting time, which can kill a project.

However, there's an exception that I call YAGRI, or "You are gonna read it". It means you shouldn't store only the minimum data required to satisfy the current product spec. You should also store data that you're highly likely to read later, such as timestamps and contextual metadata.

This problem tends to happen when a UI design shows that you only need to display a few specific bits of data to the user, so you store only those exact fields in the database. You satisfy the design and ship it. Then later you realize you're missing valuable information to help debug an issue, run internal analytics, etc.

As an example, this commonly occurs when implementing a feature to let users delete something. The easy way is to just delete the row from the database, and maybe that's all the current UI design calls for. In this situation, regardless of the requested feature set, as engineers we should maintain good data standards and store:

  • who deleted it
  • how they deleted it (with what permission)
  • when
  • why (surrounding context, if possible)

In general, these are some useful fields to store on almost any table:

  • created_at
  • updated_at
  • deleted_at (soft deletes)
  • created_by (and updated_by, deleted_by, etc.)
  • permission used during CRUD
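
As a sketch only (not from the article), here is what those columns can look like in practice, using Python's sqlite3 and a hypothetical `documents` table; the column names and types are assumptions, and the trigger keeps `updated_at` current without relying on application code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at  TEXT NOT NULL DEFAULT (datetime('now')),
    deleted_at  TEXT,    -- NULL means the row is live (soft delete)
    created_by  TEXT,    -- who created the row
    deleted_by  TEXT,    -- who deleted it
    delete_perm TEXT     -- permission used for the delete
);

-- Keep updated_at current on every change, without app code.
-- (SQLite's recursive triggers are off by default, so this won't loop.)
CREATE TRIGGER documents_touch AFTER UPDATE ON documents
BEGIN
    UPDATE documents SET updated_at = datetime('now') WHERE id = NEW.id;
END;
""")

conn.execute("INSERT INTO documents (title, created_by) VALUES (?, ?)",
             ("Q3 report", "alice"))
row = conn.execute(
    "SELECT created_by, deleted_at FROM documents WHERE id = 1").fetchone()
print(row)  # ('alice', None)
```

The point is that none of this costs the application layer anything: defaults and triggers fill the audit fields whether or not the current UI ever displays them.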

This practice will pay off with just a single instance of your boss popping into a meeting and asking, "Wait, do we know why that thing was deleted? The customer is worried..."

Not every one of these fields will end up serving a purpose. But maybe a single field on a single table will save you one day, and that makes up for the cost of implementing a dozen others. Most of the apps we build, at the end of the day, are about storing data to keep track of facts. It's quite possibly your most important job as an engineer to steward and maintain that data.

Of course, you can go too far in the other direction. You shouldn't just log everything. But I've never heard someone complain about a table having too many timestamps.



Comments

  • By crazygringo 2025-04-24 2:55 (9 replies)

    These are not decisions that should be taken solely by whoever is programming the backend.

    They need to be surfaced to the product owner to decide. There may very well be reasons pieces of data should not be stored. And all of this adds complexity, more things to go wrong.

    If the product owner wants to start tracking every change and by who, that can completely change your database requirements.

    So have that conversation properly. Then decide it's either not worth it and don't add any of these "extra" fields you "might" need, or decide it is and fully spec it out and how much additional time and effort it will be to do it as a proper feature. But don't do it as some half-built just-in-case "favor" to a future programmer who may very well have to rip it out.

    On a personal project, do whatever you want. But on something professional, this stuff needs to be specced out and accounted for. This isn't a programming decision, it's a product decision.

    • By hamandcheese 2025-04-24 4:51 (5 replies)

      Some things are trivial and nearly free - created_at, updated_at. I don't think engineers need to bring trivialities like this to a "product owner". Own your craft.

      • By hombre_fatal 2025-04-24 10:43

        I never worked at a place with product owners, but their post made me appreciate my roles where I'm trusted to help design the product myself. Yeesh.

        Being unable to even call the shot of whether a database table should have an updated_at or soft-delete sounds like a Dilbertian hellscape to me.

      • By bayindirh 2025-04-24 10:44 (1 reply)

        When the product you're developing is governed by regulations and standards you need to comply with, owning your craft means doing things by the book, not adding fields on your own because they might be useful later.

      • By danielPort9 2025-04-24 10:38

        I think the tricky part lies in knowing which things can be done without consulting any product owner. I agree that created_at and updated_at don't cause any harm. deleted_at, on the other hand, cannot be decided by engineers alone (mainly for GDPR reasons: if something is expected to be totally deleted, then it must be). As usual, these kinds of things are obvious to engineers with years of experience, not so much to newcomers.

      • By kace91 2025-04-24 6:46 (1 reply)

        A soft delete might not be, for compliance reasons (GDPR and the like). Otherwise I agree.

      • By IanCal 2025-04-24 10:40

        Although those can be more complicated, and it should be clear what they're for and why they exist. Will this result in an object having an updated_at timestamp elsewhere in a larger application? Is it clear which properties that refers to?

    • By Extasia785 2025-04-24 10:56 (2 replies)

      This entirely depends on the company culture. I've worked in teams where every small decision is in the hands of the PO, and I've worked in teams where a software engineer is a respected professional enabled to make their own technical decisions. I found the second option to create higher quality software faster.

      Also not sure what you mean by additional effort? Created_at, updated_at or soft-deletes are part of most proper frameworks. In Spring all you need is an annotation, I've been using those in major projects and implementation cost is around a few seconds with so far zero seconds of maintenance effort in years of development. At least those fields are solved problems.

      • By crazygringo 2025-04-24 16:28 (1 reply)

        But what if it's not a technical decision? What if there are legal implications around data retention that it's not your job to be aware of?

        I've been part of teams where features had to be totally thrown out and rebuilt because developers made big assumptions that turned out to be wrong, because they didn't think it was worth it to check with the product owner. Because they assumed it was only a "technical decision", or they assumed they understood the customer needs despite never actually asking the customer.

        This doesn't mean checking with product around each line of your code, obviously. But deciding what information gets stored in the database, what level of event tracking you do, whether deletes are hard or soft -- these have massive product implications, and potentially legal ones.

        And it is additional effort. Now you have to write tests for all those things. Are the timestamps being stored correctly? Are the permission bits being stored correctly? Is "created_by" coming from the right user? Are we sure a malicious user can't spoof that? Do we care? Is "updated_at" actually being updated on every row change? But are we making sure "updated_at" is not getting changed when we import data from a separate table? How often do we remove soft-deleted data in order to comply with privacy policies and regulations, and with what cron job, and who maintains that? Where do alerts go if the cron job fails? What happens if that employee leaves? I could go on and on and on.

        So that's what I mean by additional effort. It's not "around a few seconds". Because it's not just a technical question, it's a product one. It's a proper feature that needs to be properly defined and properly scoped out and properly tested.

      • By exe34 2025-04-24 11:15

        even better, assume soft deletes and inform them this is how delete works - if there's a hard requirement for hard delete, they will tell you.

        the hypothetical future programmer is you in two weeks.

    • By makeitdouble 2025-04-24 16:41

      > And all of this adds complexity, more things to go wrong.

      This argument is one of the reasons why a backend engineer could just add the created_at and updated_at fields without asking a product owner.

      It doesn't make much sense from the pure product perspective, so the standard answer will be "no, let's add them when we have a real case where they're needed". The product I'm inheriting right now misses these fields on half of the tables. Except when you really want the data, it won't be there, because you can't go back in time.

      Trying to convince someone that it's worth it will also give the impression they're optional, when you've already decided you need them. So at the end of the day, it's your responsibility as an engineer to do what's required, without punting it to non-technical stakeholders to have your back.

      I also wouldn't ask a product manager if they think table schema should be orthogonal.

      Now keeping or not IPs or user IDs in a table is a whole different story and requires a lot more consulting, and not just with the PO.

    • By chii 2025-04-24 6:07

      > don't do it as some half-built just-in-case "favor" to a future programmer who may very well have to rip it out.

      in other words - YAGNI !

    • By Chris2048 2025-04-24 4:57

      > And all of this adds complexity, more things to go wrong

      That's a little vague given this specific example, which appears to be about maintaining some form of informative logging; though I don't think it necessarily needs to be in the form of a DB table.

    • By PaulHoule 2025-04-24 14:11

      I've seen product owners who get blindsided every time by this sort of thing.

      On the other hand, in some shops there is a dedicated DBA who is in charge of database schemas and possibly everything else. Before it became fashionable to create a "service layer" where people access the database (now databases) through web endpoints, some organizations would put all database access into a set of stored procedures managed by the DBA. Maybe that's extreme, but in the real world product owners come and go, while the database is forever and deserves to have somebody speaking out for its interests.

    • By Cthulhu_ 2025-04-24 9:19

      A product owner may not (and probably does not) have these things on their radar; it's up to a developer to inform them of industry best practices like these.

    • By escot 2025-04-24 18:04

      In a good work environment you'll have some telepathy with your product people, and you won't have to communicate every minor change, which wastes time. It's similar to implementing a UI design: you and the designer develop a relationship where you don't need to communicate every pixel; you fill in the blanks, show results, repeat.

    • By damnitbuilds 2025-04-24 8:47

      Yes. Adding fields to a DB is not a risk-free thing a programmer should just do.

  • By KronisLV 2025-04-24 10:42 (8 replies)

    I don't get why all the big RDBMSes (PostgreSQL, MariaDB/MySQL, SQL Server, Oracle, ...) don't seem to have built-in support for soft deletes, front and center:

      CREATE TABLE ... WITH SOFT DELETES
    
    The regular DELETE wouldn't get rid of the data for real; instead, you could query the deleted records as well, and probably have timestamps for everything as a built-in low-level feature, versus having to handle this with a bunch of ORMs and having to remember to put AND deleted_at IS NULL in all of your custom views.

    If we like to talk about in-database processing so much, why don't we put the actual common features in the DB, so that toggling them on or off doesn't take a bunch of code changes in the app, and so you could even add soft deletes to a legacy app that knows nothing of the concept, on a per-table basis or whatever.
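
    Absent engine support, a view can at least centralize that filter. A small sketch in Python with sqlite3 (the `items` table, `live_items` view, and `soft_delete` helper are all made-up names, not a real DB feature):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    deleted_at TEXT              -- NULL = live row
);

-- One place for the filter, instead of AND deleted_at IS NULL everywhere.
CREATE VIEW live_items AS
    SELECT id, name FROM items WHERE deleted_at IS NULL;
""")

def soft_delete(item_id: int) -> None:
    # "Delete" is an UPDATE; the row stays queryable for audits.
    conn.execute("UPDATE items SET deleted_at = datetime('now') WHERE id = ?",
                 (item_id,))

conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",)])
soft_delete(1)
print(conn.execute("SELECT name FROM live_items").fetchall())  # [('b',)]
```

    Queries that go through the view never see deleted rows; auditing queries hit the base table directly.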

    • By moritzwarhier 2025-04-24 12:35 (2 replies)

      Maybe my intuition is wrong, but to me this sounds like a violation of the principle of least power.

      "Soft deletes" is just a name for a regular write operation, with specific semantics.

      Adding a layer of magic to the DB for this doesn't seem right to me.

      And applications could have many different requirements for soft deletes, like the article points out. For example, the simplest version would be just a boolean "deleted" column, but it could also be "deleted_at", "deleted_by", etc.

      All of these cases require a bunch of code changes anyway, and the more complex ones could interfere with an implementation of this feature at the database level: such a transparent implementation couldn't access app-specific concerns such as user data, for example.

      Adding soft deletes to a legacy app that knows nothing about it would only work for a boolean flag and maybe a date-time value, unless the DBMS would also offer triggers for soft deletes, etc.

      It seems to me that this capability would make a DBMS much more complicated.

      • By Spivak 2025-04-24 13:04 (1 reply)

        I don't think it's all magic, because you have to implement soft deletes in your application layer, and it would be nice to have a little DB support for it. It doesn't have to be some big thing: just having SELECTs work only on undeleted rows by default, unless you ask for the rest, would be nice, so it doesn't pollute your code and you don't always have to remember to point the gun away from your foot.

      • By hnthrow90348765 2025-04-24 13:51

        >Adding soft deletes to a legacy app

        As an aside, I've never found this to be worth it since you have to change too much and re-test everything for minimal user benefit and time savings. The effort is way worse if the code is not great in the first place. It can be a great decision to make before everything is written.

        Maybe it's worth it for files which are hard to reproduce, but you can also rely on DB backups to get those back. If people are regularly deleting things they're not supposed to, you're better off removing the user-facing delete actions, limiting the action to specific users, etc.

    • By perrygeo 2025-04-24 19:03

      The whole model of RDBMS is based on mutable tuples; soft deletes don't make much sense as an intrinsic part of that model. If you want soft deletes, you create an application layer or use a different data model.

      Most of the time, if you want "soft deletes", you really want an immutable log so that you can time travel to any point in history. XTDB and Datomic are worth looking at if you want to solve the problem at the data-model level.

    • By crazygringo 2025-04-24 13:38

      Because it's too dependent on business logic.

      Different products will handle soft deletes differently. Which queries need to include soft-deleted rows and which don't? What about different levels of soft deletes, e.g. done by the user (can be undone by user) vs. done by an admin (can't be undone by user)?

      Implementing soft deletes yourself isn't hard. Yes you'll have to make a bunch of decisions about how they work in every circumstance, but that's the point.

    • By hypeatei 2025-04-24 12:47 (3 replies)

      Temporal tables in SQL server fit this use-case[0], I think.

      0: https://learn.microsoft.com/en-us/sql/relational-databases/t...

      • By mastax 2025-04-24 18:59

        Also System-Versioned Tables in MariaDB: https://mariadb.com/kb/en/system-versioned-tables/

      • By ajuc 2025-04-24 16:08 (1 reply)

        Oracle has flashback queries (SELECT ... AS OF timestamp).

        It's one of those things that is available, but most people ignore it and implement it manually with created_at, updated_at, deleted_at columns, etc. I think one reason for this is lack of support in ORMs and lack of standardization between RDBMSes.

      • By rjbwork 2025-04-24 13:52 (1 reply)

        Available on postgres as an extension. It's a bit jank and doesn't have language integrated clauses like sql server.

    • By marcosdumay 2025-04-24 14:23

      It's just not bothersome enough to deviate from the standard.

      If they did this, nobody would use it. They do lots of more useful things that people don't use because it's not portable.

      There's a sibling comment about temporal databases. Those solve a very bothersome problem, so a few people use them. That means that there's a chance soft deletes get adopted as a side effect of a much more complex standard.

    • By johnthescott 2025-04-25 7:41

      old problem. in postgresql many ways already exist to keep deleted data hanging around, not the least of which is explicitly archiving the delete transaction in an archive table. for legacy code that can not be changed: triggers, row level security, tailing the logical replication log.

    • By ajuc 2025-04-24 16:06

      Oracle has this already. SELECT ... AS OF timestamp.

      It needs to be enabled of course and it's not free.

  • By alganet 2025-04-24 2:03 (3 replies)

    *_at and *_by fields in SQL are just denormalization + pruning patterns consolidated, right?

    Do the long walk:

    Make the schema fully auditable (one record per edit) and the tables normalized (it will feel weird). Then suffer with it, and discover that normalization degrades performance.

    Then discover that pruned auditing records are a good middle ground. Just the last edit and by whom is often enough (ominous foreshadowing).

    Fail miserably by discovering that a single missing auditing record can cost a lot.

    Blame database engines for making you choose. Adopt an experimental database with full auditing history. Maybe do incremental backups. Maybe both, since you have grown paranoid by now.

    Discover that it is not enough again. Find that no silver bullet exists for auditing.

    Now you can make a conscious choice about it. Then you won't need acronyms to remember stuff!

    • By lud_lite 2025-04-24 3:06 (1 reply)

      Another option is audit info could go to another table or datastore entirely.

      If you never use it, that data can be dumped to s3 glacier periodically (e.g. after 90 days).

      By losing the foreign key you gain flexibility in what you audit. Maybe audit the operation and not the 20 writes it causes.

      • By yes_man 2025-04-24 7:04 (1 reply)

        Fair enough, but now your application relies on 100% uptime of AWS S3 and no network failures in between. What happens if your transaction goes through, but the request to AWS doesn't? What happens if another operation mutates the target before you can retry with current state? Your app also slows down, since it needs to send the events to S3 and guarantee they got there. Now you are reinventing two-phase commit. Unless you aren't actually making an audit log and don't care whether events are guaranteed to be logged?

        So like OP said, no silver bullets exist for auditing.

    • By awesome_dude 2025-04-24 3:39 (1 reply)

      But wait, there's Event Driven Architectures and Event Sourcing, meaning that the events are your log of edits!

      • By alganet 2025-04-24 3:55 (1 reply)

        Doesn't that also fall under "blame the database engines and go for an experimental solution"?

        I'm not saying databases are blameless. It's just that experiencing the issues they have by yourself is rewarding!

        There is also a walk before the long walk of databases. Store things in text files and use basic tools (cat, sed, sh...).

        The event driven stuff (like Kafka) reminds me of that. I am not very familiar with it though, just played a little bit with it once or twice.

    • By klysm 2025-04-24 2:12 (3 replies)

      My current approach is to have the database be the current state and use logical replication (CDC) to keep the log of changes in case you need it

      • By datadrivenangel 2025-04-24 2:33 (1 reply)

        It is interesting thinking about record changes as a spectrum towards application logs. At some point too much detail is expensive to store, and you must adopt an archival strategy.

      • By globular-toast 2025-04-24 6:55

        The problem with this is the audit log is only at the CRUD level which is often too low. Ambiguities can arise. For example if the question is "who published the article" do you look for a create or do you look for an update with published=true? It's even worse when you consider the schema can change over time, so both can be correct but at different points in time. Event sourcing is the way if you want to capture business-level events.
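
        A tiny sketch of that distinction in Python (the event names and shape are invented for illustration): with business-level events, "who published the article" is answered directly by the log, rather than inferred from row diffs against a possibly changed schema.

```python
from dataclasses import dataclass, field

# Business-level events make intent explicit, unlike raw row updates
# where "published" might be a create or an update with published=true.
@dataclass
class Event:
    kind: str      # e.g. "ArticleCreated", "ArticlePublished"
    actor: str
    data: dict = field(default_factory=dict)

def replay(events):
    """Fold the event log into the current state of one article."""
    state = {}
    for e in events:
        if e.kind == "ArticleCreated":
            state = {"title": e.data["title"], "published": False}
        elif e.kind == "ArticlePublished":
            state["published"] = True
            state["published_by"] = e.actor   # the answer lives in the log
    return state

log = [
    Event("ArticleCreated", "alice", {"title": "YAGRI"}),
    Event("ArticlePublished", "bob"),
]
print(replay(log))  # {'title': 'YAGRI', 'published': True, 'published_by': 'bob'}
```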

      • By alganet 2025-04-24 2:36 (1 reply)

        If you see it from the pure SQL point of view, you are in the "blame database engines and adopt an experimental solution".

        It is the point where you give up modeling the audit as part of the system's tables.

        The drawbacks of this choice are often related to retrieval. It depends on the engine.

        I once maintained a system that kept a fully working replicated log instance delayed by 24h, ready for retrieval queries, in addition to regular disk backups (slow, costly retrieval).

        I am more developer than DBA, so I can probably speak more about modeling solutions than infra-centric solutions.

HackerNews