Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I've known for a long time that you usually want b-tree in Postgres/MySQL, but never understood too well how those actually work. This is the best explanation so far.

Also, for some reason there have been lots of HN articles incorrectly advising people to use uuid4 or v7 PKs with Postgres. Somehow this is the first time I've seen one say to just use serial.



> incorrectly advising people to use uuid4 or v7 PKs with Postgres

random UUIDs vs time-based UUIDs vs sequential integers has too many trade-offs and subtleties to call one of the options "incorrect" like you're doing here.

just as one example, any "just use serial everywhere" recommendation should mention the German tank problem [0] and its possible modern-day implications.

for example, if you're running a online shopping website, sequential order IDs means that anyone who places two orders is able to infer how many orders your website is processing over time. business people usually don't like leaking that information to competitors. telling them the technical justification of "it saves 8 bytes per order" is unlikely to sway them.

0: https://en.wikipedia.org/wiki/German_tank_problem


PK isn't the same as public ID, even though you could make them the same. Normally you have a uuid4 or whatever as the public one to look up, but all the internal joins etc use the serial PKs.


> Normally you have a uuid4 or whatever as the public one to look up, but all the internal joins etc use the serial PKs.

what? that's possible, but it's the worst of both worlds. I've certainly never encountered a system where that's the "normal" practice.

the usual reason people avoid UUIDv4 primary keys is that it causes writes to be distributed across the entire B-tree, whereas sequential (or UUIDv7) concentrates them.

but if you then add a "alternate primary key" you're just re-creating the problem - the B-tree for that unique index will have its writes distributed at random.

if you need a UUID PK...just use it as the PK.


The problem isn't so much the writes, it's the reads. Every time you join tables, you're using a PK 2-4x the size it needs to be, and at least that much slower. Even filtering on a secondary index may involve an internal lookup via PK to the main table. It doesn't take long to start noticing the performance difference.

Since you'd have a secondary index for the public UUID, yes that one index suffers from the random-writes issue still, but it takes a lot of volume to notice. If it ever is a big deal, you can use a separate KV store for it. But if you picked UUID as the PK, it's harder to get away from it.


Well that and all the tables you have that don't need a customer-facing ID at all, in which case you also benefit from quicker writes using serial

DB perf considerations aside, a lot of software pattern around idempotency/safe retries/horiz-scaling/distributed systems are super awkward with a serial pk because you don’t have any kind of unambiguous unique record identifier until after the DB write succeeds.

DB itself is “distributed” in that it’s running outside the services own memory in 99% of cases, in complex systems the actual DB write may be buried under multiple layers of service indirection across multiple hosts. Trying to design that correctly while also dealing with pre-write/post-write split on record id is a nightmare.


DB sequence will give you a unique ID before the transaction succeeds. If the transaction fails, there's just a gap in the IDs.

If some service that doesn't interact with the DB wants to define its own IDs, sure, but even then whatever writes to the DB can always remap that to serial IDs. I know there are use cases where that still doesn't make sense and you really need UUID PKs, but it's not the norm.


Simple sequential IDs are great. If you want UUID, v7 is the way to go since it maintains sequential ordering.


There are subtle gotchas around sequential UUID compared to serial depending on where you generate the UUIDs. You can kinda only get hard sequential guarantee if you are generating them at write time on DB host itself.

But, for both Serial & db-gen’d sequential UUID you can still encounter transaction commit order surprises. I think software relying on sequential records should use some mechanism other than Id/PK to determine it. I’ve personally encountered extremely subtle bugs related to transaction commit order and sequential Id assumptions multiple times.


Does all of that apply to Postgresql as well or only Mysql?


Both, assuming you’re ever going to index it - both use a form of a B+tree for their base indices.

If it’s just being stored in the table, it doesn’t matter, but also if it doesn’t matter, just use v7.


I haven't explored hash indexes enough. That might be one thing that differs between Postgres and MySQL, because for a long time Postgres didn't have a good story for those.

> just use serial

Ideally you use IDENTITY with Postgres, but the end result is the same, yes.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: