Stream: implementers
Topic: big int vs GUID for Primary Keys
Shane O'Neill (Sep 04 2019 at 20:59):
Howdy
in a relational DBMS, are there any advantages to using GUIDs vs the native primary key (big int) implementation to identify FHIR resources?
Lloyd McKenzie (Sep 05 2019 at 01:49):
Only benefit is if you want to use PUT creates on a server where lots of other servers are also performing creates. With GUIDs you know there won't be collisions. With big int ids, there can be. If you're not in that situation, no benefit.
James Agnew (Sep 05 2019 at 09:39):
GUIDs could also have a minor privacy benefit, in that you can't infer anything about the order that resources were created by looking at their IDs. I can't think of a scenario where that would be an exploitable fact, but that doesn't mean there aren't any..
Adam Flinton (Sep 05 2019 at 10:02):
I would use guids for a number of reasons:
Clearly don't know your usecases nor db choice etc but 3 quick reasons (out of a greater number)
1) Often for reasons of scalability (or lack thereof) you might have identical db'es in a n customer orgs (e.g. gp surgeries or hospitals) per db arrangement. If the scalability then improves and you want to move to one db period or even simply more customer orgs per db then you will have a problem of identical keys with ints.
2) If using a variety of sharded db options guids are more random and thus ease the hot partition problem
3) Because of the growing prevalence of the use of guids as keys the indexing around them has improved such that there is little if any benefit any more to using ints (admittedly there did use to be & possibly in your choice of db there still might be)
John Silva (Sep 05 2019 at 13:16):
@Shane O'Neill -- one disadvantage of using GUIDs is that they do not necessarily work well as primary DB keys. I'm not a 'DB guy' but I remember a DB architect talking about this problem from a performance and space perspective.
Jenni Syed (Sep 05 2019 at 15:31):
The issue typically comes in if the DB in question doesn't have a first class GUID/UUID type and thus dumps these into strings. And indexing off those strings for what would have been a "unique id" lookup tends to be slower, from what I understand
Gino Canessa (Sep 05 2019 at 15:50):
In MSSQL Server (relational I'm most comfortable with), uniqueidentifier is a 128 bit binary value. Performance is no different from other binary types, but it's larger than an integer type (e.g., 64 bit bigint). That said, I'm now of the 'storage is cheap' camp for things like this - I'd rather inflate by the extra 64 bits and not worry about collisions when I consolidate databases.
James Agnew (Sep 05 2019 at 21:57):
One other point in favour of GUIDs: Anytime you use a sequence to generate IDs, sooner or later an application restart or rolled back transaction causes a gap in your IDs, and even though this gap makes absolutely no meaningful difference to anyone or anything, it's a guarantee that some bright-eyed analyst will notice and freak out on the developers that there are gaps. Use GUIDs and the issue goes away..... ;)
Peter Jordan (Sep 05 2019 at 22:27):
GUIDs are good candidates for primary keys in modern RDBMS, providing they are not used as clustered indexes. From an RDBMS perspective (certainly in MSSQLS) they are required to perform replication. They are also used as identifiers in SNOMED CT Reference Sets.
Last updated: Apr 12 2022 at 19:14 UTC