Stream: implementers
Topic: Uniqueness of resource id in FHIR
Francesca Ricci-Tam (Sep 17 2021 at 22:19):
I would like to clarify -- do I understand correctly that for any given resource type, all resources of that type within a given FHIR-compliant EHR database will have a unique id?
Taking Condition as a concrete example: each Condition resource in Cerner's system must have a unique resource id; Condition id's are not necessarily guaranteed to be unique across different EHR's, but every other EHR system (e.g., Epic) should also have all of its Conditions with internally unique ids. Is that correct? Or can two Condition documents within the same EHR system have the same id?
Michele Mottini (Sep 17 2021 at 23:03):
That's correct
Josh Mandel (Sep 18 2021 at 02:24):
Technically speaking these IDs are unique within a given resource type within a given fhir server. Keep in mind that systems vendors like Cerner or Epic may have thousands of customers with different FHIR endpoints, and "Patient/123" will be referring to a different patient in every one of those.
Josh Mandel (Sep 18 2021 at 02:24):
It's easiest to think about the full URL of a resource (e.g., https://server.example.org/fhir/Patient/123) to see why this is true.
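A minimal sketch of the point above, assuming simple resource URLs of the form [base]/[type]/[id] with no version suffix. The helper name resource_key and the second host other.example.org are hypothetical and only illustrate that identity depends on the server base as well as the type and id.

```python
from urllib.parse import urlsplit


def resource_key(full_url: str) -> tuple[str, str, str]:
    """Split a resource's full URL into (server base, resource type, id).

    Assumes the URL ends in /<type>/<id>; versioned URLs are not handled.
    """
    parts = urlsplit(full_url)
    segments = parts.path.rstrip("/").split("/")
    resource_type, resource_id = segments[-2], segments[-1]
    base_path = "/".join(segments[:-2])
    base = f"{parts.scheme}://{parts.netloc}{base_path}"
    return base, resource_type, resource_id


# Same "Patient/123" on two different servers (second host is hypothetical).
a = resource_key("https://server.example.org/fhir/Patient/123")
b = resource_key("https://other.example.org/fhir/Patient/123")

same_type_and_id = a[1:] == b[1:]  # type and id match
same_resource = a == b             # but the full keys differ
```

Comparing only the type and id treats the two as equal, while comparing the full key (including the base) keeps them apart.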
Christian Annel (Sep 20 2021 at 12:21):
Josh Mandel said:
It's easiest to think about the full URL of a resource (e.g., https://server.example.org/fhir/Patient/123) to see why this is true.
thank you for the response, I do realize it is a task with having different end points. Would changing the location of data be a good resolution to this? My strategy of going about this was to bulk load data into a DB(eventually implementing EDA) after translating the data from Json to Json LD, and then querying the data from the DB in graphQL.
Derek Ritz (Nov 04 2021 at 10:29):
It would seem, where there is a federation of FHIR servers, that the "original author" of a resource (ideally... the client that created it) should assign a GUID that is then faithfully persisted by the receiving server as the resource's id. In this way, content that is shared across the federation can be elegantly de-duped. I'm sure there is an intelligent role Provenance resources can play in this, too. As an example of why this could be important, collating content from a federation of FHIR servers into a patient-centric health summary document (e.g. IPS) is very difficult if id's are not globally unique. @Rob Hausam @Alex Goel
John Moehrke (Nov 04 2021 at 12:32):
the id is a unique identifier "within the assigning authority", which is the root url of the FHIR server. You can not, and must not, presume that the same id means the data are the same. You must have the same id and the same root url. -- True even if using GUID.
Grahame Grieve (Nov 04 2021 at 12:44):
somewhere in spec we mildly recommend the GUID approach, but only mildly so; however you manage your information, you have to pay the piper one way or another when combining information from multiple sources
Derek Ritz (Nov 04 2021 at 12:44):
John Moehrke said:
the id is a unique identifier "within the assigning authority", which is the root url of the FHIR server. You can not, and must not, presume that the same id means the data are the same. You must have the same id and the same root url. -- True even if using GUID.
I'm not trying to sound stoopid, but that wouldn't be true if it was really a GUID... would it? The goal of the approach I was describing was to avoid the proliferation of uniquely identified resources in cases where there is a federation of FHIR servers, and the same content may be found on multiple servers. There are use cases where this proliferation can become very problematic. Creating a globally unique id, once, and faithfully persisting it every time a resource is copied from one server to another, can fundamentally simplify workflows that otherwise can become intractably hard.
Grahame Grieve (Nov 04 2021 at 12:46):
the same content may be found on multiple servers
well, that's very often the case, but the information was acquired on channels that don't automatically trace identity (e.g. paper form, verbal, etc) and so it might be the same content but it won't have the same identifier.
Sometimes you're in luck, and there is traceability, and it's reliable even in the face of institutional record keeping policies. And then, if all the ducks stand nicely in the row, the GUID approach really pays off. Bang - you can shoot all ducks with a single shot, and you get the prize
Grahame Grieve (Nov 04 2021 at 12:48):
in the absence of a GUID approach... someone's going to have a mapping table somewhere, and maintaining it is somewhat expensive. Only, in my experience, what's damned expensive is maintaining integrity in the face of institutional record "correction" policies, and then it turns out you need mapping tables anyway, even if everyone started with the same guid.
oh the stories I can tell! (but anyone who's worked in a reasonably large health institution has their own stories)
Derek Ritz (Nov 04 2021 at 12:50):
Grahame Grieve said:
Sometimes you're in luck, and there is traceability, and it's reliable even in the face of institutional record keeping policies. And then, if all the ducks stand nicely in the row, the GUID approach really pays off. Bang - you can shoot all ducks with a single shot, and you get the prize
It's true... sometimes we get lucky. But if we're electronically sharing data, and we persist a golden thread (the GUID) when we do, then at least in that case we'll be able to elegantly get our ducks in a row. "Luck favours the prepared"... as they say. ;-)
Derek Ritz (Nov 04 2021 at 12:51):
And yes... horror stories abound! :grinning_face_with_smiling_eyes: I guess the trick is to start doing the things that will address the root causes.
John Moehrke (Nov 04 2021 at 13:12):
A GUID carries no absolute guarantee that two identical values will never be created. Yes, the chances of a collision are vanishingly small, but they do exist. Within one FHIR server that chance is addressed by the create operation noticing that the GUID it generated already exists in the database and generating another. Thus one can only be sure that a GUID is unique within one FHIR server. (Note that the FHIR server does need to have been built on good systems-design principles, such as the one I just mentioned.)
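A rough sketch of the create-time check described here, using an in-memory dictionary in place of a real database. The class InMemoryStore and the new_id parameter are hypothetical; a real server would enforce this with a uniqueness constraint in its own storage layer.

```python
import uuid


class InMemoryStore:
    """Toy stand-in for one FHIR server's storage (an assumption for illustration)."""

    def __init__(self):
        self._rows = {}  # (resource_type, id) -> stored resource

    def create(self, resource: dict, new_id=lambda: str(uuid.uuid4())) -> dict:
        resource_type = resource["resourceType"]
        candidate = new_id()
        # If the generated id already exists for this type, generate another.
        while (resource_type, candidate) in self._rows:
            candidate = new_id()
        stored = {**resource, "id": candidate}
        self._rows[(resource_type, candidate)] = stored
        return stored


# Force a collision with a deterministic generator to exercise the retry.
ids = iter(["a", "a", "b"])
store = InMemoryStore()
first = store.create({"resourceType": "Condition"}, new_id=lambda: next(ids))
second = store.create({"resourceType": "Condition"}, new_id=lambda: next(ids))
```

The check only sees ids already held by this one store, which is why it gives uniqueness within one server and nothing more.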
John Moehrke (Nov 04 2021 at 13:12):
assuming that GUIDs are always globally unique is a major patient safety risk.
Derek Ritz (Nov 04 2021 at 13:46):
John Moehrke said:
assuming that GUIDs are always globally unique is a major patient safety risk.
"You keep using that word [GUID]... I don't think it means what you think it means". :grinning_face_with_smiling_eyes:
https://www.youtube.com/watch?v=dTRKCXC0JFg
Derek Ritz (Nov 04 2021 at 13:55):
https://datatracker.ietf.org/doc/html/rfc4122
David Pyke (Nov 04 2021 at 13:57):
I like how you assume people are actually following the RFC instead of assigning them based on some internal policy
John Moehrke (Nov 04 2021 at 14:16):
@Derek Ritz See third paragraph of "Security Considerations" - https://datatracker.ietf.org/doc/html/rfc4122#section-6
Derek Ritz (Nov 04 2021 at 16:39):
Sorry... but I don't believe I am making a rash assumption when I assume that GUID means globally unique ID. There is a mature standard for how these are to be created and this spec must be followed for the resulting artefacts to actually be considered GUIDs. RE: para-3 of sec-6... I'm sure that was good advice in 2005 (and it still is one way to go)... and I'm equally sure there are GUIDs being reliably generated all day, every day, all over. The issue at hand is that there is a FHIR resource id proliferation problem in federated server environments, and GUIDs are one way to address this issue. I'm sure there are other ways, too...
Elliot Silver (Nov 04 2021 at 16:44):
If I recall correctly, even for good implementations, there is a 1 in 100 million chance of a collision with 100 million GUIDs. Unfortunately I can't find a reference for that right now.
(Edit: Wikipedia suggests 1 in 1 billion for 100 trillion GUIDs.)
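For a sense of scale, here is a sketch of the usual birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / (2·2^bits)). It assumes version 4 UUIDs (122 random bits) drawn from a perfectly uniform generator, which real implementations may not achieve; the function name is hypothetical.

```python
import math


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one collision among n random identifiers."""
    space = 2 ** random_bits
    exponent = n * (n - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


p_100_million = collision_probability(10**8)
p_100_trillion = collision_probability(10**14)
```

Under these assumptions, 100 trillion identifiers give a probability a little under one in a billion, in line with the Wikipedia figure, while 100 million identifiers give a far smaller probability than one in 100 million.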
Vassil Peytchev (Nov 04 2021 at 16:56):
It seems to me that this discussion has degenerated into trying to apply a business meaning to a purely technical element. Whether a resource id is globally unique or not, the moment a resource is instantiated on a different FHIR server it becomes a different instance. If there is a business need to establish that two instances of a resource contain information about the same business entity (same patient, same result, etc.), then you must use an identifier for that purpose. Anything else is fraught with risks.
This is not to say that the issue of federated FHIR servers is not important or easy to solve - on the contrary. I am just pointing out that Resource.id is not the way to solve it.
Derek Ritz (Nov 04 2021 at 17:27):
Vassil Peytchev said:
the moment a resource is instantiated on a different FHIR server it becomes a different instance.
If it is a copy... is it not useful that this copy will have the same GUID as the "original"? So many things we need to do with FHIR resources are operationalized by referencing the id. Across a federation of FHIR servers, the very chatty process of searching and fetching by identifier... then by id (once the server-assigned patient.id is known, for example)... seems ripe for being re-engineered. I live in hope for a better way...
John Moehrke (Nov 04 2021 at 17:44):
no.
As Vassil indicates the copy can/should/shall have a .identifier that holds the original resource .id and system. But that copy is a copy and takes on a life of its-own. Most copies should/shall/may never change, but they are still copies that have a life of their own.
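One way this could look, as a sketch only: the copy drops the source server's id (so the receiving server can assign its own) and records where it came from in .identifier. Using the source base URL as the identifier system and "Type/id" as the value is an assumed convention for illustration, not something the thread or the spec prescribes, and it assumes a resource type whose identifier element repeats (such as Condition or Patient).

```python
import copy


def copy_for_other_server(original: dict, source_base: str) -> dict:
    """Return a copy that carries the original's identity as a business identifier."""
    replica = copy.deepcopy(original)
    original_id = replica.pop("id")  # the receiving server assigns its own id
    replica.setdefault("identifier", []).append(
        {
            "system": source_base,  # assumed convention: source server base URL
            "value": f"{replica['resourceType']}/{original_id}",
        }
    )
    return replica


original = {"resourceType": "Condition", "id": "abc"}
replica = copy_for_other_server(original, "https://server1.example.org/fhir")
```

The original is left untouched, and the copy can then take on a life of its own while remaining traceable to its source.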
also, a Provenance is really good to have to show the derivation.
Derek Ritz (Nov 08 2021 at 17:09):
There are some very important issues that need to be solved. The very unhelpful behaviour of a federation of FHIR servers -- that are all assigning different id's to the identical copies they may have of a single resource -- is one of these important issues. I fear that the idea of saving the original .id in an .identifier element will be a very poor performer, in practice (just too chatty... a simple sequence diagram of the traffic pattern illustrates this). Has anyone tried to address "proliferation of duplicate resources across a federation of FHIR servers" problem... and found a good solution?
Elliot Silver (Nov 08 2021 at 18:09):
Uniqueness of ids for identical copies of resources is, in my opinion, one of the minor challenges with federation of FHIR servers.
Vassil Peytchev (Nov 08 2021 at 18:40):
There is no such thing as identical copies of resources in this case. If it is on a different server, it is not identical. it may be equivalent, but not identical (and then you have the identical identifier to establish the equivalence).
Lloyd McKenzie (Nov 15 2021 at 02:56):
It's equivalent to "identical copies of the same information in different databases have distinct primary keys" - which is something we've lived with for many decades and is generally not perceived to be a problem...
Derek Ritz (Nov 17 2021 at 14:08):
@Lloyd McKenzie -- this issue (and solution) we've had for decades is exactly what i'm trying to address, here. across a federation of multiple production databases... if the primary key of the record is the same, you can tell it's the same data record, and de-duping is simple (e.g. SELECT UNIQUE...). i think you're conflating rowid and primary key. across different databases, the rowid would be different... but the primary key would be the same. but... across a federation of FHIR servers... it seems our spec is to have each FHIR server use rowid as the primary key... and to figure out which ones are the same is left as an exercise for the reader.
Lloyd McKenzie (Nov 17 2021 at 16:19):
In most databases, the primary key is the rowId. The resource.id definitely corresponds to rowId. It's not supposed to be meaningful. It's not intended for use as anything other than to allow linking between resources, in the same way that a relational database links to rows in other tables.
Lloyd McKenzie (Nov 17 2021 at 16:20):
There's zero expectation that rowIds or resource.id values would be the same on different systems unless they are synchronized databases
Vassil Peytchev (Nov 17 2021 at 16:27):
To bring back this from the other related thread:
Across a federation of FHIR servers it can be important to know which resources are "copies" of the same underlying "base" resource.
I don't think anyone is arguing against that. What the difference seems to be is that the way to achieve the above is for the federation to define a particular identifier system, and have the same identifier from this system be the primary key.
In other words, GET <FHIR base>/Resource?identifier=<federated system URI>|2345 will give you the identical copy of the resource.
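A small sketch of building that request URL for several servers in a federation. The server bases and the federation identifier system URI below are hypothetical placeholders; whether a given server supports the identifier search parameter is a separate question.

```python
from urllib.parse import urlencode


def identifier_search_url(base: str, resource_type: str, system: str, value: str) -> str:
    """Build a search URL using a token of the form system|value."""
    query = urlencode({"identifier": f"{system}|{value}"})
    return f"{base.rstrip('/')}/{resource_type}?{query}"


# Hypothetical federation members and identifier system.
servers = ["https://server1.example.org/fhir", "https://server2.example.org/fhir/"]
federated_system = "https://federation.example.org/ids"

urls = [identifier_search_url(s, "Patient", federated_system, "2345") for s in servers]
```

The query string is the same on every server; only the base differs, which is what makes a federation-wide identifier usable as a shared key.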
Derek Ritz (Nov 17 2021 at 18:11):
@Vassil Peytchev -- this sounds wonderful! I didn't realize there was a single-statement way to get all the health data about a particular care subject using the patient.identifier, alone. I was under the (mistaken) impression that, for each FHIR server in the federation, I had to get the patient.id by querying using the patient.identifier... and then separately query for the health content associated with that patient.id (by querying by .id, now that I have it). To confirm... I can get the whole person-centric health "story" without all that round-tripping? :-)
John Moehrke (Nov 17 2021 at 18:36):
as long as the FHIR server supports this.... it is one of the features in fhir core... and like all features, none are mandatory. An implementation guide can mandate it. I have seen some regions with national identifiers do this.
Derek Ritz (Nov 17 2021 at 20:12):
@John Moehrke we had a lot of challenges testing IPS at the EU IHE Connectathon... and they seemed to be related to this exact problem. Am I not remembering correctly what the issue was?
John Moehrke (Nov 17 2021 at 21:01):
I was not involved. Would love to help
Grahame Grieve (Nov 18 2021 at 00:15):
I was under the (mistaken) impression that, for each FHIR server in the federation, I had to get the patient.id by querying using the patient.identifier... and then separately query for the health content associated with that patient.id (by querying by .id, now that I have it). To confirm... I can get the whole person-centric health "story" without all that round-tripping?
you've conflated two things - searching by identifier, and multiple servers. But @Vassil Peytchev's example should be understood as a template, not an actual equivalent to the $everything which is what it sounds like you are thinking it is. But IPS is a much more bounded question. Although it really sounds like you are asking about IPA not IPS
Lloyd McKenzie (Nov 18 2021 at 16:13):
A lot of EHRs require you to first undertake patient resolution (which will give you a Patient.id) and then get authorization and perform all other queries using Patient.id. Cross-resource searching using Patient.identifier is often not supported (though it's completely legal).
A search that can work across resources is GET [base]?patient.identifier=<someSystem>|123, but it's definitely not something you'll find widely supported. (Note that that'll give you all patient-centric resources except the Patient.)
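To make the contrast concrete, here is a sketch that builds the URLs for both patterns: patient resolution followed by a query on Patient.id, and a single chained search on patient.identifier. It uses the per-resource-type form of the chained search rather than the system-level form shown above; the base URL, identifier system, and function names are hypothetical, and server support for the chained form varies.

```python
from urllib.parse import urlencode


def patient_lookup_url(base: str, system: str, value: str) -> str:
    """Step 1 of the two-step pattern: resolve the Patient by business identifier."""
    return f"{base}/Patient?{urlencode({'identifier': f'{system}|{value}'})}"


def by_patient_id_url(base: str, resource_type: str, patient_id: str) -> str:
    """Step 2: query clinical data using the server-assigned Patient.id."""
    return f"{base}/{resource_type}?{urlencode({'patient': patient_id})}"


def chained_url(base: str, resource_type: str, system: str, value: str) -> str:
    """Single request: chain through the patient reference to its identifier."""
    query = urlencode({"patient.identifier": f"{system}|{value}"})
    return f"{base}/{resource_type}?{query}"


base = "https://ehr.example.org/fhir"   # hypothetical server
system = "https://ids.example.org/nhi"  # hypothetical identifier system

step1 = patient_lookup_url(base, system, "123")
step2 = by_patient_id_url(base, "Condition", "456")  # "456" would come from step 1
single = chained_url(base, "Condition", system, "123")
```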
Peter Jordan (Nov 19 2021 at 19:24):
WRT to IPS requests - identifier is one of the input parameters to the nascent $summary operation. I also believe that it will be one of the primary means of facilitating searching from patient-facing apps.
In NZ, we have 30 years experience of using a national patient identifier (NHI) in EHR systems and HIE services. Based on that experience, I strongly recommend implementers here NOT to use it as the ID of a Patient Resource however convenient that may be for them in the short-term - for the same reasons that it is not a suitable candidate for a primary key in a relational database.
Craig McClendon (Nov 19 2021 at 21:47):
Moreover @Peter Jordan - Anonymization/Deidentification is a common task for health data to support downstream research. If you are using a business identifier as Patient.id deidentification becomes extremely difficult.
Derek Ritz (Nov 23 2021 at 23:16):
I agree with @Peter Jordan... the business identifier should not be the unique patient.id. I wasn't contemplating that it would be. What I was contemplating was that a meaningless but unique ID for a resource... a GUID... would be consistently used across a federation of FHIR servers that all had copies of this resource. The obvious use case is for replicated patient resources... but I think it could apply for any other resource that is copied to multiple servers... couldn't it?
Grahame Grieve (Nov 24 2021 at 00:05):
sure. you can use UUIDs for that. and the standard says you can and mentions that as a usecase.
Vassil Peytchev (Nov 24 2021 at 02:56):
Still, http://server1/Resource/uuid1 and http://server2/Resource/uuid1 need to be considered different instances of Resource, unless there is additional information to indicate that they are the same resource replicated.
In general, I think that the information about replication of resources and federation of servers is better handled at lower level of the network stack. For the cases when this information has to bubble up to level 7, it fits best in canonical urls for canonical resources, and identifiers for the rest...
John Moehrke (Nov 24 2021 at 13:38):
Grahame Grieve said:
sure. you can use UUIDs for that. and the standard says you can and mentions that as a usecase.
The dissonance in the discussion is not about UUIDs, but rather the element .id. As Vassil and I have said multiple times, a UUID as a .id on server1 can not be assumed/presumed/compelled to be the same resource .id on server2.
One could put the server1 resource .id value into the instance copied onto server2 in that server2 instance .identifier element.
Lloyd McKenzie (Nov 25 2021 at 03:30):
Servers can coordinate to use the same resource.id values for the same records - in much the same way as they can use the same database row ids to synchronize databases - and the use-cases are much the same. It requires a controlled approach to identity management and pre-arrangement with the respective servers. Completely possible where planned for and designed, but definitely not something that will be typical or happen automatically.
Grahame Grieve (Nov 25 2021 at 03:35):
fyi https://www.ctl.io/developers/blog/post/server-generated-keys-unique-ids-for-distributed-databases
Last updated: Apr 12 2022 at 19:14 UTC