FHIR Chat · Resource.id restrictions

Stream: implementers

Topic: Resource.id restrictions

Reinhard Egelkraut (Jul 24 2018 at 12:45):

Hello everyone!

During discussions in the FHIR work group of HL7 Austria, a problem with the handling of resource IDs came to our attention, and it affects multiple vendors:
Some vendors are currently trying to implement a FHIR interface connected to a backend system, e.g. an IHE XDS affinity domain. In the FHIR standard, the ID of a resource is restricted to 64 characters drawn from alphanumerics, '-' and '.'. While this works well for database IDs, it is not really suitable for scenarios in which no data store for FHIR resources exists and the only things available as IDs are identifiers (e.g. patient ID, document ID) consisting of an assigning authority and a value, as is often the case in IHE profiles (PIX, XDS, ...). Please note that although this problem first occurred during the implementation of IHE profiles with FHIR, it is not limited to them.
This forces the implementation of such a FHIR interface to build a mapping table, assigning new IDs and storing them in a separate data pool. This makes the whole implementation complicated, hard to maintain, and not scalable. Is it possible to remove or extend the length restriction?

Thanks
@Patrick Mangesius , @jb - fyi
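
To make the restriction concrete, here is a minimal sketch of a check against the id format described above (1 to 64 characters from letters, digits, '-' and '.'). The OID and UUID are made-up values for illustration, and '-' is an assumed separator, chosen only because it is one of the characters the id format allows.

```python
import re

# FHIR id datatype: 1 to 64 characters from A-Z, a-z, 0-9, '-' and '.'.
FHIR_ID = re.compile(r"[A-Za-z0-9\-.]{1,64}")


def is_valid_fhir_id(candidate: str) -> bool:
    """Return True if the whole string satisfies the FHIR id format."""
    return FHIR_ID.fullmatch(candidate) is not None


# Hypothetical values, for illustration only.
assigning_authority_oid = "1.2.40.0.34.99.4613.122.3.17.1001"
patient_uuid = "0f8fad5b-d9cb-469f-a165-70867728950e"

# Naive composite id: assigning authority + separator + value.
composite_id = f"{assigning_authority_oid}-{patient_uuid}"

print(len(patient_uuid), is_valid_fhir_id(patient_uuid))
print(len(composite_id), is_valid_fhir_id(composite_id))
```

The UUID alone passes, while the composite of OID and UUID uses only allowed characters but is too long.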

Grahame Grieve (Jul 24 2018 at 12:49):

we're pretty keen not to extend the id beyond 64 characters. I would've thought 64 characters was plenty long enough for hashing schemes etc

jb (Jul 24 2018 at 13:07):

thanks, grahame! what exactly do you mean when you mention hashing schemes? e.g. in our case, a patient id may consist of an oid (theoretically unlimited in length) as assigner, some sort of separator and a uuid value (limited to 36 characters). this could very easily extend beyond 64 characters.

Grahame Grieve (Jul 24 2018 at 13:13):

so you can hash all that to a long hash with exceedingly low clash rates (on par with UUIDs). Or you can hash the OID part to a standard 10 character hash... or you can have a maintenance table that assigns OIDs a key....
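
A minimal sketch of the first two options, assuming SHA-256 as the hash (the thread does not name one). The function names and the "|" separator are hypothetical. A SHA-256 hex digest is exactly 64 hexadecimal characters, so it fits the id format without further encoding.

```python
import hashlib


def hashed_resource_id(assigner_oid: str, value: str) -> str:
    """Hash the whole composite identifier into a 64-character hex id."""
    # "|" is an assumed separator; it never appears in the resulting id
    # because only the digest is used.
    composite = f"{assigner_oid}|{value}".encode("utf-8")
    return hashlib.sha256(composite).hexdigest()


def short_oid_key(assigner_oid: str, length: int = 10) -> str:
    """Hash only the OID and keep the first `length` hex characters."""
    return hashlib.sha256(assigner_oid.encode("utf-8")).hexdigest()[:length]


def prefixed_resource_id(assigner_oid: str, value: str) -> str:
    """Short OID hash + '.' + the original value (e.g. a 36-character UUID)."""
    return f"{short_oid_key(assigner_oid)}.{value}"


# Hypothetical values, for illustration only.
oid = "1.2.40.0.34.99.4613.122.3.17.1001"
uuid_value = "0f8fad5b-d9cb-469f-a165-70867728950e"

print(hashed_resource_id(oid, uuid_value))
print(prefixed_resource_id(oid, uuid_value))
```

Truncating to 10 hex characters keeps only 40 bits of the digest, so the second variant is reasonable only for a small set of assigning authorities. Neither variant can be reversed without a table.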

Grahame Grieve (Jul 24 2018 at 13:13):

no need to impose >64 chars on everyone....

John Moehrke (Jul 24 2018 at 13:18):

They need to be able to take an ID, and figure out what that means in their backend. This is not something that a hash can help with. A hash is a oneway translation, they need the other direction.

Grahame Grieve (Jul 24 2018 at 13:20):

well, you can add a key to the table that is a hash. Or one of the other solutions....
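
A sketch of what a table keyed by the hash might look like, which gives the reverse direction John asks for. The class name and methods are hypothetical, and an in-memory dict stands in for whatever persistence a real facade would use, so it does not address the clustering concern raised later in the thread.

```python
import hashlib
from typing import Dict, Tuple


class IdMappingTable:
    """Maps hashed FHIR ids back to (assigner OID, value) pairs."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Tuple[str, str]] = {}

    def register(self, assigner_oid: str, value: str) -> str:
        """Store the backend identifier and return its hashed FHIR id."""
        resource_id = hashlib.sha256(
            f"{assigner_oid}|{value}".encode("utf-8")
        ).hexdigest()
        existing = self._by_id.setdefault(resource_id, (assigner_oid, value))
        if existing != (assigner_oid, value):
            # Two different backend identifiers produced the same hash.
            raise ValueError(f"hash collision on {resource_id}")
        return resource_id

    def resolve(self, resource_id: str) -> Tuple[str, str]:
        """Return the backend identifier for a FHIR id (KeyError if unknown)."""
        return self._by_id[resource_id]


table = IdMappingTable()
fhir_id = table.register("1.2.40.0.34.99.111", "0f8fad5b-d9cb-469f-a165-70867728950e")
print(fhir_id, table.resolve(fhir_id))
```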

jb (Jul 24 2018 at 13:20):

Thanks, John. That's exactly the problem. Other than that, I'm not quite sure why a longer id would "impose" something. Can't imagine how this would lead to a larger implementation effort for anyone.

jb (Jul 24 2018 at 13:21):

Well, maintaining a table sounds easy, but it imposes scalability problems which have to be solved (e.g. think about clustering)

Grahame Grieve (Jul 24 2018 at 13:22):

if we say that ids can be longer than 64 characters, many servers will have to make bigger fields and indexes. I don't see why that's necessary

Grahame Grieve (Jul 24 2018 at 13:22):

if you have a table, you can have a key

John Moehrke (Jul 24 2018 at 13:24):

as indicated, this is not unique to the IHE MHD profiles.. but we can use them as an example of the problem. The _id would thus need to contain: homeCommunityID + RepositoryID + DocumentUUID. I am sure Imaging has other examples. Likely other use-cases too. A table is the solution they are forced to maintain, and it becomes fragile and full of useless information. Yes one could have expiration of table entries, and I did warn IHE readers (consumer actors) that they should not expect any DocumentReference to be persistent beyond a reasonable amount of time. --- So, is there some increase in size that seems justifiable but still reasonable?

Grahame Grieve (Jul 24 2018 at 13:27):

not really, because people will keep coming across ever longer composite keys that are monstrous UUID things

Grahame Grieve (Jul 24 2018 at 13:27):

if the document has a UUID, then a UUID is enough, no?

jb (Jul 24 2018 at 13:27):

well, in xds you will need at least to know the repository id...

Grahame Grieve (Jul 24 2018 at 13:28):

I try to be sympathetic where I can, but 64 chars is a lot, backed by an ISO std, and changing it is a long piece of string for everyone to deal with - ids are indexed values

John Moehrke (Jul 24 2018 at 13:29):

It is enough to do a lookup in a local table. It is actually enough to do an XDS GetDocuments... just looking for a solution that does not involve an intermediate lookup.

Grahame Grieve (Jul 24 2018 at 13:29):

and why do you need to know the repository id?

John Moehrke (Jul 24 2018 at 13:30):

realistically, they are trying to stuff more than minimally necessary into it...

John Moehrke (Jul 24 2018 at 13:31):

limits like this will cause them to think harder... I don't mind that.... Just supporting the question needing to be asked.

jb (Jul 24 2018 at 13:33):

because in MHD we're trying to map to xds operations

jb (Jul 24 2018 at 13:33):

We approached ITI-68 as an ITI-43

jb (Jul 24 2018 at 13:33):

https://www.ihe.net/uploadedFiles/Documents/ITI/IHE_ITI_TF_Vol2b.pdf -> 3.43.4.1.2

jb (Jul 24 2018 at 13:33):

"A required repositoryUniqueId that identifies the repository from which the document is to be retrieved. This value corresponds to XDSDocumentEntry.repositoryUniqueId."

jb (Jul 24 2018 at 13:34):

there were also discussions at the connectathon about that. The only viable solution was to retrieve the document via an additional registry roundtrip.

jb (Jul 24 2018 at 13:35):

it works, but I don't think it's great.

Grahame Grieve (Jul 24 2018 at 13:35):

but the XDS Api doesn't front for any repository at all, right? only known repositories, and you don't add repositories on the fly?

John Moehrke (Jul 24 2018 at 13:36):

you only need to know the RepositoryID for the retrieve of the document. That is done with the DocumentReference.content.attachment.url. A datatype url does not have this limit, a URL is much bigger.

John Moehrke (Jul 24 2018 at 13:37):

the problem is the DocumentReference.id would need to include just the entryUUID in XDS; and the homeCommunityID+entryUUID in a XCA...

John Moehrke (Jul 24 2018 at 13:39):

I wonder if they are thinking Binary.id...

jb (Jul 24 2018 at 13:39):

that was exactly the problem: at the connectathon, they required the attachment to be a reference to a Binary

jb (Jul 24 2018 at 13:40):

currently searching a wiki page....

jb (Jul 24 2018 at 13:41):

i think the problem was addressed here:

jb (Jul 24 2018 at 13:41):

https://github.com/usnistgov/iheos-toolkit2/wiki/MHD-Testing-at-2018-North-American-Connectathon#light-weight-multiple-repository-support

John Moehrke (Jul 24 2018 at 13:41):

yes, in MHD on Create (Provide transaction) the DocumentReference.content.attachment.url is a Binary...

nicola (RIO/SS) (Jul 24 2018 at 13:42):

For documentreference id can be sha-1 from canonical content, like in git :)

John Moehrke (Jul 24 2018 at 13:48):

that github text does not help me understand the problem. A DocumentReference itself is not Repository specific, it is Registry specific. So there is no logic that the DocumentReference.id needs to hold the repository identifier. The only place the repository identifier is needed is in the DocumentReference.content.attachment.url... and that is not constrained to 64 characters.
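
A sketch of that split: the DocumentReference.id carries only the entryUUID, and the repository identifier travels in content.attachment.url. Since MHD does not specify the URL format, the query parameter names and the retrieve endpoint below are purely hypothetical, as are all the identifier values.

```python
import json
from urllib.parse import urlencode


def build_document_reference(entry_uuid: str, repository_unique_id: str,
                             document_unique_id: str, retrieve_base: str) -> dict:
    """Build a minimal DocumentReference whose id is just the entryUUID."""
    prefix = "urn:uuid:"
    # XDS entryUUIDs are usually written as urn:uuid:...; ':' is not allowed in an id.
    logical_id = entry_uuid[len(prefix):] if entry_uuid.startswith(prefix) else entry_uuid
    # Hypothetical URL layout: the repository id goes here, not into the id.
    query = urlencode({
        "repositoryUniqueId": repository_unique_id,
        "documentUniqueId": document_unique_id,
    })
    return {
        "resourceType": "DocumentReference",
        "id": logical_id,
        "status": "current",
        "content": [{"attachment": {"url": f"{retrieve_base}?{query}"}}],
    }


doc_ref = build_document_reference(
    entry_uuid="urn:uuid:0f8fad5b-d9cb-469f-a165-70867728950e",
    repository_unique_id="1.2.40.0.34.99.4613.122.3.17.1001",
    document_unique_id="1.2.40.0.34.99.4613.122.3.17.1001.20180724.1",
    retrieve_base="https://mhd.example.org/retrieve",
)
print(json.dumps(doc_ref, indent=2))
```

The id stays at 36 characters, while the URL is free to exceed 64.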

jb (Jul 24 2018 at 13:53):

The text is from Bill Majurski. It's talking about the fact that they required the DocumentReference.content.attachment.url to point to some form of Binary.id. Which was obviously a mistake, if I'm interpreting the resolution correctly: https://github.com/usnistgov/iheos-toolkit2/wiki/MHD-Testing-at-2018-North-American-Connectathon#iti-resolution-2

jb (Jul 24 2018 at 14:00):

in the mhd specification, it is very clear: 3.67.4.2.2.2.1 Document location: " IHE does not specify the format of the URL"

John Moehrke (Jul 24 2018 at 14:00):

so, Bill is an authoritative person in IHE regarding XDS, but not FHIR. He has statements on that github page that there was some kind of ITI decision. These are NOT true; there has been no discussion in ITI. This article is pointed to in a Change Proposal he submitted (CP-ITI-1113); this CP is limited to questions around the response returned to a Document Source on the Provide Document Bundle transaction. Specifically, this ties back to my statements above that the DocumentReference.content.attachment.url need only conform to the url datatype. There he recommends only:
"MHD profile documentation needs review and possibly make this lack of format restriction more obvious to the reader"
which I clearly would agree with... but this is NOT an issue with FHIR id limits

John Moehrke (Jul 24 2018 at 14:03):

so, back to the FHIR discussion.... generally it still seems that there might be times when it might be useful to be able to stuff more than one backend id into a FHIR API Resource id... Like we discussed above with homeCommunityID+entryUUID.
I am however hearing Grahame push back on that, with references to basis from ISO for the 64 character limit.

jb (Jul 24 2018 at 14:04):

ok. i understand that. so as far as I see it, the only options are: 1) hash the values and accept possible collisions (albeit the chance of this happening is very small, for patient id's, you absolutely don't want that to happen) or 2) deal with some sort of persistence.
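
To put a rough number on "very small", here is a sketch of the usual birthday-style upper bound on the probability of any collision, assuming the hash behaves like a uniform random function. The identifier counts are arbitrary examples, not figures from this thread.

```python
def collision_probability_bound(num_ids: int, hash_bits: int) -> float:
    """Upper bound on P(any two of num_ids ids collide) for a hash_bits-bit hash.

    Uses the union bound: (number of pairs) / (number of possible hash values).
    """
    pairs = num_ids * (num_ids - 1) // 2
    return pairs / 2 ** hash_bits


# One billion identifiers hashed with a full 256-bit digest (64 hex characters).
print(collision_probability_bound(10**9, 256))

# One thousand assigning authorities hashed to 10 hex characters (40 bits).
print(collision_probability_bound(1000, 40))
```

The bound is tiny for a full digest but never zero, which is why a collision check at registration time still matters for patient ids.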

Grahame Grieve (Jul 24 2018 at 14:05):

yes some sort of persistence

Patrick Mangesius (Jul 24 2018 at 14:14):

Introducing persistence is often not very practical, especially for small services that just map between different data pools and could otherwise be built in a lightweight way. They are simply acting as proxies. Requiring them to introduce persistence brings avoidable complexity and a chance of errors. You have to think about synchronization, how to handle clustered environments, etc.

Michele Mottini (Jul 24 2018 at 16:13):

Can't you query the DocumentReference by identifier instead of using the id?
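
A sketch of what such a query could look like, using the standard token form system|value for the identifier search parameter. The base URL, the OID and the value are hypothetical, and the urn:oid: prefix is the usual way to express an OID as an identifier system.

```python
from urllib.parse import quote


def identifier_search_url(base_url: str, resource_type: str,
                          system: str, value: str) -> str:
    """Build a search URL of the form [base]/[type]?identifier=[system]|[value]."""
    # Percent-encode the whole token so ':' and '|' survive transport.
    token = quote(f"{system}|{value}", safe="")
    return f"{base_url}/{resource_type}?identifier={token}"


print(identifier_search_url(
    "https://fhir.example.org",
    "Patient",
    "urn:oid:1.2.40.0.34.99.4613.122.3.17.1001",
    "0f8fad5b-d9cb-469f-a165-70867728950e",
))
```

The identifier token has no 64-character limit, but the result is a Bundle from a search rather than a single resource from a read.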

jb (Jul 24 2018 at 19:43):

@Michele Mottini Currently we actually have a bigger problem with patients. It would maybe be a workaround to do a query by identifier. However at least connectathon testing requires to do at least one read operation for every pdqm search.

Also it's actually pretty convenient for clients. Imagine doing a "Search" with the _summary or _elements parameter set and getting a very small response, and then only "Read" the resource you really need in full in a second step. Without having to parse a bundle again, knowing exactly which type of resource you will get and being absolutely sure you will get a single result (e.g. for typical master->detail flows).

John Moehrke (Jul 24 2018 at 21:04):

@jb I don't follow your use-case flow there. The search bundle with _summary will be a bag of Patient resources, that are just missing elements that might have values. So you know by this bundle the URL you can do GET on, and you know that they are Patient resources... so I don't understand what is missing.

jb (Jul 25 2018 at 07:09):

@John Moehrke "that are just missing elements...." Which is perfect if you want to save bandwidth.... Using _elements works even better in this regard because the client can choose. E.g. showing a list of patients with only name and gender -> on click show patient details along with address etc. It was just an example. It's convenient to retrieve the full resource by id then, that's all. But that's probably not what this topic should be about, it was simply a response to @Michele Mottini on why we don't just use the identifier param for another query. To make it simple for the client.
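
The master->detail flow described here can be sketched as two URLs: a search restricted with _elements, then a read by logical id. The base URL, the search parameter and the id are hypothetical examples.

```python
from typing import Dict, List
from urllib.parse import urlencode


def list_url(base_url: str, elements: List[str], search_params: Dict[str, str]) -> str:
    """Step 1: search for Patients, asking only for the listed elements."""
    params = dict(search_params)
    params["_elements"] = ",".join(elements)
    return f"{base_url}/Patient?{urlencode(params)}"


def detail_url(base_url: str, logical_id: str) -> str:
    """Step 2: read one full Patient by its logical id."""
    return f"{base_url}/Patient/{logical_id}"


base = "https://fhir.example.org"
print(list_url(base, ["name", "gender"], {"family": "Muster"}))
print(detail_url(base, "0f8fad5b-d9cb-469f-a165-70867728950e"))
```

The second step only works if the server can turn the logical id back into a backend lookup, which is where the length limit comes in.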

Reinhard Egelkraut (Aug 03 2018 at 13:38):

So, to make a little summary of this discussion:

For architectures in which FHIR is used just as an API, without a full FHIR server, and which have a lightweight design with no DB connection (communication servers or proxies in a DMZ; integration components with just an in-memory DB which are clustered or have multiple instances behind a load balancer), there are scenarios in which vendors have to find a way to create resource.ids out of existing data, which most of the time is not even in their hands (old data structures of the legacy system, a connected IHE Affinity Domain, ...).

Some of the use cases, where the logical Id of a resource could therefore be potentially larger than 64 characters, are for example:

- IHE - MHD with multiple repositories, referencing patients in DocumentReference/Manifest via patient Ids (and the use of OIDs to make them unique)
- FHIR API for legacy systems for which a combination of identifiers/attributes will be used to create a unique logical Id

The discussed potential solutions for this problem in this chat are:

1. remove or extend the restriction on resource.id for 64 characters
- would still be possible till R4 is released (after that it will get very hard I assume)
- requires some existing systems to adapt adequately their DB structures where the logical id is stored (not all of them actually have this restriction in place on their storage) as well as the according indexes
- some validations on existing FHIR servers might also need an adaption

2. basically force every system, which is providing FHIR APIs for these use cases, to create some kind of extra persistence or some other kind of workaround to map logical IDs of the resources to identifying attributes of the data, which could get interesting (though not impossible) for architectures
- where the components are located in a DMZ, so customers don't want to have an additional DB because of the licence costs and access to the DB behind the DMZ is prohibited
- where the components are located in a clustered environment or behind a load balancer, so there has to be a way to synchronise the resource.id mappings between the different instances

Both solutions would solve the above mentioned problematic use cases, but at least in my opinion the latter creates much more extra effort for implementers than the first one.

However, currently it seems during this discussion that the second option is the recommended one, correct?

John Moehrke (Aug 03 2018 at 13:58):

so, we have a limit today.. you are saying there are good use-cases where that limit presents problems... There is also good reason to have some kind of a limit. Moving to infinite length would not work in other ways. Doubling would just move the problem as eventually you will have this problem with two 64 character ids needing to be combined into a 128+... so I don't see an obvious step beyond the limit we have today... What is your solution?

Reinhard Egelkraut (Sep 17 2018 at 13:12):

Hi @John Moehrke ,

sorry for the late response but we had holiday season in Austria and it therefore took a while to gather the input after people came back.

I understand that there is no easy solution for these scenarios that will solve 100% of the problems, hence we looked for guidance in this chat.
Also, I've heard that there is another discussion ongoing, about adding other information to the URL for versioning purposes which could potentially lead to even shorten the allowed length of 64 characters for a logical id, is that correct?
So there are opposing requirements, which doesn't make things easier.

But nevertheless here are two suggestions which could solve our problems with the length restriction and we would be interested to get your opinions on it:

1. use string as data type for the logical id instead of id
- id is according to the FHIR data type model a sub set of string, so for FHIR servers with existing storage there wouldn't be much to change
- string itself has also a length limit (1024 characters in R4) so infinity is not an issue here
- FHIR servers could keep their own length restriction of 64 characters, since they are responsible of creating and maintaining the logical ids
- if just a FHIR API is in place, longer logical ids are possible now if it would be necessary

2. would it be possible to use/allow for the logical id a similar mechanism as for token?
- the logical id itself would be similar to [code] (which is also a sub set of string), it could even keep the length restriction of 64 characters
- but it would be possible to add information in a defined way like [code]|[system] where e.g. system could be an OID to an id
- exact searches are possible

Are these valid suggestions or are there other technical aspects which would speak against them, that we missed?

Lloyd McKenzie (Sep 17 2018 at 14:53):

The format of the id must meet the following rules: it must be unique for a particular resource type on a particular server; and it must be a valid 'key' for all of the database technologies we know of. The current constraints on length and allowed characters are driven by the second. I'm not sure what sort of solution we can come up with that doesn't involve hashing to get within the 64 characters and maintaining some sort of conversion table between the HL7 id and the logical id

Grahame Grieve (Sep 19 2018 at 01:06):

@Reinhard Egelkraut there is a discussion about limiting the effective length of the id in one very specific context, but it is not relevant to your use case. String has a length limit of 1MB, not 1K. There's no reason to have id if we don't use it for Resource.id.

It's true that servers can control their ids - in some cases. But it's also true that truncating ids is not possible. Fixing the length limit of FHIR ids guarantees that truncation won't be an issue in the eco-system. I do not think that the community would find it of value to change this. (and, in fact, process wise, it would be possible to change it if the current infrastructure ballot fails)

as for your last suggestion: you're actually describing how the identifier element works. I can't imagine how that would work on Resource.id.

Grahame Grieve (Sep 19 2018 at 01:13):

If I was implementing this, I'd use a lookup table, but the only thing that would have in it would be the persistent short identifiers for assigning authorities. e.g.

id 1.{uuid}

where '1' is a key in the table of assigning authorities. Maintaining this short list should not be a challenge - at least, I don't see why it is. And I don't see, from anything in this discussion (I just read it all again), why that wouldn't work. I can see that's less convenient than simply having astonishingly long ids, but not why it's something that's impossible to maintain (even in a cluster / facade farm)
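
A minimal sketch of this scheme, assuming the table of assigning authorities is a small, rarely changing configuration shared by every instance. The keys and OIDs are made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical table of assigning authorities: short persistent key -> OID.
# In a cluster this would be identical static configuration on every node.
AUTHORITY_BY_KEY: Dict[str, str] = {
    "1": "1.2.40.0.34.99.4613.122.3.17.1001",
    "2": "1.2.40.0.34.99.4613.122.3.17.1002",
}
KEY_BY_AUTHORITY: Dict[str, str] = {oid: key for key, oid in AUTHORITY_BY_KEY.items()}


def to_resource_id(assigner_oid: str, uuid_value: str) -> str:
    """Encode (OID, UUID) as '<key>.<uuid>', e.g. '1.0f8fad5b-...'."""
    return f"{KEY_BY_AUTHORITY[assigner_oid]}.{uuid_value}"


def from_resource_id(resource_id: str) -> Tuple[str, str]:
    """Decode '<key>.<uuid>' back into (OID, UUID)."""
    # Neither the key nor a UUID contains '.', so the first '.' is the separator.
    key, _, uuid_value = resource_id.partition(".")
    return AUTHORITY_BY_KEY[key], uuid_value


rid = to_resource_id("1.2.40.0.34.99.4613.122.3.17.1001",
                     "0f8fad5b-d9cb-469f-a165-70867728950e")
print(rid, from_resource_id(rid))
```

Unlike the hashing variants, this is reversible without storing anything per resource; only the list of authorities has to be kept in sync.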

Kostas Karkaletsis (Sep 20 2018 at 13:31):

I have to import some patient resources into a test environment and the patient id is not numeric. It is inserted into the database and I can retrieve it by doing a search like this: "hapi-fhir-jpaserver4/search?serverId=home&pretty=true&resource=Patient&param.0.qualifier=&param.0.0=&param.0.1=PATIENTID&param.0.name=_id&param.0.type=token&sort_by=&sort_direction=&resource-search-limit="

When I create another resource and do a reference with this patient then searching this resource doesn't return anything when searching by Patient/PATIENTID in subject of the new resource.

Any idea?

Lloyd McKenzie (Sep 20 2018 at 14:55):

@James Agnew ?

James Agnew (Sep 20 2018 at 15:05):

I'm sorry, I don't really understand the flow you're describing. Do you mean you create a resource, but it's not showing up in a search for it shortly afterwards?

If that's it, you're probably being affected by HAPI FHIR's query cache. You can disable the query cache for an individual request by using the Cache-Control header as shown here: https://smilecdr.com/docs/current/fhir_repository/performance_and_caching.html
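
A sketch of a search request that carries the header. The no-cache directive is an assumption about what the linked page describes, so check it against that documentation. The base URL and search parameters are hypothetical, and the request is only assembled here, not sent.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def uncached_search(base_url: str, resource_type: str,
                    params: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers for a search that asks to bypass the query cache."""
    url = f"{base_url}/{resource_type}?{urlencode(params)}"
    headers = {
        "Accept": "application/fhir+json",
        # Assumed directive; see the linked performance and caching page.
        "Cache-Control": "no-cache",
    }
    return url, headers


url, headers = uncached_search(
    "https://hapi.example.org/fhir",
    "Observation",
    {"subject": "Patient/PATIENTID"},
)
print(url)
print(headers)
```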

You can even disable it entirely (or shorten its duration) using settings on the DaoConfig

Last updated: Apr 12 2022 at 19:14 UTC
