FHIR Chat · Idea for $import referential integrity · bulk data

Stream: bulk data

Topic: Idea for $import referential integrity


Paul Church (Dec 03 2019 at 22:22):

One of the hardest parts of bringing in a large data set is how to fix up the source system's references in a robust way that doesn't require preprocessing the entire input and doesn't leave dangling references.

Perhaps we can leverage the proposed meta import-source extension, which is an Identifier. What if all literal references in the imported data that refer to resources on the source system are rewritten during $import to be logical references (reference.identifier) with the system and value corresponding to the import-source that the referenced resource will have once imported?

After the first pass of importing and rewriting all of the resources, a second pass can go through and rewrite these logical references back into literal references where possible, and report errors where it isn't possible. Clients can then rely on literal references having integrity, while a logical reference may or may not refer to a resource (which is par for the course for any logical reference).
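
The second pass could look roughly like the sketch below. It assumes the server has built an `index` from each imported resource's import-source identifier to its new local reference; that index, the `to_literal` name, and the `source_system` parameter are hypothetical. Skipping objects that carry `resourceType` is a heuristic to avoid mistaking a resource's own business identifier for a logical reference.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]  # (identifier system, identifier value)


def to_literal(node: Any, index: Dict[Key, str], source_system: str,
               unresolved: List[Key]) -> Any:
    """Return a copy of `node` with logical references rewritten to literal
    ones where `index` knows the target; append the rest to `unresolved`."""
    if isinstance(node, list):
        return [to_literal(item, index, source_system, unresolved)
                for item in node]
    if not isinstance(node, dict):
        return node
    ident = node.get("identifier")
    if ("resourceType" not in node and "reference" not in node
            and isinstance(ident, dict)
            and ident.get("system") == source_system):
        key = (source_system, ident.get("value", ""))
        target = index.get(key)
        if target is None:
            unresolved.append(key)  # stays logical; reported as an error
            return dict(node)
        rewritten = {k: v for k, v in node.items() if k != "identifier"}
        rewritten["reference"] = target
        return rewritten
    return {k: to_literal(v, index, source_system, unresolved)
            for k, v in node.items()}
```

Anything left in `unresolved` after the pass corresponds to the "stuck" logical references discussed later in the thread.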

Josh Mandel (Dec 04 2019 at 01:01):

This is interesting, @Paul Church. If I'm reading it right, the end-result would be ... references in the new server that were sometimes logical and sometimes literal? For the ones (stuck as) logical, identifier-based, won't that break queries on the new server (like, chaining and _include and compartment syntax and so on)?

Paul Church (Dec 04 2019 at 02:15):

Well, if 1) all references in the source system were literal, 2) every reference was valid on the source system, 3) every referenced resource was in the exported data, 4) every referenced resource was successfully imported, and 5) enough time has passed for the second pass to finish, then there are guaranteed to be no logical references remaining. These are reasonable assumptions in many use cases. The hardest one is fixing up #4 when a small percentage of resources in a large import job are rejected.

It's true that stuck logical references will cause all of those gaps in functionality. But the primary alternatives are to ignore ref integrity entirely (as GCP currently does during imports), or discard the transitive closure of every imported resource that would violate ref integrity, or...?

Conformant clients might be presented with logical references at any time unless a profile disallows it, so hopefully they will be robust.

Josh Mandel (Dec 04 2019 at 02:32):

Okay -- that makes sense and I'm not disagreeing :) Just trying to figure out what the implications look like.

Michele Mottini (Dec 04 2019 at 08:25):

> Conformant clients might be presented with logical references

This is very optimistic. Servers do not necessarily support search by identifier, which makes logical references unresolvable.

Paul Church (Dec 04 2019 at 14:48):

I wouldn't expect the client/server to react to logical references by trying to resolve them - it's often impossible. The client just needs to accept that there are references it can't follow.

Paul Church (Dec 04 2019 at 14:50):

As a further enhancement, when doing a subsequent $import from the same source that may update some existing resources and create others, references to resources that already existed from the previous $import can be resolved immediately by the same mechanism the server uses to find which resources should be updated.
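
As a rough illustration of that enhancement, a single reference could be resolved immediately when its target is already known from an earlier `$import`, and otherwise fall back to a logical reference. The `existing` mapping (import-source identifier to local reference) and the `rewrite_reference` name are assumptions, standing in for whatever lookup the server already uses to decide between update and create.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (identifier system, identifier value)


def rewrite_reference(ref: dict, existing: Dict[Key, str],
                      source_system: str) -> dict:
    """Rewrite one Reference object from the source system."""
    source_ref = ref.get("reference")
    if not isinstance(source_ref, str):
        return dict(ref)  # nothing to rewrite
    rest = {k: v for k, v in ref.items() if k != "reference"}
    local = existing.get((source_system, source_ref))
    if local is not None:
        # Target was imported previously: resolve to a literal reference now.
        return {**rest, "reference": local}
    # Target not known yet: defer to the second pass via a logical reference.
    return {**rest, "identifier": {"system": source_system,
                                   "value": source_ref}}
```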

Michele Mottini (Dec 04 2019 at 15:05):

> The client just needs to accept that there are references it can't follow.

That's not an option for clients that use a SQL database with foreign keys linking the different concepts: either you can resolve the reference or you have to ignore the data.

Paul Church (Dec 04 2019 at 15:20):

Those clients have to fall back to ignoring the transitive closure of things that didn't make it through the $import. That's the baseline option in any scenario though - if you replicated data from another server and some references are dangling, the data surely won't satisfy everyone.
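
One way to picture "ignoring the transitive closure" is as a graph walk: start from the references whose targets are absent and follow the reference edges backwards. The sketch below assumes the client has already extracted a `refs` mapping from each resource it holds to the set of resources it references; that input shape and the `unusable` name are illustrative, not anything specified in the thread.

```python
from collections import deque
from typing import Dict, Set


def unusable(refs: Dict[str, Set[str]]) -> Set[str]:
    """Return the held resources that depend, directly or transitively,
    on a resource that is not held."""
    present = set(refs)
    # Reverse edges: target -> resources that reference it.
    referrers: Dict[str, Set[str]] = {}
    for source, targets in refs.items():
        for target in targets:
            referrers.setdefault(target, set()).add(source)
    missing = set(referrers) - present
    dropped: Set[str] = set()
    queue = deque(missing)
    while queue:
        bad = queue.popleft()
        for source in referrers.get(bad, ()):
            if source not in dropped:
                dropped.add(source)
                queue.append(source)
    return dropped
```

For example, if `Encounter/1` references a practitioner that was rejected during import, then `Encounter/1` and every observation pointing at it end up in the returned set, while resources that only reference the patient are unaffected.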


Last updated: Apr 12 2022 at 19:14 UTC