FHIR Chat · Identify duplicate resources in a Bundle

Stream: implementers

Topic: Identify duplicate resources in a Bundle

Vivek TK (Jun 16 2021 at 06:27):

Hi,
I have a use case where I need to detect duplicate resources in a Bundle. For example, the same Practitioner resource may appear twice (or n times) in the bundle because of the type of data source used (e.g., conversion from CDA to FHIR). Is there a standard practice for identifying duplicate resources? Can we rely on the identifiers of each resource, and is that the right approach? Or is there another way to determine duplicate resources within a Bundle?

Gino Canessa (Jun 16 2021 at 19:26):

If they have the same id, it is not a valid bundle (unless it is a history bundle with multiple versions of the same resource, but from your description that is not the case). Details are on the Bundle resource page (the formal constraint is bdl-7). So, the best solution is to fix the generation so that it doesn't include them. If that is not an option, you should be able to use the entry.fullUrl element.

If the entries are the same data but have different identifiers, then the process for de-duplication will vary depending on what that data is (e.g., for patients, are you matching on exact names plus date of birth, partial names, etc.).
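As a rough illustration of the first check described above, the following Python sketch scans a Bundle (already parsed from JSON into a dict) for entries that share the same `entry.fullUrl`. The function name `find_repeated_entries`, the fallback to a `ResourceType/id` key when `fullUrl` is missing, and the sample data are all assumptions made for this example, not part of any FHIR library.

```python
from collections import defaultdict


def find_repeated_entries(bundle: dict) -> dict:
    """Map each entry key that occurs more than once to the entry indexes using it."""
    seen = defaultdict(list)
    for index, entry in enumerate(bundle.get("entry", [])):
        resource = entry.get("resource", {})
        # Prefer fullUrl. As an assumed fallback, use "ResourceType/id" so that
        # the same id on different resource types is never treated as a match.
        key = entry.get("fullUrl")
        if not key and resource.get("id"):
            key = f'{resource.get("resourceType")}/{resource["id"]}'
        if key:
            seen[key].append(index)
    return {key: indexes for key, indexes in seen.items() if len(indexes) > 1}


# Hypothetical sample data: the Practitioner entry appears twice.
example_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"fullUrl": "urn:uuid:aaa", "resource": {"resourceType": "Practitioner", "id": "1"}},
        {"fullUrl": "urn:uuid:bbb", "resource": {"resourceType": "Patient", "id": "1"}},
        {"fullUrl": "urn:uuid:aaa", "resource": {"resourceType": "Practitioner", "id": "1"}},
        {"resource": {"resourceType": "Observation", "id": "1"}},
    ],
}

repeated = find_repeated_entries(example_bundle)
print(repeated)
```

This only catches entries that literally share a key; entries with the same content but different URLs or ids need content-based matching instead.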

Vivek TK (Jun 18 2021 at 00:50):

Thanks for the suggestion @Gino Canessa. I will check the fullUrl option. I also feel the de-duplication process will become very complex, as there are multiple resources involved and they also use custom profiles.

Gino Canessa (Jun 18 2021 at 15:32):

No problem. But yes, if the resources are coming from different sources they are quite unlikely to have matching URLs or IDs. You will need to determine what constitutes a duplicate for each resource and remove them. Note that this also means checking everything else in the bundle for links to that resource (e.g., if you remove a Patient record, any Encounter that points to that patient needs to be changed to point at the new one).

De-duplication like that can be challenging - good luck!
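The two steps described above (decide what counts as a duplicate, then fix every link to the removed copy) could be sketched as follows. This is only one possible rule, assuming that a shared business identifier (`system` plus `value`) on the same resource type marks a duplicate, that every entry has a `fullUrl`, and that references inside the bundle use those `fullUrl` values literally. The function names and the `http://example.org/practitioner-ids` system are hypothetical.

```python
import copy


def _rewrite_references(node, replaced: dict) -> None:
    """Walk the JSON tree and repoint any 'reference' that targets a dropped entry."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str) and ref in replaced:
            node["reference"] = replaced[ref]
        for value in node.values():
            _rewrite_references(value, replaced)
    elif isinstance(node, list):
        for item in node:
            _rewrite_references(item, replaced)


def dedupe_by_identifier(bundle: dict) -> dict:
    """Keep the first entry per (type, system, value) and repoint references to it."""
    result = copy.deepcopy(bundle)  # leave the caller's bundle untouched
    first_seen = {}                 # match key -> fullUrl of the kept entry
    replaced = {}                   # dropped fullUrl -> kept fullUrl
    kept = []
    for entry in result.get("entry", []):
        resource = entry.get("resource", {})
        identifiers = resource.get("identifier", [])
        if isinstance(identifiers, dict):  # a few resources carry a single identifier
            identifiers = [identifiers]
        keys = [
            (resource.get("resourceType"), ident.get("system"), ident.get("value"))
            for ident in identifiers
            if ident.get("system") and ident.get("value")
        ]
        survivor = next((first_seen[k] for k in keys if k in first_seen), None)
        if survivor is not None:
            replaced[entry["fullUrl"]] = survivor
            continue
        for key in keys:
            first_seen[key] = entry["fullUrl"]
        kept.append(entry)
    result["entry"] = kept
    _rewrite_references(result, replaced)
    return result


# Hypothetical sample: the same practitioner twice, with an Encounter pointing at the second copy.
practitioner_id = [{"system": "http://example.org/practitioner-ids", "value": "12345"}]
source_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"fullUrl": "urn:uuid:p1", "resource": {"resourceType": "Practitioner", "identifier": practitioner_id}},
        {"fullUrl": "urn:uuid:p2", "resource": {"resourceType": "Practitioner", "identifier": practitioner_id}},
        {"fullUrl": "urn:uuid:e1", "resource": {
            "resourceType": "Encounter",
            "participant": [{"individual": {"reference": "urn:uuid:p2"}}],
        }},
    ],
}

deduped = dedupe_by_identifier(source_bundle)
```

In this sample the second Practitioner entry is dropped and the Encounter participant is repointed to `urn:uuid:p1`. A real matching rule would likely differ per resource type and per profile.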

John Moehrke (Jun 18 2021 at 15:40):

Just for clarity: the id is only unique within a resource type at a root URL. So the id value "1" might show up for Patient/1 as well as Observation/1 and Practitioner/1. These are not the same resource, even though the id is "1" in each case.

Vivek TK (Jun 19 2021 at 14:16):

Thanks @Gino Canessa @John Moehrke. I would also like to know if you have any thoughts on updating these resources once they are stored in a FHIR server. Suppose that, in the first transformation, the resources converted from CDA documents are stored in a FHIR server. The challenge is then to update these FHIR resources when the CDA document is updated, so that the data in XDS and its transformed copy in the FHIR server stay consistent.

For a few resources such as Patient and Practitioner, we may rely on the identifier (not the logical id), but I am not sure what to do when identifiers are not available for the other resources extracted from CDA sections. Resources such as Observation, Medication, and MedicationStatement are created from CDA sections that have no identifiers available in the CDA document (which is the source of the transformation). As @Gino Canessa mentioned before, this makes us depend on the data within each resource to identify duplicates, and that data can vary from resource to resource.
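One way to depend on the data within each resource, as described above, is to derive a stable fingerprint from a per-type list of fields. The sketch below is only an assumption about how that might look: the `MATCH_FIELDS` table, the choice of fields, and the `fingerprint` function are hypothetical, and the approach only catches exact matches on the selected fields.

```python
import hashlib
import json

# Hypothetical per-type matching rules; a real project would tune these per profile.
MATCH_FIELDS = {
    "Observation": ["code", "subject", "effectiveDateTime", "valueQuantity"],
    "MedicationStatement": ["medicationCodeableConcept", "subject", "effectivePeriod"],
}


def fingerprint(resource: dict):
    """Return a SHA-256 hex digest of the matching fields, or None if no rule exists."""
    resource_type = resource.get("resourceType")
    fields = MATCH_FIELDS.get(resource_type)
    if fields is None:
        return None
    selected = {name: resource.get(name) for name in fields}
    # Sorted keys and fixed separators give the same text for the same content.
    canonical = json.dumps([resource_type, selected], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_observation(logical_id: str, value: float) -> dict:
    """Build a hypothetical body-weight Observation for the example."""
    return {
        "resourceType": "Observation",
        "id": logical_id,
        "code": {"coding": [{"system": "http://loinc.org", "code": "29463-7"}]},
        "subject": {"reference": "Patient/1"},
        "effectiveDateTime": "2021-06-16",
        "valueQuantity": {"value": value, "unit": "kg"},
    }


obs_a = make_observation("a", 72.5)
obs_b = make_observation("b", 72.5)   # same content, different logical id
obs_c = make_observation("c", 80.0)   # different value

same = fingerprint(obs_a) == fingerprint(obs_b)
different = fingerprint(obs_a) != fingerprint(obs_c)
```

Because the logical id is not among the matching fields, `obs_a` and `obs_b` produce the same fingerprint while `obs_c` does not. Any difference in formatting (for example a differently written date) would defeat an exact hash, so this is a starting point rather than a full matching strategy.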

Vassil Peytchev (Jun 19 2021 at 22:13):

What is the purpose of transforming the contents of CDA documents into FHIR resources in the first place? Is it to make that data available to be used by FHIR clients?

Vivek TK (Jun 20 2021 at 04:59):

Right @Vassil Peytchev

Vassil Peytchev (Jun 20 2021 at 17:42):

In that case, I think you need an intermediate step: CDA -> normalized data -> mapping the normalized data to FHIR resources. Everything you are discovering that needs to be done still needs to be done, but if you separate the CDA processing from providing the data as FHIR resources, I think it will be more manageable.

John Moehrke (Jun 21 2021 at 11:53):

Note that what you are describing is what the IHE Mobile Cross-Enterprise Document Data Element Extraction (mXDE) implementation guide recommends - https://wiki.ihe.net/index.php/Mobile_Cross-Enterprise_Document_Data_Element_Extraction.
This recommends use of FHIR Provenance to enable linking back to the CDA document(s), and thus you have Provenance details to enable updates as you mention.
As to the problem, this is simply a problem for systems design to overcome. There is no interoperability specification that is going to help you. It comes down to the tough task of de-duplication, balanced between false negatives and false positives.
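The Provenance linkage mentioned above might look roughly like this in Python. It is a minimal sketch based on the general shape of the FHIR R4 Provenance resource (`target`, `recorded`, `agent`, `entity`); the helper names, the `DocumentReference/cda-123` reference, and the agent display text are assumptions for illustration, and the exact profile should be checked against the mXDE guide.

```python
from datetime import datetime, timezone


def build_provenance(target_refs, source_document_ref, recorded=None) -> dict:
    """Build a Provenance dict tying extracted resources to their source document."""
    recorded = recorded or datetime.now(timezone.utc)
    return {
        "resourceType": "Provenance",
        "target": [{"reference": ref} for ref in target_refs],
        "recorded": recorded.isoformat(),
        # Hypothetical agent: the conversion service that produced the resources.
        "agent": [{"who": {"display": "CDA-to-FHIR converter (hypothetical)"}}],
        "entity": [{"role": "source", "what": {"reference": source_document_ref}}],
    }


def targets_derived_from(provenances, source_document_ref) -> list:
    """List the resource references that were extracted from the given document."""
    found = []
    for provenance in provenances:
        sources = {
            entity.get("what", {}).get("reference")
            for entity in provenance.get("entity", [])
        }
        if source_document_ref in sources:
            found.extend(target["reference"] for target in provenance.get("target", []))
    return found


example_provenance = build_provenance(
    ["Observation/o1", "MedicationStatement/m1"],
    "DocumentReference/cda-123",
    recorded=datetime(2021, 6, 21, tzinfo=timezone.utc),
)
to_update = targets_derived_from([example_provenance], "DocumentReference/cda-123")
```

When the CDA document is replaced, a lookup like `targets_derived_from` gives the set of previously extracted resources that are candidates for update or replacement.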

Vivek TK (Jun 22 2021 at 13:05):

Thanks @John Moehrke . Provenance indeed keeps track of the resources along with their logical ids. I guess that's a good point to start.

John Moehrke (Jun 24 2021 at 14:18):

A good tutorial that @Oliver Egger gave on mapping CDA to FHIR at DevDays is available at https://www.youtube.com/watch?v=d941K8CMWq4

Last updated: Apr 12 2022 at 19:14 UTC
