FHIR Chat · Data Quality

Stream: implementers

Topic: Data Quality

OS (Oct 19 2020 at 19:46):

Not sure if this is the right place - but no other channel/topic seems to fit.

I'm curious if anyone has thoughts on how to approach data quality in a FHIR context. We are engaged in architecting a solution where we have a FHIR server, alongside a Data Lake and MDM solution. In this context, FHIR is used as the ingest layer as well as consumption layer so we have varying levels of data 'quality'. For example, some entries may not be fully complete, some may have corrupt data (incorrect references), and some resources may fail some business rules. We are coming up with a grading scheme on how to bucket the data into varying levels of quality, but I'm wondering if anyone's thought about how to flag that back to the user. My interim solution is to create an identifier which provides a quality 'level' or 'score' for each resource. Would love to get others' thoughts on this.

Thanks!

Jose Costa Teixeira (Oct 19 2020 at 19:56):

data quality is a multidimensional thing. do you consider that?

Jose Costa Teixeira (Oct 19 2020 at 20:08):

I'd go at it like this:

define and keep an inventory of your quality metrics (because they will change). There's a resource Measure for that.
If you want to keep track of the quality of each resource, you can make an extension for that resource. I would not use an identifier, I would use a reference to a MeasureReport, basically saying: "for this Patient resource, quality is [0.9 , 0.5 , 0.9, 1, 0.4, 0.25] = 0.75 overall"
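As a rough illustration of the extension idea above, here is a minimal Python sketch that adds an extension pointing at a MeasureReport to a resource held as a plain dict. The extension URL, the helper name `attach_quality_report`, and the MeasureReport id are all hypothetical; none of them are defined by FHIR or by anyone in this thread.

```python
import json

# Hypothetical extension URL, made up for this sketch.
QUALITY_EXT_URL = "http://example.org/fhir/StructureDefinition/data-quality-report"


def attach_quality_report(resource: dict, measure_report_id: str) -> dict:
    """Return a copy of the resource with an extension referencing a MeasureReport."""
    updated = dict(resource)
    # Drop any earlier quality extension so only the latest reference is kept.
    extensions = [
        ext for ext in resource.get("extension", [])
        if ext.get("url") != QUALITY_EXT_URL
    ]
    extensions.append({
        "url": QUALITY_EXT_URL,
        "valueReference": {"reference": f"MeasureReport/{measure_report_id}"},
    })
    updated["extension"] = extensions
    return updated


patient = {"resourceType": "Patient", "id": "example"}
scored = attach_quality_report(patient, "patient-example-quality")
print(json.dumps(scored, indent=2))
```

The per-dimension scores would then live in the referenced MeasureReport rather than on the Patient itself, so the metric inventory can change without touching every scored resource.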

OS (Oct 19 2020 at 20:35):

Jose Costa Teixeira said:

I'd go at it like this:

define and keep an inventory of your quality metrics (because they will change). There's a resource Measure for that.

If you want to keep track of the quality of each resource, you can make an extension for that resource. I would not use an identifier, I would use a reference to a MeasureReport, basically saying: "for this Patient resource, quality is [0.9 , 0.5 , 0.9, 1, 0.4, 0.25] = 0.75 overall"

That's a fantastic approach! And it allows you to model multi-dimensional measures as well!
The reason I was going for a single dimension was so that for a consumer, we could indicate very simply the state/health of a resource (kind of like red/yellow/green), but I think your approach enables that in a much more flexible way!

Lloyd McKenzie (Oct 19 2020 at 20:59):

If you're looking to flag low-quality resources for 'action', you could look at Resource.meta.tag.

Lloyd McKenzie (Oct 19 2020 at 21:00):

One other benefit of using tags is that you can generally change them without impacting signatures (if that's a consideration)
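A minimal Python sketch of the tag approach, assuming a resource held as a plain dict. The tag system URL, the code `low-quality`, and the helper name `set_quality_tag` are hypothetical placeholders; a real deployment would define its own code system.

```python
# Hypothetical code system for quality tags, made up for this sketch.
QUALITY_TAG_SYSTEM = "http://example.org/fhir/CodeSystem/data-quality"


def set_quality_tag(resource: dict, code: str) -> dict:
    """Return a copy of the resource carrying exactly one quality tag in meta.tag."""
    updated = dict(resource)
    meta = dict(resource.get("meta", {}))
    # Keep unrelated tags, replace any previous quality tag.
    tags = [t for t in meta.get("tag", []) if t.get("system") != QUALITY_TAG_SYSTEM]
    tags.append({"system": QUALITY_TAG_SYSTEM, "code": code})
    meta["tag"] = tags
    updated["meta"] = meta
    return updated


observation = {
    "resourceType": "Observation",
    "id": "obs-1",
    "meta": {"tag": [{"system": "http://example.org/other", "code": "batch-42"}]},
}
flagged = set_quality_tag(observation, "low-quality")
print(flagged["meta"]["tag"])
```

Consumers could then find flagged resources with the standard `_tag` search parameter, for example `GET [base]/Observation?_tag=http://example.org/fhir/CodeSystem/data-quality|low-quality`.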

Jose Costa Teixeira (Oct 19 2020 at 21:12):

Extensions can be in Resource.meta, right?

Lloyd McKenzie (Oct 19 2020 at 23:36):

Yes, but those aren't guaranteed to be outside signature I don't think?

John Moehrke (Oct 20 2020 at 01:43):

There is a valueSet including many codes related to data quality (integrity) for use in .meta.security. http://hl7.org/fhir/v3/SecurityIntegrityObservationValue/vs.html

OS (Oct 22 2020 at 17:02):

John Moehrke said:

There is a valueSet including many codes related to data quality (integrity) for use in .meta.security. http://hl7.org/fhir/v3/SecurityIntegrityObservationValue/vs.html

Hmm the flags listed on that page would be super useful, but they are all in a security context. I'm thinking about data quality independent of security - am I thinking about that incorrectly?

Lloyd McKenzie (Oct 22 2020 at 18:11):

You wouldn't use meta.security, you'd use meta.tag or meta.extension

John Moehrke (Oct 22 2020 at 23:34):

just because they are on meta.security does not make them special and only useful for security.

Jose Costa Teixeira (Oct 23 2020 at 06:59):

Lloyd McKenzie said:

Yes, but those aren't guaranteed to be outside signature I don't think?

I'd prefer to have guidance on that. If we add extensions to resource.meta, do they have an impact on signature?

Lloyd McKenzie (Oct 23 2020 at 14:25):

The 'static' canonicalization excludes the whole 'meta' element, so extensions on meta would be excluded
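To make the consequence concrete, here is a simplified Python sketch. It is not the actual FHIR canonicalization algorithm, which has further rules; it only mimics the idea that the 'static' form leaves out `meta` (and the narrative in `text`) before the content is hashed, so a change to tags or extensions under `meta` leaves the digest unchanged. The function name `static_digest` and the sample data are assumptions for illustration.

```python
import hashlib
import json


def static_digest(resource: dict) -> str:
    """Hash a resource after dropping 'meta' and 'text' (simplified stand-in for 'static')."""
    stripped = {k: v for k, v in resource.items() if k not in ("meta", "text")}
    # Sorted keys and compact separators give a stable byte representation.
    payload = json.dumps(stripped, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


original = {"resourceType": "Patient", "id": "example", "active": True}
tagged = {
    **original,
    "meta": {"tag": [{"system": "http://example.org/quality", "code": "low-quality"}]},
}
changed = {**original, "active": False}

print(static_digest(original) == static_digest(tagged))   # meta is ignored
print(static_digest(original) == static_digest(changed))  # content change is detected
```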

Anusha (Mar 11 2021 at 14:56):

@Lloyd McKenzie Our upstream ETL pipelines allow us to "blacklist" bad ingestions/batches (e.g., in a scenario where we got a bad batch from a claims or lab vendor, or ran into a transformation error). Trying to figure out what the conventions/guidelines are for how this should translate to the FHIR server. If we've written a batch of data to the FHIR server and then need to revert it, is the standard practice to delete the impacted records/versions? Vs. using something like entered-in-error?

Lloyd McKenzie (Mar 11 2021 at 15:45):

The choice of deleting vs. flagging as "entered in error" is generally driven by whether there's a likelihood of the data having been disclosed and used by someone. If it has, entered-in-error is going to be preferred.
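That rule of thumb could be sketched in Python as follows, assuming the bad batch consists of Observation resources (whose `status` element allows `entered-in-error`) and that the pipeline can supply a `possibly_disclosed` flag. Both the flag and the helper name `plan_revert` are hypothetical, and the output is only a list of intended REST interactions, not calls against a real server.

```python
def plan_revert(resources: list, possibly_disclosed: bool) -> list:
    """Build (method, url, body) tuples for reverting a bad batch."""
    plan = []
    for resource in resources:
        url = f"{resource['resourceType']}/{resource['id']}"
        if possibly_disclosed:
            # Someone may have seen or used the data: keep it, but mark it as an error.
            corrected = {**resource, "status": "entered-in-error"}
            plan.append(("PUT", url, corrected))
        else:
            # Nobody is likely to have consumed the data: removing it is acceptable.
            plan.append(("DELETE", url, None))
    return plan


batch = [
    {"resourceType": "Observation", "id": "obs-1", "status": "final"},
    {"resourceType": "Observation", "id": "obs-2", "status": "final"},
]
for step in plan_revert(batch, possibly_disclosed=True):
    print(step[0], step[1])
```

Resource types without a status value of `entered-in-error` would need a different way to signal the error, which this sketch does not cover.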

Last updated: Apr 12 2022 at 19:14 UTC
