Stream: bulk data
Topic: Provenance
Michelle Vondercrone (Apr 20 2020 at 19:13):
In the patient- or group-level bulk data request, I'm looking for clarity on how to interpret the _type parameter when it includes the Provenance resource.
Take an example where, on the patient- or group-level $export, _type=Patient,AllergyIntolerance,Provenance. What is intended? Is Provenance to be returned for each resource type included in the list, in this case Patient and AllergyIntolerance?
Take another example where, on the patient- or group-level $export, _type=Provenance. Is the intention that this would be an invalid request?
@Dan Gottlieb @Josh Mandel
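For concreteness, the two kick-off requests in the question look roughly like this. This is a minimal Python sketch, assuming a hypothetical server base URL (`https://fhir.example.org/r4`) and a hypothetical helper `export_kickoff_url`; a real kick-off request also sends the `Accept: application/fhir+json` and `Prefer: respond-async` headers, which are not shown.

```python
from typing import List, Optional
from urllib.parse import urlencode


def export_kickoff_url(base: str, level: str, resource_types: List[str],
                       group_id: Optional[str] = None) -> str:
    """Build a patient- or group-level $export kick-off URL (hypothetical helper)."""
    if level == "patient":
        path = f"{base}/Patient/$export"
    elif level == "group":
        if not group_id:
            raise ValueError("group-level export needs a group_id")
        path = f"{base}/Group/{group_id}/$export"
    else:
        raise ValueError(f"unsupported export level: {level}")
    # _type is a comma-separated list of resource types; keep the commas literal.
    query = urlencode({"_type": ",".join(resource_types)}, safe=",")
    return f"{path}?{query}"


# First example: Provenance listed alongside other types.
print(export_kickoff_url("https://fhir.example.org/r4", "patient",
                         ["Patient", "AllergyIntolerance", "Provenance"]))
# Second example: Provenance on its own at the group level.
print(export_kickoff_url("https://fhir.example.org/r4", "group",
                         ["Provenance"], group_id="g1"))
```

The URL shape itself is uncontroversial; the open question in this thread is what the server should put in the Provenance output file for each of these requests.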
Michelle Vondercrone (Apr 23 2020 at 12:44):
Insights anyone?
Michele Mottini (Apr 23 2020 at 12:51):
Our server exports Provenance for all resources. If you specify only Provenance, I think it exports just the Provenance resources for the patients, but I'm not sure.
John Moehrke (Apr 23 2020 at 13:04):
Provenance can be attached to anything (Provenance.target); it is a reverse-include relationship. I suspect in bulk data you want a simple way to indicate that you do NOT want any Provenance, that you want comprehensive Provenance on everything, that you want only the last Provenance on everything, or that you want Provenance only on resources of a specific type (e.g. Observation)... right? Is this the intention of the current bulk data Provenance function? Is this beyond what was intended? Is this a blind spot that should be clarified?
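Because the link runs from a Provenance to its targets, finding the Provenance for a given resource requires a reverse lookup. A minimal sketch, assuming Provenance resources are plain JSON dicts with relative `Type/id` references; `index_provenance_by_target` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict
from typing import Dict, List


def index_provenance_by_target(provenances: List[dict]) -> Dict[str, List[dict]]:
    """Map each target reference (e.g. "AllergyIntolerance/a1") to the
    Provenance resources that point at it through Provenance.target."""
    index: Dict[str, List[dict]] = defaultdict(list)
    for prov in provenances:
        for target in prov.get("target", []):
            ref = target.get("reference")
            if not ref:
                continue
            # Fold versioned references ("Type/id/_history/2") onto "Type/id"
            # so every version of a resource shares one key.
            index[ref.split("/_history/")[0]].append(prov)
    return dict(index)
```

Each of the options listed above (none, all, last only, only for certain types) is then a different filter over this reverse index.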
Dan Gottlieb (Apr 23 2020 at 14:25):
@Michelle Vondercrone The IG describes the _type parameter as restricting the resources returned, so if a server returns a Provenance resource for each included resource as part of the full patient- or group-level bulk export response, I'd expect setting _type=Provenance to limit the response to just those.
Jenni Syed (Apr 23 2020 at 15:13):
It does, but I agree that it's unclear whether that means "all Provenance records for all the data for this patient", "all Provenance records for data we're also returning", or "only the last Provenance record for each item we're returning".
Jenni Syed (Apr 23 2020 at 15:14):
The first seems not useful in the context of bulk data and provenance, but that's exactly what _type means for all other resources you would list, correct?
Dan Gottlieb (Apr 23 2020 at 16:05):
Yup, agreed. If a client requests _type=AllergyIntolerance,Provenance, I'd expect the server to return Provenance resources associated with resources in the patient compartment including, but not limited to, AllergyIntolerance, which is probably more data than the client generally wants.
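The difference between the literal reading of `_type` and a reading scoped to the exported data can be sketched as follows. This is hypothetical Python, assuming `provenances` already holds every Provenance the server associates with the patient compartment and that references are relative `Type/id` strings; the `scoped` flag is an illustration, not a parameter defined by the IG.

```python
from typing import List, Set


def ref_of(resource: dict) -> str:
    return f"{resource['resourceType']}/{resource['id']}"


def apply_type_filter(compartment: List[dict], provenances: List[dict],
                      types: Set[str], scoped: bool) -> List[dict]:
    """Return export output for a _type list under one of two readings.

    scoped=False: literal reading, every compartment-related Provenance.
    scoped=True: only Provenance whose target is also in the output.
    """
    output = [r for r in compartment if r["resourceType"] in types]
    if "Provenance" not in types:
        return output
    if not scoped:
        return output + provenances
    exported = {ref_of(r) for r in output}
    for prov in provenances:
        targets = {t.get("reference", "").split("/_history/")[0]
                   for t in prov.get("target", [])}
        if targets & exported:
            output.append(prov)
    return output
```

Under the scoped reading, `_type=Provenance` on its own returns nothing, because no other resources are exported for the Provenance to target. That is one way to answer Michelle's second question, but only if a server adopts the scoped reading.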
Dan Gottlieb (Apr 23 2020 at 16:05):
I don't think there's anything in the IG that precludes a server from defining a specific behavior around this, though, e.g. I only return the most recent Provenance resource for a given resource and/or I only return Provenance associated with resources in the output data set.
Dan Gottlieb (Apr 23 2020 at 16:07):
It seems like the open question is whether there's a practical need for negotiation around this at run time, where a server supports more than one of these approaches (e.g. I return the most recent Provenance resource or I return all Provenance resources) and the client needs the ability to choose. Of course, servers are free to define a custom request parameter, but if many different servers are going to support runtime options around which Provenance resources are returned, we'll want to define something standard.
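A server-side version of that runtime choice could look like the sketch below. Everything here is hypothetical: the `provenanceMode` query parameter and its `none`/`latest`/`all` values are invented for illustration and are not defined by the bulk data IG or any server.

```python
from enum import Enum
from typing import Set
from urllib.parse import parse_qs, urlparse


class ProvenancePolicy(Enum):
    NONE = "none"      # omit Provenance entirely
    LATEST = "latest"  # most recent Provenance per exported resource
    ALL = "all"        # every Provenance the server holds


def provenance_policy(url: str, supported: Set[ProvenancePolicy],
                      default: ProvenancePolicy) -> ProvenancePolicy:
    """Read the hypothetical provenanceMode parameter from a kick-off URL."""
    values = parse_qs(urlparse(url).query).get("provenanceMode")
    if not values:
        return default
    try:
        policy = ProvenancePolicy(values[0])
    except ValueError:
        raise ValueError(f"unknown provenanceMode: {values[0]}") from None
    if policy not in supported:
        raise ValueError(f"provenanceMode={policy.value} is not supported")
    return policy
```

Rejecting an unsupported mode, rather than silently falling back, is a design choice here: it lets the client learn that the server cannot honor the request.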
Jenni Syed (Apr 23 2020 at 16:08):
Since this is part of USCDI/bulk data regulation from ONC, I think we may want to land on a standard approach that meets the needs of the consumer
Dan Gottlieb (Apr 23 2020 at 16:13):
That's fair - do you have a sense of what kind of Provenance data API users are asking for (most recent, all history, only for specific resources, none, etc)?
Jenni Syed (Apr 23 2020 at 16:15):
I think for the single patient APIs it's just the "last jump" (as a floor requirement)
Jenni Syed (Apr 23 2020 at 16:16):
so most recent
Dan Gottlieb (Apr 23 2020 at 16:25):
So in the absence of specific use cases requiring something else, a reasonable floor might be that servers providing the USCDI dataset return the most recent Provenance resource for each of the other resources being returned in the bulk response dataset (subject to the _since filter, and unless _type is populated and doesn't include Provenance)?
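The proposed floor (the most recent Provenance for each returned resource, honoring `_type`) can be sketched as below. Assumptions: "most recent" is judged by `Provenance.recorded`, `since` is applied as a cutoff on that same element (one possible reading of "subject to the _since filter"), references are relative `Type/id` strings, and `latest_provenance` is a hypothetical helper.

```python
from datetime import datetime
from typing import Dict, List, Optional, Set


def parse_instant(value: str) -> datetime:
    # FHIR instants carry "Z" or an offset; older fromisoformat rejects "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_provenance(resources: List[dict], provenances: List[dict],
                      types: Optional[Set[str]] = None,
                      since: Optional[str] = None) -> List[dict]:
    """Most recent Provenance per exported resource (hypothetical floor)."""
    if types is not None and "Provenance" not in types:
        return []  # _type is populated and excludes Provenance
    wanted = {f"{r['resourceType']}/{r['id']}" for r in resources
              if r["resourceType"] != "Provenance"}
    cutoff = parse_instant(since) if since else None
    best: Dict[str, dict] = {}
    for prov in provenances:
        recorded = parse_instant(prov["recorded"])
        if cutoff is not None and recorded < cutoff:
            continue
        for target in prov.get("target", []):
            ref = target.get("reference", "").split("/_history/")[0]
            if ref not in wanted:
                continue
            current = best.get(ref)
            if current is None or recorded > parse_instant(current["recorded"]):
                best[ref] = prov
    # One Provenance may be the latest for several targets; emit it once.
    unique: Dict[str, dict] = {}
    for prov in best.values():
        unique[prov["id"]] = prov
    return list(unique.values())
```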
Jenni Syed (Apr 23 2020 at 16:29):
I'm trying to pull up the USCore reqs...
http://hl7.org/fhir/us/core/StructureDefinition-us-core-provenance.html
Jenni Syed (Apr 23 2020 at 16:29):
So, US Core requires the _revinclude per resource to be able to pull related Provenance resources
Jenni Syed (Apr 23 2020 at 16:30):
which is still experimental for bulk :)
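For comparison, the per-resource US Core pattern is an ordinary search with `_revinclude=Provenance:target`. In the returned searchset Bundle, the included Provenance resources are marked with `search.mode` set to `include`. A minimal sketch with a hypothetical base URL and patient id; `revinclude_provenance_url` and `split_search_bundle` are illustrative names, not library functions.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def revinclude_provenance_url(base: str, resource_type: str, patient_id: str) -> str:
    """Search for one resource type plus the Provenance that targets it."""
    query = urlencode({"patient": patient_id, "_revinclude": "Provenance:target"},
                      safe=":")
    return f"{base}/{resource_type}?{query}"


def split_search_bundle(bundle: dict) -> Tuple[List[dict], List[dict]]:
    """Separate matched resources from _revinclude'd ones in a searchset Bundle."""
    matches: List[dict] = []
    included: List[dict] = []
    for entry in bundle.get("entry", []):
        mode = entry.get("search", {}).get("mode")
        # Entries without a search mode are treated as matches in this sketch.
        (included if mode == "include" else matches).append(entry["resource"])
    return matches, included
```

A client following this pattern issues one such search per resource type, which is exactly the per-request shape that bulk export is meant to avoid.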
Jenni Syed (Apr 23 2020 at 16:31):
I'd be interested in any existing use cases that would want something other than the last Provenance, or Provenance related to other items in the bulk output...
Dan Gottlieb (Apr 23 2020 at 16:37):
Yup, me too! If we don't need a param to negotiate different approaches per request, this feels a bit like an expansion of the predefined bulk "data set" definitions we started on in v1 of the IG: https://hl7.org/fhir/uv/bulkdata/index.html#us-core-data-for-interoperability
John Moehrke (Apr 24 2020 at 14:13):
That is why I pointed out that Provenance is typically retrieved using _revinclude. So just putting Provenance in the bulk data request can't mean _revinclude as a special case: what if you did want a bulk data transfer of all Provenance associated with some subject? We don't know of that use case today, but it could happen (e.g. a legal medical records audit for safety and records-retention compliance).
John Moehrke (Apr 24 2020 at 14:16):
A _revinclude gives you comprehensive Provenance (as comprehensive as the system has, which might be shallow or might be very deep). So I don't think this is what you want either. The emerging "Basic Provenance IG" definition might be a good subset to request: this is the provenance of the Author and, if the data was authored elsewhere, the organization from which it came. The IG allows other Provenance, but sets this as the minimum.
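Requesting only a minimal subset might amount to trimming each Provenance down to the agents that record who authored the data and which organization sent it. This is a hedged sketch: it assumes those roles are identified by the codes `author` and `transmitter` in `Provenance.agent.type` (code systems ignored), and it does not claim to reflect the Basic Provenance IG's actual rules. `minimal_provenance` is a hypothetical helper.

```python
from typing import Optional, Set

# Assumed minimal agent roles; check the relevant IG for the exact codes and systems.
MINIMAL_AGENT_CODES = {"author", "transmitter"}


def agent_codes(agent: dict) -> Set[str]:
    """Collect the codes in Provenance.agent.type (a CodeableConcept)."""
    return {c.get("code") for c in agent.get("type", {}).get("coding", [])}


def minimal_provenance(prov: dict) -> Optional[dict]:
    """Copy prov keeping only author/transmitter agents.

    Returns None when no such agent exists, since Provenance.agent is
    required (1..*) and an empty list would make the resource invalid.
    """
    kept = [a for a in prov.get("agent", []) if agent_codes(a) & MINIMAL_AGENT_CODES]
    if not kept:
        return None
    trimmed = dict(prov)  # shallow copy; the input resource is not modified
    trimmed["agent"] = kept
    return trimmed
```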
Josh Mandel (Apr 24 2020 at 14:46):
I think as a goal in this year's Argonaut project we should include adding guidance for these kinds of questions. I am probably stating the obvious :-)
Dan Gottlieb (Apr 27 2020 at 14:33):
I created https://github.com/HL7/bulk-data/issues/67 to capture this.
Last updated: Apr 12 2022 at 19:14 UTC