Stream: implementers
Topic: Provenance resource for Middleware
Simone Heckmann (May 10 2016 at 22:47):
While trying to compose a Provenance resource to annotate stuff that is extracted from a V2 message and posted on a server, I just realized, that Provenance doesn't fit this use case very well. I hardly find any useful values to fill in, if the agent is a middleware engine.
Simone Heckmann (May 10 2016 at 22:48):
Would the V2 message I pulled the stuff from be considered an "entity" in this context?
John Moehrke (May 10 2016 at 22:48):
Simone, please explain why you see it as not useful? Rene did find it useful to explain where the v2 object came from, and to point at a copy of the original in Provenance.entity
John Moehrke (May 10 2016 at 22:49):
Yes, Provenance.entity would be the information used in creating the 1..* Provenance.target resources
Simone Heckmann (May 10 2016 at 22:51):
I'm not saying Provenance isn't useful, but I am having trouble fitting things like "name of the middleware system", "id of the original message" into it or picking proper values for agent.role etc.
John Moehrke (May 10 2016 at 22:51):
I do have an example that I have not yet integrated into the build that Rene did...
Simone Heckmann (May 10 2016 at 22:51):
Oh yes please! That would be helpful!
John Moehrke (May 10 2016 at 22:52):
Rene worked on it... this might be his blog on the topic http://www.ringholm.com/docs/04350_mapping_HL7v2_FHIR.htm
John Moehrke (May 10 2016 at 22:53):
of course things change from blog articles, so check with him...
Simone Heckmann (May 10 2016 at 22:58):
Hah. That's fun. The place where Rene captured the fact that the agent is actually software (agent.type) is missing in the resource definition ...
John Moehrke (May 10 2016 at 23:03):
likely now Provenance.agent.role
Simone Heckmann (May 10 2016 at 23:03):
reason.text to capture the fact that this is the result of a message transformation doesn't work either, because reason is a Coding, not a CodeableConcept
John Moehrke (May 10 2016 at 23:04):
I would be glad to put an example into the specification... Just need a few eyes to look one example over before we put it in. I would suggest you work with Rene and suggest one example.
John Moehrke (May 10 2016 at 23:05):
love to get these suggestions as CPs... need real-world input.
Simone Heckmann (May 10 2016 at 23:34):
ok, let's set up the scenario: We have some sort of middleware device that receives some sort of non-FHIR input (e.g. a V2 message, a CDA document, a CSV file...) and extracts FHIR resources from this input and posts them on a server. The resulting transaction is supposed to include a provenance resource that captures
a) some information about the middleware (type, vendor, version...)
b) information about the data source (type/format, identifier (original filename or messageID ) to trace it back)
c) if known, information about the software that created the original input
d) if known, information about the user that created the information in the first place (e.g. from the EVN-Segment)
Simone Heckmann (May 10 2016 at 23:35):
Would a) c) and d) be all agents, or would any of them be a relatedAgent?
Simone Heckmann (May 10 2016 at 23:37):
The roles (http://hl7-fhir.github.io/valueset-provenance-agent-role.html) could be
a) assembler
c) composer
d) author
...does that make sense?
Simone Heckmann (May 11 2016 at 19:45):
I don't know how to fit b) into the entity part of provenance. Though the entity.role "derived" seems like a good fit, it seems that the entity itself is expected to be a FHIR resource.
Simone Heckmann (May 11 2016 at 20:17):
I created http://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemEdit&tracker_item_id=9996 to keep track of this.
Simone Heckmann (May 20 2016 at 14:55):
Oh, I just realized we talked about Rene, but didn't tag him. So: @René Spronk : Care to chime in? :)
Glen Marshall (Jul 19 2016 at 16:19):
[#9996] Summary: Using Provenance resource to annotate content derived from non-FHIR sources
Glen Marshall (Jul 19 2016 at 16:19):
Original comment:
While trying to compose a Provenance resource to annotate stuff that is extracted from a V2 message and posted on a server, I just realized, that Provenance doesn't fit this use case very well. I hardly find any useful values to fill in, if the agent is a middleware engine.
Here's the Scenario:
We have some sort of middleware device that receives some sort of non-FHIR input (e.g. a V2 message, a CDA document, a CSV file...) and extracts FHIR resources from this input and posts them on a server. The resulting transaction is supposed to include a provenance resource that captures
a) some information about the middleware (type, vendor, version...)
b) information about the data source (type/format, identifier (original filename or messageID ) to trace it back)
c) if known, information about the software that created the original input
d) if known, information about the user that created the information in the first place (e.g. from the EVN-Segment)
Would a) c) and d) be all agents, or would any of them be a relatedAgent?
The roles (http://hl7-fhir.github.io/valueset-provenance-agent-role.html) could be
a) assembler
c) composer
d) author
...does that make sense?
I don't know how to fit b) into the entity part of provenance. Though the entity.role "derived" seems like a good fit, it seems that the entity itself is expected to be a FHIR resource and nothing else...
Help, please!
Proposed response:
Accepted with modification: include the following non-normative guidance:
1. The middleware, original input, and user are all agents. They act independently of each other, so are not relatedAgent. Each is associated with a Provenance instance.
2. Middleware is an assembler for the purposes of Provenance. Other agents are assigned roles per the role vocabulary definitions.
3. Metadata, if known, should be associated with each of the agents. Any missing values should be null or default, depending on the defined cardinality in Provenance.
Glen Marshall (Jul 19 2016 at 16:20):
Also, the Provenance.entity might point at a copy of the raw data received (e.g. the HL7 v2 message). This might have been saved as a Binary resource for this purpose.
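A minimal sketch of saving the raw payload as a Binary, assuming a plain Python dict stands in for the resource. The helper name `make_binary`, the sample message, and the content type string are illustrative assumptions, not taken from the thread. The element holding the base64 payload is named `data` here, as in R4; earlier releases named it `content`, so check the version in use.

```python
import base64
import json


def make_binary(raw_message: bytes,
                content_type: str = "x-application/hl7-v2+er7") -> dict:
    """Wrap raw source bytes in a Binary-shaped dict (hypothetical helper)."""
    return {
        "resourceType": "Binary",
        # Assumed MIME type for a pipe-delimited v2 message.
        "contentType": content_type,
        # Binary payloads are carried base64-encoded.
        "data": base64.b64encode(raw_message).decode("ascii"),
    }


# Made-up v2 message header, for illustration only.
raw = b"MSH|^~\\&|LAB|HOSP|EHR|HOSP|201605102247||ORU^R01|123456|P|2.5\r"
binary = make_binary(raw)
print(json.dumps(binary, indent=2))
```

Decoding `binary["data"]` gives back the original bytes, which is what makes the stored copy usable for later debugging.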
Oliver (Jul 19 2016 at 16:58):
Hi Simone. I think this is a good use case because it involves which actor roles are used based on what is being imported and how. I'll give you my opinion piecemeal so it's easier to respond to.
1) The provenance information that is extracted from the original file/data should be extracted and identified in the new resource that is created. The system should be able to pull that out if it's a readable file.
2) I don't think information about the software that created the original data/file matters or will be available in a majority of cases unless they have referenced the device used to author the documents. What I'm getting at is the original software probably doesn't matter.
3) The middleware is IDed as an agent. I think it's a related agent to the local system, which is also IDed as an agent, but I think the value sets are lacking to describe this.
4) The most important point in this whole thing for me is the agent role. In this case, I'm talking about the local system acting as an agent and taking the message or CDA document and turning it into a resource. If it is taking a message or PDF or flat file or anything that is reproduced on a 1:1 basis, I think the role should always be author. It is authoring a new resource from existing data but isn't making any changes. If I as a downstream system see this, I will know I don't have to dig any deeper or validate anything because "author" is used in these 1:1 cases. If the local system is disassembling a larger document but keeping all the data intact (1:1) during the resource creation process, then "author" should still apply, even though the end result is distributed among more than one resource. Even though an algorithm is being used to perform the disassembly and creation, you can still trace everything back to the original file and nothing has been modified. That is important because it relates to downstream validation and use. If, however, you are using logic or algorithms to alter the source data or reduce or modify anything in the full set, then assembler or composer apply. That role ID should be reserved to ID changes that have been applied in my opinion. It's assembler if the process is automated, without human intervention and decision-making. It's composer if someone is actually intervening and adding their logic or selections to the software that is used to perform the action. In that case, the human is also acting as an agent and needs to be referenced.
The larger point here is that me as a downstream system can use that scheme to figure out if I need to validate the results further or check the original to see if I can trust it as a complete data set. The agent role matters to me for that reason. If I see author I can just move on, if I see either of the other two, there are additional steps that I may have to implement to ensure data fidelity. I think we should probably talk about this and define it somewhere in the spec because the implications for system operations are large.
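The role scheme proposed in this message can be sketched as a small decision function. This is only one reading of the proposal, not something defined by the specification; the function and parameter names are hypothetical.

```python
def agent_role(one_to_one: bool, human_intervention: bool) -> str:
    """Pick an agent role following the scheme proposed in the message above.

    one_to_one: the source data is carried over unchanged, even if it is
        split across several resources.
    human_intervention: a person adds logic or selections to the process.
    """
    if one_to_one:
        # Nothing was altered, so the result traces straight back to the source.
        return "author"
    # Data was altered, reduced, or modified.
    return "composer" if human_intervention else "assembler"


def needs_further_validation(role: str) -> bool:
    """A downstream system may skip extra checks only for 'author'."""
    return role != "author"


print(agent_role(one_to_one=True, human_intervention=False))
print(agent_role(one_to_one=False, human_intervention=False))
print(agent_role(one_to_one=False, human_intervention=True))
```

Under this reading, a downstream system would call `needs_further_validation` on the recorded role to decide whether to go back to the original data.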
John Moehrke (Jul 19 2016 at 20:12):
Note this discussion is in support of us coming to a resolution on GF#9996
John Moehrke (Jul 19 2016 at 20:13):
@René Spronk we really need your input, suggestion, improvements.
René Spronk (Jul 22 2016 at 08:47):
I think Simone captures the requirements pretty well.
René Spronk (Jul 22 2016 at 08:53):
An analogy: if you were to receive a translation (in English) of a Chinese clinical document, you'd like to know who the translation agent was (a human, whose qualifications one could presumably look up [the qualifications of the translator aren't part of the provenance resource itself], or Google Translate). Yes, it would also be nice to know the original author of the document, but that may be a different provenance resource than the one which focuses on the "translation" activity. Maybe the v2-FHIR middleware solution should create two provenance resources? (not sure). @Simone Heckmann deals with the actual use case, so she's best positioned to make a recommendation.
John Moehrke (Aug 09 2016 at 21:54):
The Provenance that the middleware can apply is only at the perspective from the middleware.
1. It receives from sender X, payload Y.
2. It optionally saves payload Y as a FHIR Binary Resource (Y')
3. It disassembles Y into some number of Z FHIR Resources (Z(n)) which it creates
4. It creates a Provenance record P, with
P.target all of Z(n);
P.agent[0] - self as assembler;
P.agent[1] is X as a ???;
P.entity is Y' as source
What we learn is that for the P.agent[1], we don't have a way to say 'source' for the role of that agent. So we have a need of new vocabulary.
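The four steps above can be sketched as a plain dict. Element shapes are deliberately simplified and this is not a schema-valid Provenance; the function name, the `who` key, and the example references are assumptions made for illustration. The sender's role is left empty because, as noted above, no 'source' role code is available for it.

```python
def build_provenance(target_refs, middleware_ref, sender_ref, source_binary_ref):
    """Assemble record P from the middleware's own perspective (sketch only)."""
    return {
        "resourceType": "Provenance",
        # P.target: every resource Z(n) created from payload Y.
        "target": [{"reference": ref} for ref in target_refs],
        "agent": [
            # P.agent[0]: the middleware itself, acting as assembler.
            {"role": "assembler", "who": {"reference": middleware_ref}},
            # P.agent[1]: sender X; the role is the open question ("???").
            {"role": None, "who": {"reference": sender_ref}},
        ],
        # P.entity: the saved copy Y' of the original payload, as source.
        "entity": [{"role": "source", "reference": source_binary_ref}],
    }


# Hypothetical references, for illustration only.
provenance = build_provenance(
    target_refs=["Observation/z1", "Observation/z2"],
    middleware_ref="Device/middleware",
    sender_ref="Device/sender-x",
    source_binary_ref="Binary/y-prime",
)
print(provenance["agent"][1])
```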
Simone Heckmann (Aug 11 2016 at 20:07):
I think the provenance resource will most likely be used for debugging purposes.
Whatever the middleware algorithm is that authors/assembles the Resources: it can potentially be flawed.
And whenever such a flaw is detected, there needs to be a way to trace back which resources have been affected.
It may also be that the original author (the sending software system) was affected e.g. by a system update and exported flawed data. In that case it would be helpful if I could trace back the Resources not only to the middleware but also to the original authoring system. However, I don't think there is a requirement to trace back to the original data enterer (the user of the authoring system). Most of the time this information can be retrieved from the source data (e.g. V2 EVN-Segment) and I don't see a use case that would require this to be searchable.
I can imagine that having the original data as binary attachment will be helpful for debugging purposes. E.g. if a Resource looks suspicious, you can take a look at the original data to figure out whether the middleware or the original sending system is to blame.
Beyond that I don't have any other requirements. However, that's from my Middleware point of view. Server implementers may have different use cases altogether for the provenance Resource.
In the end: since the provenance Resource creates an enormous overhead in the communication and will most likely be implemented only reluctantly, I'd vote for: the simpler, the better!
John Moehrke (Aug 12 2016 at 15:02):
@Simone Heckmann Given your post, I think my proposal aligns with it? Whereas Glen's proposal added additional elements that you discounted. I agree the intermediary should only be responsible for the things that the intermediary can see from its perspective. If the original author is necessary for a use case, then the original content should have included author provenance, which would have been processed by the intermediary just like any other content. This encapsulated provenance is source provenance, not intermediary provenance. Correct?
John Moehrke (Nov 08 2016 at 17:19):
Security WG will discuss proposed change for GF#9996 today. Mostly explanation keeping with discussion here in chat and workgroup. Adding an explanation to the Provenance page explaining how middleware that transforms information on import might record this fact in a Provenance record.
Simone Heckmann (Nov 08 2016 at 17:25):
Sorry, I missed you tagging me with your last question. I don't understand "This encapsulated provenance is source provenance not intermediary provenance?" What do you mean by "encapsulated provenance"?
John Moehrke (Nov 08 2016 at 17:31):
Hi Simone. The source data might include 'provenance' (likely in source data format). That kind of provenance is very helpful to explain the history of the source data prior to the middleware doing anything. To the middleware, that provenance is just like all the other source data... that is, it is source data that the middleware imports and transforms, hopefully creating proper FHIR Resources. What I was pointing out is that that provenance is not the topic of the CR; the topic of the CR is what Provenance record the middleware should create to explain what the middleware did. That said, my proposal in GF#9996 does have a paragraph explaining this. Hopefully we can get the words improved.
Simone Heckmann (Nov 08 2016 at 17:33):
In this case, the answer is: yes :)
John Moehrke (Nov 08 2016 at 17:35):
do you like the proposal in GF#9996?
Simone Heckmann (Nov 08 2016 at 17:44):
Using Binary attachment to store the original V2 message -> agreed.
So we'd have something like Provenance.source.data of type reference with choice of (Binary|Bundle|ImagingStudy and whatnot)?
I think it would be useful to also have at least a Provenance.source.identifier. If I know that a certain V2 message was sent in error, I'd want to be able to search by messageControlId and figure out which Resources have been created from this message without having to parse all of the binary attachments.
John Moehrke (Nov 08 2016 at 17:59):
Interesting... Wonder if this is more of a Binary responsibility...
John Moehrke (Nov 08 2016 at 18:00):
or does that call for treating the original v2 message as Document, and thus using DocumentReference to hold this identifier metadata?
Simone Heckmann (Nov 08 2016 at 18:04):
The _has-parameter doesn't support chaining, so having it in the Binary resource won't help.
Simone Heckmann (Nov 08 2016 at 18:05):
It has to be an attribute of Provenance
Simone Heckmann (Nov 08 2016 at 18:09):
"Give me all Observations created from the V2 message with controlId 123456" would have to be something like /Observation?_has:Provenance:target:source-identifier=123456
...assuming Provenance.source.identifier had a search attribute named "source-identifier"
There's no way to use chaining like Provenance.source:Binary.identifier or something like that with the _has-Parameter
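A minimal sketch of building that reverse-chained query, assuming the hypothetical search parameter "source-identifier" from the message above exists on Provenance. The base URL and function name are also assumptions.

```python
from urllib.parse import urlencode


def reverse_chain_query(base_url: str, resource_type: str, control_id: str,
                        search_param: str = "source-identifier") -> str:
    """Build a _has query: resources targeted by a Provenance whose
    (hypothetical) search parameter matches the given control id."""
    key = f"_has:Provenance:target:{search_param}"
    # Keep ':' unescaped so the parameter name stays readable.
    query = urlencode({key: control_id}, safe=":")
    return f"{base_url}/{resource_type}?{query}"


url = reverse_chain_query("http://example.org/fhir", "Observation", "123456")
print(url)
```

The identifier has to be searchable on Provenance itself, because the `_has` expression names exactly one search parameter on the referring resource and cannot continue into the Binary.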
John Moehrke (Nov 08 2016 at 18:12):
so, we should add an element to Provenance.entity.identifier to be used only when Provenance.entity.reference is Binary? Otherwise we create an element that can be in conflict with a FHIR native Resource that is pointed to by Provenance.entity.reference.
Simone Heckmann (Nov 08 2016 at 18:17):
Ah, no! I think the solution is much easier:
entity.reference needs to be changed from Datatype "uri" to Datatype "Reference"
John Moehrke (Nov 08 2016 at 18:18):
but it is URI because it might not always be a FHIR native object. Even in your case the content might be stored somewhere else, not as a FHIR Binary; but as just a HTTP object.
Simone Heckmann (Nov 08 2016 at 18:18):
Then it can be either a literal reference to a Binary/Document.... and/or a logical reference holding only an identifier: http://build.fhir.org/references.html#logical
Simone Heckmann (Nov 08 2016 at 18:19):
Reference.reference may point at non-FHIR stuff: http://build.fhir.org/references.html#literal
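The two forms of the Reference datatype mentioned here can be sketched as plain dicts. The identifier system URI, the example URL, and the helper names are made up for illustration.

```python
def literal_reference(url: str) -> dict:
    """Reference pointing at a location (a FHIR resource or other content)."""
    return {"reference": url}


def logical_reference(system: str, value: str) -> dict:
    """Reference carrying only a business identifier, with no location."""
    return {"identifier": {"system": system, "value": value}}


def combined_reference(url: str, system: str, value: str) -> dict:
    """Both forms at once: where the copy lives and how the source names it."""
    return {**literal_reference(url), **logical_reference(system, value)}


# Hypothetical identifier system for v2 message control ids.
V2_CONTROL_ID_SYSTEM = "urn:example:v2-message-control-id"

entity_ref = combined_reference("Binary/y-prime", V2_CONTROL_ID_SYSTEM, "123456")
print(entity_ref)
```

With the combined form, the entity can point at the stored copy while still exposing the message control id for searching.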
John Moehrke (Nov 08 2016 at 18:19):
seems we have a new discussion... do we do as you recommend, and thus can't point at sources that are not FHIR objects.... or do we add an identifier?
John Moehrke (Nov 08 2016 at 18:20):
hmmm, that must have happened after this was modeled in Provenance this way.
John Moehrke (Nov 08 2016 at 18:20):
so, we can change agent.reference to a Reference, and remove agent.type as it is unnecessary?
Simone Heckmann (Nov 08 2016 at 18:20):
Logical References were introduced during Baltimore WGM
John Moehrke (Nov 08 2016 at 18:22):
guess I can get rid of entity.display too.
Simone Heckmann (Nov 08 2016 at 18:24):
you mean "entity.type"? I guess so. I have never seen any other Resource that has a reference to a choice of targets and adds a "type" attribute to specify which it is. I guess FHIR implementers are used to dealing with this kind of ambiguity
John Moehrke (Nov 08 2016 at 18:25):
type was there to support uri that was not a FHIR object. So, now that these are in the definition of the Reference datatype they are unnecessary
John Moehrke (Nov 08 2016 at 18:27):
I updated GF#9996 with this new information as part of the proposed resolution.
Simone Heckmann (Nov 08 2016 at 18:38):
Actually, I'm not really sure how searching for a logical reference works.
...wait. I'll ask that in another thread...
Simone Heckmann (Nov 21 2016 at 20:17):
for the resolution of GF#9996
Last updated: Apr 12 2022 at 19:14 UTC