FHIR Chat · Provenance overload on transforms

Stream: Security and Privacy

Topic: Provenance overload on transforms

John Moehrke (Dec 09 2019 at 21:14):

On the Basic Provenance call we stumbled over what "Basic Provenance" should include when data are transformed. This is a clear need when one is transforming something radically, like CDA -> FHIR.

This transform use-case is already within FHIR Core http://build.fhir.org/provenance.html#import

BUT, once we start down this path we need to address the kind of transforms that are purely algorithmic, such as a FHIR transform of FHIR JSON into FHIR XML, or even a de-serialization followed by re-serialization. HOW can we identify these non-consequential transforms? I would like to hear that there is a term for this kind of transformation, as distinct from one that might be more consequential relative to failure modes.

John Moehrke (Dec 09 2019 at 21:16):

@Grahame Grieve @Brett Marquard @Lloyd McKenzie @Josh Mandel

Grahame Grieve (Dec 09 2019 at 21:19):

agent = device

Grahame Grieve (Dec 09 2019 at 21:19):

did you want more than that?

Lloyd McKenzie (Dec 09 2019 at 21:22):

What is it that makes the transformation non-consequential? XML -> JSON loses comments. Is that consequential? What if someone creates their own transform but it throws away certain extensions?
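
As a small illustration of the point about comments, the sketch below parses a made-up FHIR-like XML fragment with Python's standard-library parser and re-serializes it. The fragment and the `reserialize` helper are assumptions for illustration only; they are not taken from any FHIR reference implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical FHIR-like fragment; the comment is the only content at risk.
SOURCE_XML = (
    '<Patient xmlns="http://hl7.org/fhir">'
    "<!-- verified by phone -->"
    '<active value="true"/>'
    "</Patient>"
)


def reserialize(xml_text: str) -> str:
    """Parse and re-serialize using ElementTree's default parser.

    The default parser discards XML comments, so this otherwise
    "purely syntactic" step is not lossless for this input.
    """
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    out = reserialize(SOURCE_XML)
    print("comment survived:", "verified by phone" in out)
    print("data survived:", 'value="true"' in out)
```

Whether losing the comment matters is exactly the policy question being asked; the code only shows that the loss can happen silently.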

Gino Canessa (Dec 09 2019 at 21:38):

Personally, I lean towards signing only applying to a bit-exact representation, with any changes using some sort of claim chaining.
I feel like trying to define allowed transformations a) is a slippery slope (e.g., does changing units count as a change?) and b) raises the bar for implementers (e.g., a signature is for XML, but I'm working in JSON, so validation would at minimum require me to change the representation).

John Moehrke (Dec 09 2019 at 21:49):

I'm not asking about signatures. I am asking what the trigger is: how much of a change is necessary to require a "Basic Provenance" record. I am also NOT advising against Provenance on every non-consequential transform; if you want to do that, you are free to do so. The point, however, is that at some level of transform one crosses from non-consequential to consequential.
To further emphasize: there is a form of transformation that happens at every Internet Protocol (IP) gateway, as the packets are transformed from electrical signal to digital representation, then back to electrical signal. This transform we are happy to ignore in FHIR Provenance, right? Is there a FHIR transform from core JSON to core XML that we are happy with?

John Moehrke (Dec 09 2019 at 21:53):

Lloyd McKenzie said: "What is it that makes the transformation non-consequential? XML -> JSON loses comments. Is that consequential? What if someone creates their own transform but it throws away certain extensions?"

This is indeed the question. If we don't define some level that separates consequential from non-consequential, then every transaction with a FHIR server WILL need to have a Provenance attached for every Resource, and that Provenance must also have a Provenance attached, and so on. I am trying to find the point at which "basic provenance" is no longer interesting, versus when a transform is clearly interesting (like unit conversion, to use Gino's example).

I don't expect the answer to be easy, which is why I ask.

Grahame Grieve (Dec 09 2019 at 22:31):

I don't understand why we care. This is in respect to a particular IG?

Lloyd McKenzie (Dec 09 2019 at 22:45):

It's a question of whether a server should create a Provenance record every time data goes from one syntax to another (or even from syntax to object?)

Grahame Grieve (Dec 09 2019 at 22:47):

in what context?

Lloyd McKenzie (Dec 09 2019 at 22:50):

Receive JSON. Convert to XML for processing. Generate XML response. Convert to JSON prior to transmission. Is that 0 Provenance, 1 or 2?

Grahame Grieve (Dec 09 2019 at 22:51):

that depends on policy. That's not the sort of thing we can make rules about in the base spec. Which is why I keep asking what the context is

Keith Boone (Dec 09 2019 at 22:55):

So, having implemented a non-consequential transformation from V2 to FHIR in a production environment, and planning to do the same for CDA, I can say that I want a Provenance resource associated with the resulting FHIR Resource that identifies the message or document it came from, where in that artifact the content originated (Segment ID or index for a message, XPath for a document), and the version of the software (device) and/or its configuration resources. That provenance is useful for a) diagnostics and b) traceability. These are purely algorithmic and, in some cases, even lossless transformations, but the syntax is different enough, and the transform complex enough, that I want a record of it somewhere.

I go so far as to add extensions to MY provenance record to report field content mapping rules that have been applied, but we only turn those on in test, not production. They are expensive.
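
A minimal sketch of what such a record might look like, built as a plain Python dict. The element names (`target`, `recorded`, `agent.who`, `entity.role`, `entity.what`) follow the FHIR R4 Provenance resource; the extension URL for the location within the source, the resource ids, and the choice of the `assembler` participant type are assumptions made for illustration, not something stated in this thread.

```python
import json
from datetime import datetime, timezone

# Hypothetical extension URL for "where in the source artifact"; not a published extension.
SOURCE_LOCATION_EXT = "http://example.org/fhir/StructureDefinition/source-location"
# Participant type code system as used in FHIR R4; verify against the FHIR version in use.
PARTICIPANT_TYPE = "http://terminology.hl7.org/CodeSystem/provenance-participant-type"


def build_transform_provenance(target_ref, source_ref, source_location, device_ref):
    """Describe one transformed resource: its source, the location within it, and the software."""
    return {
        "resourceType": "Provenance",
        "target": [{"reference": target_ref}],
        "recorded": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "agent": [{
            # Assumption: the transforming software is modeled as a Device acting as assembler.
            "type": {"coding": [{"system": PARTICIPANT_TYPE, "code": "assembler"}]},
            "who": {"reference": device_ref},
        }],
        "entity": [{
            "role": "source",
            "what": {
                "reference": source_ref,
                # Segment index (V2) or XPath (CDA), carried in the hypothetical extension.
                "extension": [{"url": SOURCE_LOCATION_EXT, "valueString": source_location}],
            },
        }],
    }


if __name__ == "__main__":
    prov = build_transform_provenance(
        target_ref="Observation/obs-1",            # hypothetical ids throughout
        source_ref="DocumentReference/v2-msg-1",
        source_location="OBX[3]",
        device_ref="Device/v2-to-fhir-converter",
    )
    print(json.dumps(prov, indent=2))
```

The per-field mapping rules mentioned above would add further extensions to this record, which is presumably where the cost comes from.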

Grahame Grieve (Dec 09 2019 at 23:00):

they would be expensive ;-)

Keith Boone (Dec 09 2019 at 23:05):

My “policy” is that if the transformation is complicated enough to require support in the future (an engineer has to look at something related to it), then provenance is needed. If you can essentially verify round-tripping with NO data loss (in canonical form) in the transformation, as for FHIR JSON to XML and back, it’s not what I would consider to be consequential.

If the resource is augmented via even a mildly complex algorithm (e.g., terminology mapping via simple lookup), it’s consequential enough for provenance for me.
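
The round-trip test described here can be sketched as follows. This is an illustration under simplifying assumptions: "canonical form" is approximated by JSON with sorted keys, and plain JSON serialization stands in for a real JSON/XML converter. All function names and the sample resource are hypothetical.

```python
import json
from typing import Callable


def canonical(resource: dict) -> str:
    """Stand-in for a canonical form: sorted keys, fixed separators."""
    return json.dumps(resource, sort_keys=True, separators=(",", ":"))


def round_trips_losslessly(
    resource: dict,
    forward: Callable[[dict], str],
    backward: Callable[[str], dict],
) -> bool:
    """True if forward-then-backward reproduces the resource in canonical form."""
    return canonical(backward(forward(resource))) == canonical(resource)


def strip_extensions(value):
    """Recursively drop every 'extension' key (a deliberately lossy step)."""
    if isinstance(value, dict):
        return {k: strip_extensions(v) for k, v in value.items() if k != "extension"}
    if isinstance(value, list):
        return [strip_extensions(v) for v in value]
    return value


def lossy_serialize(resource: dict) -> str:
    """A home-grown transform that silently throws away extensions."""
    return json.dumps(strip_extensions(resource))


PATIENT = {
    "resourceType": "Patient",
    "id": "example",
    "extension": [{"url": "http://example.org/fhir/StructureDefinition/note",
                   "valueString": "x"}],
    "active": True,
}

if __name__ == "__main__":
    print("plain re-serialization:", round_trips_losslessly(PATIENT, json.dumps, json.loads))
    print("extension-dropping:", round_trips_losslessly(PATIENT, lossy_serialize, json.loads))
```

A check like this only speaks for the inputs it is run on; it does not prove a transform is lossless in general.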

Grahame Grieve (Dec 09 2019 at 23:05):

I don't mind recommending something along those lines in the spec, and I think it would be good to add an example

Jose Costa Teixeira (Dec 10 2019 at 06:01):

I usually do this at the system level, not at the instance level: describing the transformation itself, not every transformed instance. But I think it makes sense to allow this at the instance level.

Jose Costa Teixeira (Dec 10 2019 at 06:04):

Question on top of this: (how) do we add the notion of permission? I.e., explaining not only "from where" but "why" we got this data, or what the next link must know to eventually use the data.

John Moehrke (Dec 10 2019 at 13:41):

Keith Boone said: "So, having implemented a non-consequential transformation from V2 to FHIR in a production environment, and planning to do the same for CDA, I can say that I want a Provenance resource associated with the resulting FHIR Resource that identifies the message or document it came from, where in that artifact the content originated (Segment ID or index for a message, XPath for a document), and the version of the software (device) and/or its configuration resources. That provenance is useful for a) diagnostics and b) traceability. These are purely algorithmic and, in some cases, even lossless transformations, but the syntax is different enough, and the transform complex enough, that I want a record of it somewhere.

I go so far as to add extensions to MY provenance record to report field content mapping rules that have been applied, but we only turn those on in test, not production. They are expensive."

Keith, that is NOT the non-consequential I am talking about. I agree that the use-case you give is clearly consequential. And this is documented in the FHIR core spec for Provenance.

John Moehrke (Dec 10 2019 at 13:43):

Lloyd McKenzie said: "Receive JSON. Convert to XML for processing. Generate XML response. Convert to JSON prior to transmission. Is that 0 Provenance, 1 or 2?"

Lloyd understands the kind of technical transform that I don't believe requires a Provenance record (except in the absolute extreme policy, which I would never forbid).

John Moehrke (Dec 10 2019 at 13:45):

Grahame Grieve said: "I don't mind recommending something along those lines in the spec, and I think it would be good to add an example"

I am not looking to update FHIR core right now; I am working on the "Basic Provenance" IG. So we do have a policy context, one that is trying to define the minimum viable "basic" provenance, which for the most part is: it must record who you received the data from, and the author if you know it.
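
As a rough sketch of that minimum, the function below builds a Provenance that always records the sender and adds an author only when one is known. The function name, the free-text "transmitter" label, and the resource ids are assumptions for illustration; the thread does not say how the IG encodes these agents.

```python
import json
from typing import Optional

# Participant type code system as used in FHIR R4; verify against the FHIR version in use.
PARTICIPANT_TYPE = "http://terminology.hl7.org/CodeSystem/provenance-participant-type"


def basic_provenance(
    target_ref: str,
    received_from_ref: str,
    recorded: str,
    author_ref: Optional[str] = None,
) -> dict:
    """Minimal record: who the data was received from, plus the author if known."""
    agents = [{
        # Assumption: the sender is labeled with free text rather than a specific code.
        "type": {"text": "transmitter"},
        "who": {"reference": received_from_ref},
    }]
    if author_ref is not None:
        agents.append({
            "type": {"coding": [{"system": PARTICIPANT_TYPE, "code": "author"}]},
            "who": {"reference": author_ref},
        })
    return {
        "resourceType": "Provenance",
        "target": [{"reference": target_ref}],
        "recorded": recorded,
        "agent": agents,
    }


if __name__ == "__main__":
    # Hypothetical ids; the author is unknown in the first call.
    print(json.dumps(basic_provenance(
        "Condition/c1", "Organization/sending-hospital", "2019-12-10T13:45:00Z"), indent=2))
    print(json.dumps(basic_provenance(
        "Condition/c1", "Organization/sending-hospital", "2019-12-10T13:45:00Z",
        author_ref="Practitioner/dr-1"), indent=2))
```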

John Moehrke (Dec 10 2019 at 13:48):

So, HOW do we distinguish the kinds of "transforms" that must be recognized with a Provenance from the kinds of "transforms" (xml-->json, json-->xml) that clearly do not need to be recorded in a Provenance when working in the Basic Provenance IG policy space?

John Moehrke (Dec 10 2019 at 13:50):

I see Lloyd used the word "syntax" translation?

Lloyd McKenzie (Dec 10 2019 at 15:38):

Lloyd also mentioned that no transformation is necessarily irrelevant. While we have high confidence that the reference implementation transforms are lossless, not everyone will use that code (or use it unchanged). So the question of "how confident are you in no information loss" needs to be answered.

John Moehrke (Dec 10 2019 at 16:05):

Fully agree. I do think that there is some bright line that can be drawn. I think the confidence question can always be used to increase the number of Provenance-relevant events. We are not trying to set a limit, just trying to set realistic expectations.

John Moehrke (Dec 10 2019 at 16:14):

A definition that seems to be emerging: a non-consequential transform is one that is performed by an automated algorithm, without loss of data, and fully reversible. Is this clear? Is this proper?
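
The proposed definition reduces to a three-part test, sketched below. The `Transform` type and the flag values given to each example are illustrative assumptions; in practice, deciding those flags (and how confident one is in them) is the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    """A transform described by the three properties in the proposed definition."""
    name: str
    automated: bool
    lossless: bool
    reversible: bool


def is_non_consequential(t: Transform) -> bool:
    """Non-consequential only if automated AND lossless AND fully reversible."""
    return t.automated and t.lossless and t.reversible


def needs_basic_provenance(t: Transform) -> bool:
    """Under this definition, anything else is consequential and gets recorded."""
    return not is_non_consequential(t)


# Flag values below are assumptions for illustration, not conclusions of the thread.
EXAMPLES = [
    Transform("FHIR JSON <-> XML (reference implementation)", True, True, True),
    Transform("unit conversion", True, False, False),
    Transform("terminology mapping via lookup", True, False, False),
]

if __name__ == "__main__":
    for t in EXAMPLES:
        print(f"{t.name}: needs provenance = {needs_basic_provenance(t)}")
```

A stricter policy can always record more events than this test requires; it only marks the floor.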

John Moehrke (Dec 10 2019 at 16:19):

Seems this covers FHIR json<-->xml, the network layer, and internal system processing. Do the things that we consider consequential fall outside that definition? Use of the HL7 published CCDA-on-FHIR transform? Automated conversion of v2 messages to FHIR? Disassembly of CDA into FHIR resources?

John Moehrke (Dec 10 2019 at 16:20):

What about taking a serialized FHIR-Document (Bundle) and serving the resources within using FHIR REST query/read? Should this be a consequential Transform? Is it so close to the line that we need to call it out explicitly?

Jose Costa Teixeira (Dec 10 2019 at 16:52):

My current practice when implementing data lineage is that there is no clear line. Transformations will evolve. All transformations must be accounted for and have an owner. The owner is then responsible for evaluating the transformation along its life cycle, deciding what the risks and controls are at any time.

Last updated: Apr 12 2022 at 19:14 UTC
