FHIR Chat · Security label for data de-identified using perturbation

Stream: Security and Privacy

Topic: Security label for data de-identified using perturbation

Ranvijay Kumar (Apr 09 2020 at 23:54):

We are generating de-identified data and want to add security labels to the resources and bundles. One of the de-identification methods we use is perturbation, in which specific values are replaced with equally specific, but different, values. Date-shifting is one example of it. However, I do not see any tag in the v3 value set that perturbation can map to. https://www.hl7.org/fhir/v3/ObservationValue/cs.html#v3-ObservationValue-_SecurityObservationValue

@John Moehrke any recommendation?
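
For illustration, date-shifting as a form of perturbation might look like the minimal sketch below. The function name `shift_date` and the fixed offset are assumptions made for this example, not something described in the thread; a real de-identification pipeline would choose and protect the offset according to its own risk analysis.

```python
from datetime import date, timedelta


def shift_date(value: str, offset_days: int) -> str:
    """Shift an ISO 8601 date (YYYY-MM-DD) by a fixed number of days.

    The output is as specific as the input (a full calendar date),
    but it is a different date, which is the idea behind perturbation.
    """
    return (date.fromisoformat(value) + timedelta(days=offset_days)).isoformat()


# Hypothetical offset; in practice this would be kept secret.
shifted = shift_date("2020-04-09", -17)
```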

John Moehrke (Apr 10 2020 at 11:28):

I would not think that one would label data with the method used for de-identification. I realize that there are some methods in the vocabulary, but they are in there more to be used in policy.

John Moehrke (Apr 10 2020 at 11:32):

The recommendation given on the security page http://build.fhir.org/secpriv-module.html#deId is to tag the data with the high-level method, not the specific method -- ANONYED, MASKED, PSEUDED, REDACTED. So the perturbation on some indirect-identifiers is specific to that element. The high-level method would be a statement of what was done to the dataset.
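
As a rough sketch of what tagging with a high-level method could look like, the snippet below adds one of the four codes named above to `meta.security` on a resource represented as a plain Python dict. The helper name `add_security_label` and the sample Patient are hypothetical; the code system URI is the v3 ObservationValue system the thread links to.

```python
V3_OBSERVATION_VALUE = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"

# The high-level de-identification codes mentioned in the thread.
HIGH_LEVEL_CODES = {"ANONYED", "MASKED", "PSEUDED", "REDACTED"}


def add_security_label(resource: dict, code: str) -> dict:
    """Add a high-level de-identification label to resource.meta.security."""
    if code not in HIGH_LEVEL_CODES:
        raise ValueError(f"not a high-level de-identification code: {code}")
    labels = resource.setdefault("meta", {}).setdefault("security", [])
    coding = {"system": V3_OBSERVATION_VALUE, "code": code}
    if coding not in labels:  # avoid adding the same label twice
        labels.append(coding)
    return resource


# Hypothetical resource whose birth date has already been shifted.
patient = {"resourceType": "Patient", "id": "example", "birthDate": "1970-03-23"}
add_security_label(patient, "PSEUDED")
```

Which of the four codes best describes a given dataset is a policy decision; the sketch only shows where the label would be placed.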

John Moehrke (Apr 10 2020 at 11:38):

The most specific approach would be to record, under authorized access only, the full algorithm used on the dataset, including risk analysis, element analysis, methods, and proof. This record would be given a unique identifier, aka the policy. That policy's unique identifier would be recorded in Provenance.policy, where Provenance.target points at all of the dataset. This is part of a CR, J#18874, that I have been looking to write.
The point of this is that you can record the complete algorithm, which is very specific, and protect that record appropriately.
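
The Provenance idea could be sketched roughly as follows, again with resources as plain dicts. The function name `build_deid_provenance`, the policy URI, and the references are all made up for illustration, and the result is not validated against any FHIR profile; only the use of `Provenance.policy` and `Provenance.target` comes from the message above.

```python
from datetime import datetime, timezone
from typing import List, Optional


def build_deid_provenance(
    policy_uri: str,
    target_refs: List[str],
    agent_ref: str,
    recorded: Optional[str] = None,
) -> dict:
    """Build a Provenance that ties a de-identification policy to a dataset.

    policy_uri identifies the protected record of the full algorithm;
    target_refs lists every resource in the de-identified dataset.
    """
    if recorded is None:
        recorded = datetime.now(timezone.utc).isoformat()
    return {
        "resourceType": "Provenance",
        "target": [{"reference": ref} for ref in target_refs],
        "recorded": recorded,
        "policy": [policy_uri],
        "agent": [{"who": {"reference": agent_ref}}],
    }


# All identifiers below are hypothetical.
provenance = build_deid_provenance(
    policy_uri="urn:uuid:0f8e5c2a-1b7d-4c3e-9a46-2d5b7e8f9a10",
    target_refs=["Patient/example", "Observation/example"],
    agent_ref="Device/deid-pipeline",
    recorded="2020-04-10T11:38:00+00:00",
)
```

The policy URI reveals nothing about the algorithm itself, so the detailed record can stay behind access control while the dataset still points at it.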

Ranvijay Kumar (Apr 10 2020 at 18:38):

Thanks for your response. Depending on the need, one can apply masking, redaction, and pseudonymization at the same time to different elements of a resource. In such cases, what security label should be used at the high level? Should we use multiple labels, or pick one that broadly communicates the meaning?

Ranvijay Kumar (Apr 10 2020 at 18:39):

CR J#18874 looks very interesting. Looking forward to it.

Last updated: Apr 12 2022 at 19:14 UTC
