FHIR Chat · De-identification mechanisms in FHIR · implementers

Stream: implementers

Topic: De-identification mechanisms in FHIR


Emmanuel Helm (Jun 28 2016 at 13:34):

Since I couldn't find a discussion on this topic - is someone working on de-identification mechanisms in FHIR? Where can I find that?

E.g. I would like to have an interface to specific resources that only provides "pseudonymized" data.

John Moehrke (Jun 28 2016 at 13:45):

Hi Emmanuel. A year ago I asserted that this is a function of the rights the requester holds: https://healthcaresecprivacy.blogspot.com/2015/06/fhir-does-not-need-deidentifytrue.html

John Moehrke (Jun 28 2016 at 13:45):

that said, this doesn't mean we can't find a useful service.

John Moehrke (Jun 28 2016 at 13:46):

The problem is that de-identification is a process, not a state. https://healthcaresecprivacy.blogspot.com/p/topics.html#DEID

John Moehrke (Jun 28 2016 at 13:48):

There are some really helpful efforts that are reinvigorating the concept of a more automatable de-identification algorithm, whereas today this is far more 'art' than science.

John Moehrke (Jun 28 2016 at 13:51):

A good first step would be to create patterns in FHIR that identify the elements that are direct identifiers, indirect identifiers (quasi-identifiers), and data. This classification comes first; then we need a set of use cases. What DICOM did in this space is a really useful pattern.
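To make the classification idea concrete, here is a minimal Python sketch of a per-element lookup table. The element paths are real Patient elements, but the category assigned to each one is an illustrative assumption, not an HL7, IHE, or DICOM classification; `PATIENT_CLASSIFICATION` and `classify` are hypothetical names. Unlisted elements default to "direct identifier" as a conservative choice.

```python
from enum import Enum


class Category(Enum):
    DIRECT = "direct identifier"
    QUASI = "quasi identifier"  # indirect identifier
    DATA = "data"


# Illustrative only: NOT an official classification of Patient elements.
PATIENT_CLASSIFICATION = {
    "Patient.identifier": Category.DIRECT,
    "Patient.name": Category.DIRECT,
    "Patient.telecom": Category.DIRECT,
    "Patient.address": Category.DIRECT,
    "Patient.birthDate": Category.QUASI,
    "Patient.gender": Category.QUASI,
    "Patient.active": Category.DATA,
}


def classify(path: str) -> Category:
    """Return the category for an element path.

    Elements missing from the table are treated as direct identifiers,
    a conservative default assumed for this sketch.
    """
    return PATIENT_CLASSIFICATION.get(path, Category.DIRECT)
```

For example, `classify("Patient.birthDate")` returns `Category.QUASI`, while an unlisted element such as `Patient.photo` falls back to `Category.DIRECT`.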

John Moehrke (Jun 28 2016 at 13:51):

see DICOM details http://dicom.nema.org/dicom/2013/output/html/part15.html#chapter_E

John Moehrke (Jun 28 2016 at 13:52):

see IHE de-identification handbook http://ihe.net/uploadedFiles/Documents/ITI/IHE_ITI_Handbook_De-Identification_Rev1.1_2014-06-06.pdf

John Moehrke (Jun 28 2016 at 13:53):

and IHE tool for handbook http://ihe.net/uploadedFiles/Documents/ITI/IHE_ITI_Handbook_De-Identification-Mapping_Rev1.1_2014-06-06.xlsx

Emmanuel Helm (Jun 28 2016 at 14:16):

Thank you for this detailed answer. You gave me something to think/work about.

John Moehrke (Jun 28 2016 at 14:25):

there is also a new sub-workgroup looking at this in the mobile health worgroup. -- starting based on the buzz that Apple kicked up around Differential Privacy.

John Moehrke (Jun 28 2016 at 18:01):

The email thread from mobilehealth can be found at http://lists.hl7.org/read/messages?id=297060

Grahame Grieve (Jun 28 2016 at 19:44):

is it really possible to classify usefully? We have a slot for the classification, but lots of text fields are 'maybes' (they might include a name in some instances), and then there's the narrative

Grahame Grieve (Jun 28 2016 at 19:44):

and what about Observation.subject, for instance: is that identifying?

John Moehrke (Jun 28 2016 at 20:24):

Grahame, great questions... Text is always suspect, so it is always classified as a direct identifier. subject would clearly be a direct identifier, right? When would it not?

Grahame Grieve (Jun 28 2016 at 22:19):

well, if you de-identify the subject, then the subject reference is not identifying anymore

Grahame Grieve (Jun 28 2016 at 22:20):

and that's what you'd have to do
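A minimal Python sketch of that step, assuming the subject is replaced by a consistent pseudonym so the Observation no longer points at an identifying resource. Keyed hashing (HMAC-SHA256) is swapped in here as one common way to do this; the key, the 16-character truncation, and the function names are hypothetical. It also assumes the pseudonymized Patient is published under the same pseudonymous id, so the reference still resolves.

```python
import hashlib
import hmac

# Hypothetical secret held only by the pseudonymization service.
PSEUDONYM_KEY = b"replace-with-a-real-secret"


def pseudonymize_reference(reference: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Map a relative reference like 'Patient/123' to 'Patient/<keyed hash>'.

    The same input always yields the same pseudonym, so resources about one
    patient stay linked; without the key, the pseudonym cannot be matched
    back to the original id.
    """
    resource_type, _, resource_id = reference.partition("/")
    digest = hmac.new(key, resource_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{resource_type}/{digest[:16]}"


def pseudonymize_subject(observation: dict) -> dict:
    """Return a shallow copy of an Observation with a pseudonymized subject.

    Only the reference is kept: Reference.display may contain the patient's
    name, so it is dropped along with any other element of the Reference.
    """
    result = dict(observation)
    subject = observation.get("subject")
    if subject and "reference" in subject:
        result["subject"] = {"reference": pseudonymize_reference(subject["reference"])}
    return result
```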

Grahame Grieve (Jun 28 2016 at 22:21):

so I think there are 4 categories:
- identifying information
- references to identifying resources
- data
- text that may contain identifying information
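The four categories could each map to an action. A minimal Python sketch, assuming identifying elements and free text are dropped, references are passed to a caller-supplied pseudonymizer, and data is kept. The classification of Observation elements below is hypothetical, and unclassified elements are treated as identifying.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Category(Enum):
    IDENTIFYING = "identifying information"  # dropped
    REFERENCE = "reference to identifying resource"  # pseudonymized
    DATA = "data"  # kept
    TEXT = "text that may contain identifying information"  # dropped


# Hypothetical classification of some top-level Observation elements.
OBSERVATION_CATEGORIES = {
    "identifier": Category.IDENTIFYING,
    "subject": Category.REFERENCE,
    "status": Category.DATA,
    "code": Category.DATA,
    "valueQuantity": Category.DATA,
    "text": Category.TEXT,  # narrative
    "note": Category.TEXT,  # free-text annotations
}


def deidentify(resource: Dict[str, Any], pseudonymize: Callable[[Any], Any]) -> Dict[str, Any]:
    """Apply one action per category to each top-level element."""
    out: Dict[str, Any] = {"resourceType": resource["resourceType"]}
    for element, value in resource.items():
        if element == "resourceType":
            continue
        # Unclassified elements are treated as identifying (conservative default).
        category = OBSERVATION_CATEGORIES.get(element, Category.IDENTIFYING)
        if category is Category.DATA:
            out[element] = value
        elif category is Category.REFERENCE:
            out[element] = pseudonymize(value)
        # IDENTIFYING and TEXT elements are simply dropped in this sketch.
    return out
```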

Erich Schulz (Jun 28 2016 at 22:28):

one of the roles of an approach such as CQL is that it allows data sets to be gathered remotely in a de-identified manner

Erich Schulz (Jun 28 2016 at 22:29):

rather than being asked to send their raw data, organisations are sent a query in CQL, which reports back just the aggregate information
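The pattern can be sketched without CQL: the query travels to the data holder and only an aggregate comes back. This Python stand-in is not CQL; the `Obs` record type, `run_aggregate_query`, and the minimum cell size of 5 (a common small-count suppression safeguard, assumed here) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Obs:
    patient_id: str
    code: str
    value: float


# Assumed threshold: counts below this are suppressed rather than reported.
MIN_CELL_SIZE = 5


def run_aggregate_query(
    observations: Iterable[Obs],
    predicate: Callable[[Obs], bool],
    min_cell_size: int = MIN_CELL_SIZE,
) -> Optional[int]:
    """Runs at the data holder; returns a count of distinct patients, never rows.

    Returns None when the count is too small to report safely.
    """
    matching = {obs.patient_id for obs in observations if predicate(obs)}
    return len(matching) if len(matching) >= min_cell_size else None
```

Only the integer (or `None`) leaves the organisation; the predicate is the "query" and the raw `Obs` records stay local.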

Erich Schulz (Jun 28 2016 at 22:30):

this approach was being used successfully in the UK way back in the 1990s

John Moehrke (Jun 29 2016 at 01:36):

there is a bit more than that... there are elements that are mostly just data but have troubling edge cases (weight, height, gender, etc.). See the IHE Handbook spreadsheet...
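One common treatment for such edge-case values is generalization: coarsening a value until it is no longer distinctive. A minimal Python sketch, assuming 10 kg weight bands and a 90+ age cap (the cap mirrors the HIPAA Safe Harbor rule of aggregating ages over 89); band sizes and function names are illustrative and not taken from the IHE Handbook.

```python
def generalize_birth_date(birth_date: str) -> str:
    """Keep only the year of a FHIR date such as '1984-07-15' (or '1984-07', '1984')."""
    return birth_date[:4]


def generalize_weight(weight_kg: float, band_kg: int = 10) -> str:
    """Replace an exact weight with a half-open band [lower, lower + band_kg)."""
    lower = int(weight_kg // band_kg) * band_kg
    return f"{lower}-{lower + band_kg} kg"


def generalize_age(age_years: int, top_code: int = 90) -> str:
    """Collapse rare high ages into a single category such as '90+'."""
    return f"{top_code}+" if age_years >= top_code else str(age_years)
```

For example, `generalize_weight(72.5)` gives `"70-80 kg"` and `generalize_age(95)` gives `"90+"`.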

John Moehrke (Jun 29 2016 at 01:38):

Erich, agreed: services that can be asked questions and return just a result can be helpful, but they can also be twisted just as effectively if not designed carefully. So a query service is not a panacea. I think we need to look at multiple options and let policy choose the one used on a case-by-case basis.

Grahame Grieve (Jun 29 2016 at 13:20):

well, any data is potentially re-identifying. Just depends how unique it is. big value sets.... yay
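"How unique it is" is what k-anonymity measures: a record is only as hidden as the number of other records sharing its quasi-identifier values. k-anonymity is not named in this thread and is swapped in here as one way to quantify the point; the function names and the field names in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def equivalence_class_sizes(
    records: Iterable[Mapping[str, str]], quasi_identifiers: Sequence[str]
) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(
    records: Iterable[Mapping[str, str]], quasi_identifiers: Sequence[str], k: int
) -> bool:
    """True if every combination of quasi-identifier values occurs at least k times."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(size >= k for size in sizes.values())
```

With two records of `{"birthYear": "1984", "gender": "female"}` and one of `{"birthYear": "1990", "gender": "male"}`, the set is not 2-anonymous on those two fields because the 1990/male record is unique; adding more quasi-identifiers (or larger value sets) can only make classes smaller.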

John Moehrke (Jun 29 2016 at 16:55):

I find it interesting that we seem to have had the same discussion TWO years ago; I wrote this blog post on this day two years ago: https://healthcaresecprivacy.blogspot.com/2014/06/de-identifying-free-text.html

Grahame Grieve (Jun 29 2016 at 20:35):

well, you suggested that we classify fields; we could do this in the resource definitions / profiles. But the subsequent discussion suggests that this is not really very useful

John Moehrke (Jun 29 2016 at 20:45):

this is something I figured could have been part of the RDF. I don't know if it fits well, but it seems like it could. Not a prime motivation of the RDF people, but phase it in? The classification of each element in a resource is meant to define its normal use. Clearly, if someone makes a pseudo-Patient and marks that Patient instance as pseudonymous, then an Observation.subject that points at that Patient instance is downgraded from "Direct-Identifier" to "Pseudo-Identifier". What we need is a more usable classification than the three big classifications (Direct-Identifier, Quasi, Data). Specifically, Quasi needs to be subdivided into various distances; this classification doesn't exist anywhere that I know of.
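The downgrade described above means the classification of a reference depends on the instance it points at, not only on the element definition. A minimal Python sketch, assuming the pseudo-Patient is marked with a `meta.security` label whose code is `PSEUDED` (taken from the HL7 v3 ObservationValue vocabulary; any agreed-upon marker would work). The `resolve` callable and all function names are hypothetical.

```python
from enum import Enum
from typing import Any, Callable, Dict

Resource = Dict[str, Any]


class Classification(Enum):
    DIRECT_IDENTIFIER = "Direct-Identifier"
    PSEUDO_IDENTIFIER = "Pseudo-Identifier"


# Assumed marker for a pseudonymized instance (a real implementation would
# also check the label's system, not just its code).
PSEUDONYMIZED_CODE = "PSEUDED"


def is_pseudonymous(resource: Resource) -> bool:
    """True if the resource carries the assumed pseudonymization security label."""
    labels = resource.get("meta", {}).get("security", [])
    return any(label.get("code") == PSEUDONYMIZED_CODE for label in labels)


def classify_subject(observation: Resource, resolve: Callable[[str], Resource]) -> Classification:
    """Classify Observation.subject per instance rather than per element definition.

    `resolve` (hypothetical) fetches the referenced resource, e.g. 'Patient/a'.
    """
    target = resolve(observation["subject"]["reference"])
    if is_pseudonymous(target):
        return Classification.PSEUDO_IDENTIFIER
    return Classification.DIRECT_IDENTIFIER
```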

Grahame Grieve (Jun 29 2016 at 21:03):

well, we can build whatever we do in to the definitions and therefore the RDF. No problems. But.... it's not clear to me what to do...


Last updated: Apr 12 2022 at 19:14 UTC