Stream: implementers
Topic: Anonymization / Pseudonymization in core standard?
Jose Costa Teixeira (Sep 25 2017 at 11:34):
Anything being considered? @John Moehrke ?
I would love to try and get data for a patient, but because I don't have the right permissions (whatever is the reason for permission being there or not) I would either get full data, or anonymized data or pseudonymized data.
John Moehrke (Sep 25 2017 at 12:43):
Jose, I blogged on this June 2015... http://healthcaresecprivacy.blogspot.com/2015/06/fhir-does-not-need-deidentifytrue.html
John Moehrke (Sep 25 2017 at 12:45):
but recognize that de-identification is not something that can be well-done on a resource-by-resource basis. http://healthcaresecprivacy.blogspot.com/2015/02/is-it-really-possible-to-anonymize-data.html
Jose Costa Teixeira (Sep 25 2017 at 12:46):
Ok. I do not think that anyone can harmonize what "de-identification" means at the level of a standard. As you point out, that is not at the standard level, not even at design level. It is at run level.
Jose Costa Teixeira (Sep 25 2017 at 12:47):
so, no de-identify_true on the request side. that is clear.
Jose Costa Teixeira (Sep 25 2017 at 12:47):
my wondering was more about: is there any impact on the actual response side?
John Moehrke (Sep 25 2017 at 12:48):
I don't understand your question...
Jose Costa Teixeira (Sep 25 2017 at 12:48):
any metadata that we can use to say "well you asked for something, but this flag tells you that you are getting partial data/de-identified data/no data, so that you can do whatever you want there"?
John Moehrke (Sep 25 2017 at 12:51):
Of course there is that flag. That is within the security labels. http://build.fhir.org/security-labels.html
John Moehrke (Sep 25 2017 at 12:52):
There are a few to choose from, based on what is actually done http://build.fhir.org/v3/ObservationValue/cs.html#v3-ObservationValue-ANONYED
Jose Costa Teixeira (Sep 25 2017 at 12:52):
my example is for GDPR, but a simple one would be: if i want to get my patient's medication, and if some of that medication is especially sensitive information, say psychiatric drugs. I should not get all the medications at first, right? I should get only the ones that I can get on a normal basis, and perhaps a warning (?) to say "there is more" - so that i can break the glass if needed.
John Moehrke (Sep 25 2017 at 12:52):
http://build.fhir.org/v3/ObservationValue/cs.html#v3-ObservationValue-PSEUDED
John Moehrke (Sep 25 2017 at 12:53):
just like when _summary is asked for http://build.fhir.org/v3/ObservationValue/cs.html#v3-ObservationValue-SUBSETTED
John Moehrke (Sep 25 2017 at 12:54):
OH, you are asking about Break-Glass??? That is a different workflow
Jose Costa Teixeira (Sep 25 2017 at 12:57):
Well, the first scenario triggers break-glass. Subsetted, anonyed or Pseuded should be good.
John Moehrke (Sep 25 2017 at 12:58):
Break-Glass has many meanings under many different workflows, so it is very hard to write a standard way for it to be done. But generally, one asks for "N" Normal confidentiality data, but when one suspects that there is more sensitive information that would be important for safety reasons, then one asks for "R" Restricted confidential data.... If you are a privileged user, the request will be granted. Asking for "R" is breaking-glass.
Jose Costa Teixeira (Sep 25 2017 at 12:58):
yep, but for me it is the same machine as above, operating under different rules, right?
John Moehrke (Sep 25 2017 at 12:58):
https://healthcaresecprivacy.blogspot.com/2015/12/break-glass-on-fhir-solution.html
Jose Costa Teixeira (Sep 25 2017 at 12:59):
meaning, the machine that gives you the N data can operate with another parameter (can be a security label?) to get restricted data.
Grahame Grieve (Sep 25 2017 at 13:00):
@John Moehrke http://build.fhir.org/security-labels.html#break-the-glass
John Moehrke (Sep 25 2017 at 13:02):
Yes, @Grahame Grieve I addressed that surprise text in my blog https://healthcaresecprivacy.blogspot.com/2015/12/break-glass-on-fhir-solution.html I am not sure that is the right solution, and it is not the work of the security committee.
Jose Costa Teixeira (Sep 25 2017 at 13:03):
where do the security labels go, e.g. in my case of getting medications before breaking the glass?
In the bundle for the response? in the patient resource? in the medication resources (which ones, if we are hiding them?)
Or in the case of an anonymized patient, would I have two patient resources, one for realPatient and another for johnDoe?
John Moehrke (Sep 25 2017 at 13:03):
I have asked that SMART-on-FHIR include additional scopes that allow the declaration of break-glass to be sent to the OAuth decision, not as an override parameter on a request to the resource server...
John Moehrke (Sep 25 2017 at 13:04):
but, my conclusion is .. much more work is needed... need people trying this to try a bunch of things and share their experience so that we can help others.
John Moehrke (Sep 25 2017 at 13:05):
Jose, please separate de-identification from break-glass... they are use-cases that do NOT overlap
John Moehrke (Sep 25 2017 at 13:05):
security labels go into the Resource meta
John Moehrke (Sep 25 2017 at 13:05):
see http://build.fhir.org/security-labels.html#rsl
Jose Costa Teixeira (Sep 25 2017 at 13:05):
I thought that seeing something that is anonymized can trigger break-glass scenario... but ok, that is not my focus
John Moehrke (Sep 25 2017 at 13:06):
no.. I can't even understand how that would work
John Moehrke (Sep 25 2017 at 13:07):
You ask for medications on Patient X, and I return data that has no patient... and you are then told that this is a break-glass opportunity?
Jose Costa Teixeira (Sep 25 2017 at 13:07):
yes, they go in the resource meta - but which resource? I ask for a patient's medication. The system determines i cannot see everything. sends a bundle.
John Moehrke (Sep 25 2017 at 13:08):
So. the solution I wrote in this thread is that you get back only results that have "N" as the confidentiality code..
John Moehrke (Sep 25 2017 at 13:08):
You then ask for "R" confidentiality code and get MORE data
John Moehrke (Sep 25 2017 at 13:08):
thus.. you broke glass when you asked for "R"
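A minimal sketch of the N-then-R flow described above, assuming a server that filters results by the confidentiality label carried in each resource's `meta.security`. The function names, the sample resources, and the choice to treat unlabeled resources as "N" are illustrative assumptions, not part of FHIR or of any particular server. The system URL is the one used in current FHIR releases.

```python
# HL7 v3 Confidentiality codes, ordered from least to most restricted.
CONF_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-Confidentiality"
RANK = {code: i for i, code in enumerate(["U", "L", "M", "N", "R", "V"])}


def confidentiality(resource):
    """Return the confidentiality code found in meta.security.

    Unlabeled resources are treated as "N" here; that default is an
    assumption of this sketch, and a real policy may choose otherwise.
    """
    for label in resource.get("meta", {}).get("security", []):
        if label.get("system") == CONF_SYSTEM and label.get("code") in RANK:
            return label["code"]
    return "N"


def filter_by_confidentiality(resources, ceiling):
    """Keep only resources at or below the requested confidentiality ceiling."""
    return [r for r in resources if RANK[confidentiality(r)] <= RANK[ceiling]]


def medication_statement(resource_id, code):
    """Build a tiny, hypothetical MedicationStatement carrying one label."""
    return {
        "resourceType": "MedicationStatement",
        "id": resource_id,
        "meta": {"security": [{"system": CONF_SYSTEM, "code": code}]},
    }


meds = [
    medication_statement("med-1", "N"),
    medication_statement("med-2", "R"),  # e.g. a sensitive medication
    medication_statement("med-3", "N"),
]

normal_view = filter_by_confidentiality(meds, "N")
break_glass_view = filter_by_confidentiality(meds, "R")

print([r["id"] for r in normal_view])       # ['med-1', 'med-3']
print([r["id"] for r in break_glass_view])  # ['med-1', 'med-2', 'med-3']
```

The sketch leaves out the part that matters most in practice: whether the requester is privileged enough to be granted "R" is decided by the access control layer, and the request should be audited.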
Jose Costa Teixeira (Sep 25 2017 at 13:08):
or another scenario: get all patients with some procedure. Say I don't have consent from them all. Machine can say for this request, and this requester, the response is some anonymized results + some "normal " resources
John Moehrke (Sep 25 2017 at 13:09):
How is break-glass applicable to a "Get all patients with some procedure"?
Jose Costa Teixeira (Sep 25 2017 at 13:09):
nevermind break-glass, i am steering away from that as you asked :)
John Moehrke (Sep 25 2017 at 13:09):
where is the danger to a patient, if you are not treating A patient?
John Moehrke (Sep 25 2017 at 13:10):
break-glass is a mechanism to override restrictions because there is a safety risk to life
John Moehrke (Sep 25 2017 at 13:10):
so, what is your use-case????
Jose Costa Teixeira (Sep 25 2017 at 13:11):
i wanted to know in which resource are the labels - in the bundle, or in the patient?
Jose Costa Teixeira (Sep 25 2017 at 13:12):
as a response to "get all patients with a procedure", would I get a) only the "N" Stuff without an indication there is more? or b) The N stuff and some John Does in place of the R ones?
John Moehrke (Sep 25 2017 at 13:12):
both.... Each resource can have security labels, which is an assessment of the content of that resource... A bundle can have security labels, which is an assessment of the WHOLE content of that bundle. A bundle might also contain security labels that are specific to the transaction, such as residual-obligations (Do not print, Do not persist, etc).
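To make the "both" answer concrete, here is a small sketch in which each resource keeps its own label and the Bundle gets a label summarizing its whole content. Taking the most restrictive confidentiality code among the entries (a "high-water mark") is an assumed policy for illustration; the helper names are hypothetical.

```python
CONF_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-Confidentiality"
ORDER = ["U", "L", "M", "N", "R", "V"]  # least to most restricted


def conf_label(code):
    """Build a confidentiality security label (a Coding)."""
    return {"system": CONF_SYSTEM, "code": code}


def high_water_mark(resources):
    """Return the most restrictive confidentiality code among the resources."""
    codes = [
        label["code"]
        for resource in resources
        for label in resource.get("meta", {}).get("security", [])
        if label.get("system") == CONF_SYSTEM and label.get("code") in ORDER
    ]
    return max(codes, key=ORDER.index) if codes else None


def make_searchset(resources):
    """Wrap resources in a searchset Bundle labeled for its whole content."""
    bundle = {
        "resourceType": "Bundle",
        "type": "searchset",
        "entry": [{"resource": r} for r in resources],
    }
    top = high_water_mark(resources)
    if top is not None:
        bundle["meta"] = {"security": [conf_label(top)]}
    return bundle


results = [
    {"resourceType": "Procedure", "id": "proc-1",
     "meta": {"security": [conf_label("N")]}},
    {"resourceType": "Procedure", "id": "proc-2",
     "meta": {"security": [conf_label("R")]}},
]
bundle = make_searchset(results)
print(bundle["meta"]["security"][0]["code"])  # R
```

Transaction-specific labels such as residual obligations would be added to the Bundle's `meta.security` in the same way; they are omitted here.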
Jose Costa Teixeira (Sep 25 2017 at 13:13):
ok, so if I have John Doe's, is this the same resource as the original one? Or is it another resource?
John Moehrke (Sep 25 2017 at 13:13):
so your last question is a question of POLICY... HL7 does not mandate policy, we enable many reasonable policies
John Moehrke (Sep 25 2017 at 13:14):
I presume in your use-case the persona requesting data is a "Researcher"... right?
Jose Costa Teixeira (Sep 25 2017 at 13:14):
i am not asking what the machine should do. this is a runtime decision, what to display.
Jose Costa Teixeira (Sep 25 2017 at 13:14):
so we depend on which persona it is?
John Moehrke (Sep 25 2017 at 13:14):
security and privacy?
John Moehrke (Sep 25 2017 at 13:15):
Yes!
John Moehrke (Sep 25 2017 at 13:15):
the random malicious user on the internet, gets no data at all... right?
Jose Costa Teixeira (Sep 25 2017 at 13:16):
No, not for the standard. The standard should not depend on which persona. There is a machine that takes data (including which person or persona) and the standard must support that information.
John Moehrke (Sep 25 2017 at 13:16):
The standard enables many policies
Jose Costa Teixeira (Sep 25 2017 at 13:16):
I would expect a standard to support a machine to calculate whatever needs to be sent, regardless of how that calculation is done.
John Moehrke (Sep 25 2017 at 13:17):
I don't understand your calculation question
Jose Costa Teixeira (Sep 25 2017 at 13:17):
calculation (sorry) is "let's see what this person can try to access in this context"
Jose Costa Teixeira (Sep 25 2017 at 13:17):
for example, GDPR is not persona-bound, but purpose-bound.
Jose Costa Teixeira (Sep 25 2017 at 13:18):
calculation = a black box that determines whatever is the possible access, from whatever needed variables.
John Moehrke (Sep 25 2017 at 13:18):
so any malicious user can just declare that they have the purpose of surgery, and get all the access they ask for?
Jose Costa Teixeira (Sep 25 2017 at 13:19):
Short answer from GDPR Art 30 compliance - Yes.
John Moehrke (Sep 25 2017 at 13:19):
I think you better look again
Jose Costa Teixeira (Sep 25 2017 at 13:20):
of course you have to ensure that only people authorized to do something have the access.
John Moehrke (Sep 25 2017 at 13:20):
okay... so we are back to authorized people
John Moehrke (Sep 25 2017 at 13:20):
good
Jose Costa Teixeira (Sep 25 2017 at 13:20):
access control is not what I am talking about.
Jose Costa Teixeira (Sep 25 2017 at 13:20):
yes, i so wrote "context" . Can be the person, the persona, the purpose, the time of day.
John Moehrke (Sep 25 2017 at 13:21):
then please explain.. because it is what you are asking about in my perspective
Jose Costa Teixeira (Sep 25 2017 at 13:21):
limiting it to persona is, well, limiting.
John Moehrke (Sep 25 2017 at 13:21):
persona is a general term
John Moehrke (Sep 25 2017 at 13:24):
You were asking about a general query "Any patient with procedure X?" I was asking how this was a "Treatment" use-case, but it looked like a "Research" use-case... so I simply asked that question.
John Moehrke (Sep 25 2017 at 13:25):
I would then ask what policy is restricting access? Consent-Policy? Role-Policy? Purpose-Policy? etc...
John Moehrke (Sep 25 2017 at 13:27):
doing a query for Any patient with procedure X is easy... I am sure you know how to do that... The FHIR specification is clear on 'how' to do this... What seems to be the topic is how access control policy might restrict the results... so I ask about persona
Jose Costa Teixeira (Sep 25 2017 at 13:32):
I get the security labels, thanks. Now let's put the asking person or persona aside. If I want to ask "For my purpose, i need everyone that has a diagnosis of 290.3". Say the database knows that there are 3 people, and one of them (because they are a VIP, or because they did not give consent for studies) cannot be accessed for studies.
Jose Costa Teixeira (Sep 25 2017 at 13:33):
the database knows Patient1, Patient2, Patient3, but returns Patient1, Patient2 and JohnDoe. So, is JohnDoe a resource in the database? Or is it just on the response side?
Jose Costa Teixeira (Sep 25 2017 at 13:36):
and where do the labels reside? would you say "some of this bundle's resources are anonymized or may be missing"? Or would the bundle say "here are all the results" and each of the resources would say it is anonymized or the real thing?
Jose Costa Teixeira (Sep 25 2017 at 13:38):
the case of medication is one where there is risk in not sharing the information - if you want to do ICA check, you should know all the medication.
Grahame Grieve (Sep 25 2017 at 13:38):
back to break-the-glass - I don't believe that making a new session is a solution here. But I'm happy for the security committee to take ownership of how break the glass is actually done
Jose Costa Teixeira (Sep 25 2017 at 13:44):
I think we need a security label of sorts in the request. IMHO, "break the glass" seems boolean (either there or not) and I think that we should be able to support "Purpose for which I am asking for this information"
John Moehrke (Sep 25 2017 at 14:43):
Jose, Your use-case simply does not look like a Treatment use-case, so it is really hard for me to align it with break-glass. Break-glass is exclusively a Treatment use-case.
John Moehrke (Sep 25 2017 at 14:45):
Your solution (Patient 1, Patient 2, and JohnDoe) doesn't make any logical sense to me in any kind of access. You mention a VIP or someone who didn't give consent... in those cases you simply will not be given the Patient3 data at all. I know of no such system that would try to de-identify a patient, as it is very hard to de-identify a single patient; and a de-identified patient is not appropriate for Treatment cases.
John Moehrke (Sep 25 2017 at 14:55):
Provided you had a Policy such as you are describing, it would be a unique policy. It would not be a common policy. Thus it would be unlikely to rise to the 80% and thus unlikely to be given standard vocabulary items. However, tags can be locally defined. Within the domain that has defined the policy you speak of, they could define a Tag that would be used on the Bundle to indicate that some data is de-identified, and another tag that is applied to the data of the JohnDoe patient (all the resources from that JohnDoe patient).
Jose Costa Teixeira (Sep 25 2017 at 14:55):
Not giving consent for that purpose, i mean. I can give consent to many things but not for studies.
John Moehrke (Sep 25 2017 at 14:56):
Why not?
John Moehrke (Sep 25 2017 at 14:56):
Note that the confidentiality value "L" indicates low risk because the data has been de-identified http://build.fhir.org/v3/Confidentiality/cs.html#v3-Confidentiality-L
Jose Costa Teixeira (Sep 25 2017 at 14:57):
no, i mean, "patient did not give consent" ->, " patient did not give consent for a specific purpose"
John Moehrke (Sep 25 2017 at 14:58):
So, that is why I was asking about Persona... Doctor/Nurse persona --> Treatment... Researcher persona --> Research... Insurance-clerk --> Payment...
John Moehrke (Sep 25 2017 at 14:58):
etc...
John Moehrke (Sep 25 2017 at 14:59):
PurposeOfUse is only authorized for a given persona.
Jose Costa Teixeira (Sep 25 2017 at 15:00):
but purpose is on the consent resource.
John Moehrke (Sep 25 2017 at 15:00):
as are many things
Jose Costa Teixeira (Sep 25 2017 at 15:00):
should be in the request, right?
John Moehrke (Sep 25 2017 at 15:01):
The user and functional role are part of the OAuth flow... it is up to OAuth to PERMIT or DENY...
John Moehrke (Sep 25 2017 at 15:01):
today SMART-on-FHIR does not well represent anything other than Treatment or PatientAccess
John Moehrke (Sep 25 2017 at 15:01):
I have submitted many comments on this
John Moehrke (Sep 25 2017 at 15:02):
BUT, SMART is just one security layer solution... there are others that one might choose to use... Hence why I ask about Access Control system and Policy.
Jose Costa Teixeira (Sep 25 2017 at 15:04):
let me keep going down the rabbit hole, steering back to medication list:
Mr X has given consent for a specific purpose. not for studies.
Nurse Y asks to get the medication data for Mr X, "for ICA checking". System says "sure, here you go".
Nurse Y then asks to get the medication data for Mr. X, "for a study". System should say "nope"
Jose Costa Teixeira (Sep 25 2017 at 15:06):
does that make sense?
Jose Costa Teixeira (Sep 25 2017 at 15:07):
Same case, Nurse Y asks to see a patient's medication because Mr. X is in shock. Regardless of any consent, system says "sure, here it is".
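The three requests from Nurse Y can be sketched as a decision function in the access control layer. This is only an illustration under stated assumptions: the `Consent` and `Request` classes, the `ROLE_PURPOSES` table, and the rule that emergency treatment overrides consent are hypothetical policy choices, not something FHIR or SMART defines. The purpose codes TREAT, HRESCH, and ETREAT come from the HL7 v3 ActReason (PurposeOfUse) vocabulary; "ICA checking" is mapped to TREAT here as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    patient: str
    permitted_purposes: frozenset  # purposes the patient agreed to


@dataclass(frozen=True)
class Request:
    requester_role: str
    patient: str
    purpose: str  # PurposeOfUse code declared by the requester


# Hypothetical role policy: which purposes a persona may declare at all.
ROLE_PURPOSES = {
    "nurse": {"TREAT", "ETREAT", "HRESCH"},
    "researcher": {"HRESCH"},
}


def decide(request, consent):
    """Return "PERMIT" or "DENY" for a request under this sketch's policy."""
    # 1. The persona must be allowed to claim the declared purpose.
    if request.purpose not in ROLE_PURPOSES.get(request.requester_role, set()):
        return "DENY"
    # 2. Emergency treatment overrides consent (assumed policy; audit it).
    if request.purpose == "ETREAT":
        return "PERMIT"
    # 3. Otherwise the patient's consent must cover the purpose.
    if request.patient == consent.patient and request.purpose in consent.permitted_purposes:
        return "PERMIT"
    return "DENY"


mr_x = Consent(patient="mr-x", permitted_purposes=frozenset({"TREAT"}))

print(decide(Request("nurse", "mr-x", "TREAT"), mr_x))   # PERMIT
print(decide(Request("nurse", "mr-x", "HRESCH"), mr_x))  # DENY
print(decide(Request("nurse", "mr-x", "ETREAT"), mr_x))  # PERMIT
```

The same requester gets three different answers because the purpose is an input to the decision alongside the persona, which is how both views in this thread fit together.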
John Moehrke (Sep 25 2017 at 15:08):
when you use the word "study" do you mean "clinical research"? Specifically, a non-Treatment use-case?
Jose Costa Teixeira (Sep 25 2017 at 15:08):
Anything that is not given explicit consent by the patient nor has a valid justification. Can be for clinical or non-clinical reasons.
John Moehrke (Sep 25 2017 at 15:10):
If you can't be specific, I can't help you... We have a well defined concept (standardized) for "Treatment", and "Research". I don't know what you mean by "study". Your use of the word study is not clear to me. Do you mean the Treatment use of the word study, as in an "Imaging Study"?
Jose Costa Teixeira (Sep 25 2017 at 15:11):
Billing
John Moehrke (Sep 25 2017 at 15:11):
Okay, billing is also a well defined term "Payment" or "Coverage"
John Moehrke (Sep 25 2017 at 15:12):
see http://build.fhir.org/v3/PurposeOfUse/vs.html
John Moehrke (Sep 25 2017 at 15:13):
which in a Request for access is an intent scope. I intend only to use the results given for "Treatment"...
John Moehrke (Sep 25 2017 at 15:14):
Which would be common to include in a Access Control request scope, and you would be given a scope based on your permissions.
Jose Costa Teixeira (Sep 25 2017 at 15:14):
ok i get those codes, but how is the request for access? is it the GET or something that has to be persisted beforehand?
Jose Costa Teixeira (Sep 25 2017 at 15:14):
in my case, i have 3 requests. the server should have enough information to reply differently on those 3 requests
John Moehrke (Sep 25 2017 at 15:15):
so, what is your security layer?
John Moehrke (Sep 25 2017 at 15:15):
is it SMART?
John Moehrke (Sep 25 2017 at 15:15):
or something else?
John Moehrke (Sep 25 2017 at 15:16):
because what you seek is a question of the Access Control layer, not of FHIR
Jose Costa Teixeira (Sep 25 2017 at 15:17):
Ok that starts providing me an answer to one of my questions. It is on the security layer. I didn't know that you would handle different cases with different access types. I would just throw everything at the FHIR server and expect it to work it out for me.
Jose Costa Teixeira (Sep 25 2017 at 15:19):
i think that makes life a bit more complicated on my end. I don't know how to expose "purpose" when establishing a session.
John Moehrke (Sep 25 2017 at 15:19):
No, we are not building into FHIR core a specific access control model. Even HTML has no access control model. The access control layer is a pluggable layer that can be inserted as necessary for the policy.
John Moehrke (Sep 25 2017 at 15:21):
For example, some want to use OAuth; where others already have SAML/XACML; and others are happy with Mutually-Authenticated-TLS. I am sure there will eventually be a Blockchain model. Likely someone will even use http basic Username/Password. And many more in the future.
Jose Costa Teixeira (Sep 25 2017 at 15:21):
not sure about what access control model is that.. I just wanted to see that the server knows who asked for which data for which purpose.
John Moehrke (Sep 25 2017 at 15:22):
SMART-on-FHIR is a security layer that is trying to define some of this. Starting simple.
Jose Costa Teixeira (Sep 25 2017 at 15:22):
instead of who asked for which data and their purpose is their business.
John Moehrke (Sep 25 2017 at 15:22):
Did you provide these kind of comments to the SMART-on-FHIR ballot?
Jose Costa Teixeira (Sep 25 2017 at 15:24):
so, my takeout: I do not know how to convey "purpose of processing" in a request for data. Perhaps that is for Smart-on-FHIR.
I also do not know how i would see "we know that this patient is using 5 medications, but because this person did not provide the right purpose, we'll let him know about 3 of those".
Jose Costa Teixeira (Sep 25 2017 at 15:25):
oh and i would have gladly provided comments on the SMART-ON-FHIR if only i had known it existed before today :)
Jose Costa Teixeira (Sep 25 2017 at 15:26):
i am just checking some aspects in the API that could be helpful for GDPR
John Moehrke (Sep 25 2017 at 15:29):
It was in the September ballot http://www.hl7.org/fhir/smart-app-launch/index.html
John Moehrke (Sep 25 2017 at 15:29):
And I think what you are asking for is stepping-stones beyond where they are trying to address today.
Jose Costa Teixeira (Sep 25 2017 at 15:29):
translate stepping-stones?
John Moehrke (Sep 25 2017 at 15:30):
small steps on the pathway to a large journey.
Jose Costa Teixeira (Sep 25 2017 at 15:30):
thanks.
John Moehrke (Sep 25 2017 at 15:30):
today they address only Providers accessing for Treatment, and Patients accessing for their own PatientAccess purpose.
John Moehrke (Sep 25 2017 at 15:31):
so today they don't have PurposeOfUse, they don't have Break-Glass, they don't have Research, etc...
Jose Costa Teixeira (Sep 25 2017 at 15:31):
in any case, for me purpose of processing and access control are different beasts.
John Moehrke (Sep 25 2017 at 15:31):
All of that is advanced stuff well beyond first generation of any access control system
John Moehrke (Sep 25 2017 at 15:32):
purpose of processing -- you mean PurposeOfUse? If so, this is a critical part of Access Control.
Jose Costa Teixeira (Sep 25 2017 at 15:32):
oh and isn't "break-the-glass" another purpose? I don't feel comfortable if it is boolean..but I can't pinpoint why.
John Moehrke (Sep 25 2017 at 15:33):
Yes, break-glass is sometimes implemented using PurposeOfUse - Emergency... but not always
Jose Costa Teixeira (Sep 25 2017 at 15:33):
yes, purpose of processing in GDPR can be "Purpose of Use" and "Purpose of exchange"
John Moehrke (Sep 25 2017 at 15:33):
again.. Policy
John Moehrke (Sep 25 2017 at 15:34):
Again.. I have blogged on that topic https://healthcaresecprivacy.blogspot.com/2015/12/break-glass-on-fhir-solution.html
John Moehrke (Sep 25 2017 at 15:36):
and scopes https://healthcaresecprivacy.blogspot.com/2016/01/fhir-oauth-scope.html and https://healthcaresecprivacy.blogspot.com/2017/05/fhir-oauth-scope-proposal-using-fhir.html
Ioana Singureanu (Sep 26 2017 at 20:46):
I think we are disregarding the possibility that a resource (e.g. Observation) may "contain" the "subject" Patient or Person resource. In that case the observation is effectively "identified". We need to specify a caveat for resources with contained references to Patient, Person, etc. @Grahame Grieve could you weigh in? Do we need to clarify the statement "FHIR resources do not require de-identification" to "FHIR resources do not require de-identification unless the Patient or Person resource is 'contained' in that resource and thus accessible to clients".
John Moehrke (Sep 26 2017 at 20:55):
Ioana, It is necessary but not sufficient to remove Direct Identifiers. Many failed de-identification efforts are due to the Indirect Identifiers (aka Quasi Identifiers). These are much harder to deal with. http://healthcaresecprivacy.blogspot.com/2015/02/is-it-really-possible-to-anonymize-data.html
John Moehrke (Sep 26 2017 at 20:57):
Note that Security WG does have a CR asking for something specific to be said on De-Identification. We will provide, but recognize that all that can be done is to show the need for De-Identification, which is a Process. See my blog topic page for many articles on De-Identification https://healthcaresecprivacy.blogspot.com/p/topics.html#DEID
John Moehrke (Sep 26 2017 at 20:58):
See GF#10581 and related GF#10580... unfinished CR, but glad to receive feedback and recommendations we can consider.
Lloyd McKenzie (Sep 27 2017 at 03:43):
"subject" can't ever be Person - Person is only for linking. And you could have an anonymized/pseudonymized Patient instance that was contained (perhaps just specifying gender and birth year, for example).
John Moehrke (Sep 27 2017 at 14:04):
@Ioana Singureanu explained elsewhere that she was expressing a hidden risk that might not be detected when one presumes that they can just replace the Subject reference with a pseudonym. That is the previous Subject may have been pointing to a "Contained" resource, that one must extract out. A good point!
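The contained-resource risk can be shown with a short sketch: swapping out `subject` is not enough if the identified Patient travels inside `contained`. The helper below reduces a contained Patient to gender and birth year, as in Lloyd's example. The function names and the sample Observation are hypothetical, and removing direct identifiers this way is necessary but not sufficient for de-identification.

```python
import copy


def generalize_patient(patient):
    """Reduce a Patient to gender and birth year only (assumed policy)."""
    reduced = {"resourceType": "Patient", "id": patient["id"]}
    if "gender" in patient:
        reduced["gender"] = patient["gender"]
    if "birthDate" in patient:
        # FHIR dates allow year-only precision, so keep just YYYY.
        reduced["birthDate"] = patient["birthDate"][:4]
    return reduced


def scrub_contained(resource):
    """Return a copy whose contained Patient resources are generalized."""
    cleaned = copy.deepcopy(resource)
    contained = [
        generalize_patient(c) if c.get("resourceType") == "Patient" else c
        for c in cleaned.get("contained", [])
    ]
    if contained:
        cleaned["contained"] = contained
    return cleaned


observation = {
    "resourceType": "Observation",
    "id": "obs-1",
    "status": "final",
    "subject": {"reference": "#p1"},  # points at the contained Patient
    "contained": [{
        "resourceType": "Patient",
        "id": "p1",
        "name": [{"family": "Example"}],
        "identifier": [{"system": "urn:example:mrn", "value": "12345"}],
        "gender": "female",
        "birthDate": "1970-03-14",
    }],
}

scrubbed = scrub_contained(observation)
print(scrubbed["contained"][0])
# {'resourceType': 'Patient', 'id': 'p1', 'gender': 'female', 'birthDate': '1970'}
```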
John Moehrke (Sep 27 2017 at 14:05):
I have also just published an article on Bulk De-Identification https://healthcaresecprivacy.blogspot.com/2017/09/fhir-and-bulk-de-identification.html
Grahame Grieve (Sep 27 2017 at 22:12):
@Avinash Shanbhag - John's article talks about the lack of well described use cases - can ONC publish a description of the use cases for bulk data access somewhere?
Avinash Shanbhag (Sep 27 2017 at 23:53):
Hi Guys - Yes, for folks that attended the ONC hosted Interoperability Forum in Aug, Dr. Rucker described the use of bulk access. Briefly, the idea is that institutions that purchase or provide healthcare for their employees etc. would be able to use the information (through appropriate contracts with provider organizations) to manage healthcare more effectively.
Avinash Shanbhag (Sep 27 2017 at 23:54):
Sorry, hit enter too soon. I wanted to end by saying that, yes, we will work to develop a use case to share with FHIR community. It might take some time, so wanted to provide high level thoughts, above.
John Moehrke (Sep 28 2017 at 13:02):
@Avinash Shanbhag That use-case has many challenges with current USA privacy law, both Federal and State. It does not address questions of Privacy-Consent, Sensitive Health topics, PurposeOfUse, Role/User authorization management, etc...
Jose Costa Teixeira (Oct 02 2017 at 12:37):
@Lloyd McKenzie, from your comment, would it make sense to have a real patient (for all allowed purposes) and an anonymized patient (for all purposes where identifiable information cannot be seen), with some eventual link between those?
John Moehrke (Oct 02 2017 at 12:55):
@Jose Costa Teixeira There are pseudonymization mechanisms, often used in clinical research where at the end of a study at-risk patients need to be contacted. The linkage would be done by a trusted-third-party that everyone trusts to not inappropriately expose the linkage. They might use any means they want, as it is an internal responsibility.
Jose Costa Teixeira (Oct 02 2017 at 13:01):
Indeed, that is what I expect (although there seems to be some debate on it by the DPAs). The question is: would we then have two patient resources, one being de-identified?
Jose Costa Teixeira (Oct 02 2017 at 13:02):
and if we do use two patient resources, one identified, and another de-identified, would a link between them be part of FHIR specification? Or would that be an attribute like an external identifier?
Jose Costa Teixeira (Oct 02 2017 at 13:04):
but main question is whether we'd have those 2 patient resources or if there are other, more clever ways to do this
John Moehrke (Oct 02 2017 at 13:12):
To be clear, if you have de-identified data, you have a shadow of the data. ALL identifiers are different, not just the .subject element. There might be a careful API that re-writes all these identifiers for specific use-cases... this is what I defined in my service blog https://healthcaresecprivacy.blogspot.com/2017/09/fhir-and-bulk-de-identification.html
Jose Costa Teixeira (Oct 02 2017 at 13:13):
so there could be a shadow of the patient resource, right?
John Moehrke (Oct 02 2017 at 13:15):
There would be TWO databases!!!
Jose Costa Teixeira (Oct 02 2017 at 13:17):
ok, then we have different patients and different resources. Of course the next question is: Is there a link between them?
Whether/How do you break the glass to get from DB2 to DB1. Or how you handle that updates and merges on DB1 would propagate to DB2.
John Moehrke (Oct 02 2017 at 13:18):
Again, you bring up break-glass...... are you using de-identified to TREAT patients? NOOOOOOO
Jose Costa Teixeira (Oct 02 2017 at 13:19):
i don't really understand the gap between identified and non-identified data. My question is how do you handle the sync between DB1 and DB2
John Moehrke (Oct 02 2017 at 13:19):
trusted third party
John Moehrke (Oct 02 2017 at 13:19):
like the service I defined in my blog article https://healthcaresecprivacy.blogspot.com/2017/09/fhir-and-bulk-de-identification.html
Jose Costa Teixeira (Oct 02 2017 at 13:20):
and the resource has what to allow that? an ID?
John Moehrke (Oct 02 2017 at 13:20):
a pseudo -- ID
John Moehrke (Oct 02 2017 at 13:20):
stands in for any real-ID
John Moehrke (Oct 02 2017 at 13:20):
where the trusted third party is the ONLY one that knows the cross-reference
Jose Costa Teixeira (Oct 02 2017 at 13:21):
so, de-identified patient 1 resource has an ID and someone knows that this ID actually matches the ID from "real" patient 1
John Moehrke (Oct 02 2017 at 13:21):
yes.. the trusted-third-party...
Jose Costa Teixeira (Oct 02 2017 at 13:22):
i am assuming these IDs are external IDs and not resource IDs, but doesn't make a difference.
John Moehrke (Oct 02 2017 at 13:24):
well, resource IDs are identifiers and must be replaced with some pseudo identifier so-that they can't be used to identify the patient.
John Moehrke (Oct 02 2017 at 13:28):
Trying to design a complex system in zulip chat is not productive. The science of de-identification is deep, there are many standards and studies in the space. Far more failed projects that didn't do it right... But also methods that can help automate it. These methods will evolve. I laid out a systems design that looked at how to use Privacy-By-Design to create a privacy enforcing data lake of FHIR. https://healthcaresecprivacy.blogspot.com/2016/07/privacy-by-design-data-analytics.html
Jose Costa Teixeira (Oct 02 2017 at 13:30):
ok, i am aiming at a de-identified data lake.
Jose Costa Teixeira (Oct 02 2017 at 13:30):
(in terms of your article)
John Moehrke (Oct 02 2017 at 13:35):
They really don't yet exist... It was one thing I was working on at GE before they laid me off... I think there are some new efforts... But I will emphasize that this is a 'systems design' question, not a core FHIR model question...
Jose Costa Teixeira (Oct 02 2017 at 13:35):
my question was whether we'd have 2 patient resources, one in the operational data, another in the de-identified data lake.
Jose Costa Teixeira (Oct 02 2017 at 13:36):
and if so, where would the link be. Not by whom (don't care about the third party) but which identifier could be used.
John Moehrke (Oct 02 2017 at 13:36):
Again... systems-design question.... Logically you have TWO FULL databases... Neither knows of the other.
Jose Costa Teixeira (Oct 02 2017 at 13:37):
yep, but i don't care about the db design. I wondered if there would be an attribute in the resources for that.
Patrick Werner (Oct 02 2017 at 13:37):
Ola Jose,
you would have 2 different patient resources, the identifier of the de-identified resource would be the pseudonym.
Patrick Werner (Oct 02 2017 at 13:37):
the link will be your TTP, they have a list Pseudonym <-> real Identifier
John Moehrke (Oct 02 2017 at 13:38):
If I am in project ( A ), I better not be able to look at the identifiers (any form of identifier) or extensions in the data I am given and find anything that I can use to link to project ( B )
Jose Costa Teixeira (Oct 02 2017 at 13:38):
yes, patrick, that is what I am seeing. It is an external identifier, right?
Patrick Werner (Oct 02 2017 at 13:38):
its just the identifier of the resource
Jose Costa Teixeira (Oct 02 2017 at 13:39):
i would rather use patient.identifier, and not patient._id
John Moehrke (Oct 02 2017 at 13:39):
If I am in project ( A ), and talk to my friend who is running project ( B ).... we better not be able to identify the same patients
John Moehrke (Oct 02 2017 at 13:40):
Thus ALLLLLLL identifiers of any kind must be changed... Not just the Patient
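One way to picture "all identifiers of any kind must be changed, and differently per project" is a keyed hash with a separate secret per project. This is only a minimal sketch under stated assumptions: `pseudo_id`, `rewrite_observation`, and the project keys are hypothetical names, not anything defined by FHIR, and only the logical id and `subject` are handled here. Every other reference, identifier, and extension would need the same treatment.

```python
import hashlib
import hmac


def pseudo_id(project_key: bytes, original: str) -> str:
    """Derive a project-specific replacement for an id or reference target."""
    digest = hmac.new(project_key, original.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


def rewrite_observation(obs: dict, project_key: bytes) -> dict:
    """Return a copy with the logical id and the subject reference replaced."""
    out = dict(obs)
    out["id"] = pseudo_id(project_key, "Observation/" + obs["id"])
    rtype, _, rid = obs["subject"]["reference"].partition("/")
    new_target = pseudo_id(project_key, rtype + "/" + rid)
    out["subject"] = {"reference": rtype + "/" + new_target}
    return out


# Hypothetical input and keys, for illustration only.
obs = {
    "resourceType": "Observation",
    "id": "obs1",
    "subject": {"reference": "Patient/123"},
}
for_project_a = rewrite_observation(obs, b"key-project-a")
for_project_b = rewrite_observation(obs, b"key-project-b")
```

With different keys, the copies handed to project ( A ) and project ( B ) share no id or reference value, while references inside one project stay consistent. Anyone holding a project key can still recompute the mapping, so the key has to stay with whoever performs the de-identification.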
Jose Costa Teixeira (Oct 02 2017 at 13:41):
I get it John, we can have different resources for all data.
Jose Costa Teixeira (Oct 02 2017 at 13:41):
since I wanted strong pseudonymization, i needed to ask about the link back to the patient in case that would be needed.
John Moehrke (Oct 02 2017 at 13:45):
I think you are asking 'how does one re-identify the patient given a pseudonym?' This is an interaction with the trusted-third-party, where you convince the trusted-third-party that you have justifiable need that is allowed under the terms (policy). Provided it is authorized, then you would be given the real Patient resource that is associated with that pseudonym. You might be given bulk access to all the data for that patient, under specific conditions (policy).
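The trusted-third-party interaction described here, together with the "Pseudonym <-> real Identifier" list mentioned earlier in the thread, could be sketched roughly as follows. This is an illustration only: the `TrustedThirdParty` class, the `urn:example:pseudonym` system, and the boolean `authorized` flag are assumptions, and a real TTP would evaluate the requester and purpose against policy rather than take a flag.

```python
import secrets


class TrustedThirdParty:
    """Hypothetical TTP holding the only pseudonym <-> real identifier list."""

    def __init__(self):
        self._real_by_pseudonym = {}
        self._pseudonym_by_real = {}

    def pseudonymize(self, real_identifier: str) -> str:
        # Reuse an existing pseudonym so the same patient stays linkable
        # inside the de-identified data set.
        if real_identifier not in self._pseudonym_by_real:
            pseudonym = secrets.token_hex(16)
            self._pseudonym_by_real[real_identifier] = pseudonym
            self._real_by_pseudonym[pseudonym] = real_identifier
        return self._pseudonym_by_real[real_identifier]

    def reidentify(self, pseudonym: str, authorized: bool) -> str:
        # The policy decision is reduced to a boolean for this sketch.
        if not authorized:
            raise PermissionError("re-identification not authorized")
        return self._real_by_pseudonym[pseudonym]


ttp = TrustedThirdParty()
pseudonym = ttp.pseudonymize("MRN-12345")  # hypothetical real identifier

# Patient as it might appear in the de-identified store: the pseudonym is
# carried in Patient.identifier, and nothing else links back.
deidentified_patient = {
    "resourceType": "Patient",
    "identifier": [{"system": "urn:example:pseudonym", "value": pseudonym}],
}
```

The random pseudonym carries no information about the real identifier, so the only way back is through the TTP's list, which neither database holds.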
Jose Costa Teixeira (Oct 02 2017 at 13:46):
i think using third-party is actually part of the design.
John Moehrke (Oct 02 2017 at 13:48):
Read any academic paper, and you will find that is the general solution.... There are more elegant designs of this trusted-third-party, that make it look less clunky, but it ultimately exists.
John Moehrke (Oct 02 2017 at 13:50):
I can't express all the designs on a zulip chat discussion...
John Moehrke (Oct 02 2017 at 13:53):
Good standards exist, there are companies that will execute these procedures, there are platforms that contain various tools... I have hope that we will define something for FHIR eventually. Like DICOM has achieved. But it takes a level of maturity of the standard, and maturity of the use-cases... https://healthcaresecprivacy.blogspot.com/p/topics.html#DEID
Patrick Werner (Oct 02 2017 at 14:16):
@Jose Costa Teixeira reading some academic papers about pseudonymization as proposed by John is a good idea. You can get so many details wrong. And there were a lot of famous cases where pseudonymization was broken due to its bad design.
It is not only about the identifiers of all resources, it's also about identifying data fields which are part of the resource. This is not only the name or address but could be potentially anything.
Patrick Werner (Oct 02 2017 at 14:16):
https://en.wikipedia.org/wiki/K-anonymity
is really important in that context
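As a rough illustration of what k-anonymity measures: a data set is k-anonymous over a chosen set of quasi-identifiers if every combination of their values is shared by at least k records. The sketch below assumes flat dictionaries and hypothetical field names (`birth_year`, `postal_prefix`, `sex`); choosing which fields count as quasi-identifiers is the hard part and is not addressed here.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k


# Hypothetical extract, for illustration only.
records = [
    {"birth_year": 1970, "postal_prefix": "10", "sex": "F"},
    {"birth_year": 1970, "postal_prefix": "10", "sex": "F"},
    {"birth_year": 1985, "postal_prefix": "20", "sex": "M"},
]
fields = ["birth_year", "postal_prefix", "sex"]
```

Here the third record is alone in its group, so `is_k_anonymous(records, fields, 2)` is false until that record is generalized or suppressed.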
Jose Costa Teixeira (Oct 02 2017 at 14:18):
thanks. Actually this discussion has gone far away from my initial point - which, granted, was an open question.
Jose Costa Teixeira (Oct 02 2017 at 14:20):
point was: whether we have one patient in the core standard and everything else is out of scope, or if we can consider de-identified resources, and whether the standard handles that.
Jose Costa Teixeira (Oct 02 2017 at 14:21):
i don't think the answer to that should depend on the architecture of the datalake, or the eventual link being by a third party or not.
Jose Costa Teixeira (Oct 02 2017 at 14:21):
and whether that is used for break the glass or not, that is also another question. At least in my mind.
Jose Costa Teixeira (Oct 02 2017 at 14:24):
For GDPR at least, we have to deal with several attributes so that the target group is never less than 5 people
Jose Costa Teixeira (Oct 02 2017 at 14:25):
so yes, we also have to deal with that.
Jose Costa Teixeira (Oct 02 2017 at 14:25):
i was not after a de-identification design
John Moehrke (Oct 02 2017 at 14:27):
Well that gets to my initial question... What problem are you trying to solve...
Jose Costa Teixeira (Oct 02 2017 at 14:27):
a big problem, of which i was wondering if there was anything in the standard
Jose Costa Teixeira (Oct 02 2017 at 14:28):
meaning, a problem that includes system design, and i did not expose, and a problem that touches many areas, and i could not expose as well.
John Moehrke (Oct 02 2017 at 14:29):
When you lead with "will solution X solve my problem" and you never expose what the "problem" is... all you are going to create is discussion churn...
Jose Costa Teixeira (Oct 02 2017 at 14:29):
i was focusing on whether our resources could support anonymization or not, in any way
John Moehrke (Oct 02 2017 at 14:37):
Yes, the FHIR design methodology does support de-identification. It is not as mature as DICOM, but the essential parts are there. That which is missing is something that will take maturity of the standard to happen first.
John Moehrke (Oct 02 2017 at 14:39):
note, your thread has inspired the security wg to add narrative.. which I will be applying to the current build soon. See GF#10581
Elliot Silver (Oct 02 2017 at 16:29):
@John Moehrke One issue that this discussion brings up, and which I don't see in the tracker item, is the recognition that the resource ids (not identifiers) are identifiable information. Obscuring/subsetting/redacting data in the resources is useless if the same resource ids are maintained.
John Moehrke (Oct 02 2017 at 16:38):
That is a factual statement. Where do you think it should be said?
Elliot Silver (Oct 02 2017 at 16:50):
Yes, it is a statement of fact, but one that could be overlooked. From the tracker:
Discussion: With the Observation Resource, one would remove the subject element as it is a Direct Identifier. However there are many other Reference elements that can easily be used to navigate back to the Subject; e.g., Observation.context value of Encounter or EpisodeOfCare; or Observation.performer. Further, if the client has the Observation id, by presenting different credentials they might be able get a different view on the same resource.
Elliot Silver (Oct 02 2017 at 16:51):
Hmm, don't like it, and it doesn't quite cover the issue.
John Moehrke (Oct 02 2017 at 17:06):
that CR is just adding a hint at the S&P Module page. It is careful to not indicate that it is comprehensive... thus I am not sure that this is the time or place to get that detailed.
Grahame Grieve (Oct 03 2017 at 02:33):
whether we have one patient in the core standard and everything else is out of scope, or if we can consider de-identified resources, and whether the standard handles that
Grahame Grieve (Oct 03 2017 at 02:33):
I think I lost track of this because I don't understand this bit.
Jose Costa Teixeira (Oct 03 2017 at 19:10):
@Grahame Grieve it was about whether there was any guidance/preference/explicit support for any de-identification approach: say we need to de-identify Patient 1's data, does this mean that a) the de-identification would be completely out of the standard, or if b) implementers could have a new patient called Anonymous1 and work with that, and use identifiers to make the link to the original patient (in whatever case that would be needed), and if any of these things would trigger any possible changes
Grahame Grieve (Oct 03 2017 at 20:10):
ok. well, there are multiple ways to implement it, but I don't think that any of them need support from the spec. The bulk data interface is one place where it would be pretty useful to add deidentify=true as a parameter, on the basis that the server could de-identify but maintain longitudinal coherence from query to query, but as John has pointed out, doing de-identification is really hard
John Moehrke (Oct 03 2017 at 20:48):
Jose, a FHIR Server, as a server of data in FHIR format, could be holding some patient data (Patient, Observation, Medicat...) that are fully clinical quality and identifiable; and also other data that are de-identified; and other data that is purely fabricated as test data. Is this what you are asking? Are you asking if this is possible? If so, then yes it is. This is the topic of a Security WG GF#10580 that we would welcome your input to. This is a request to explain 'at least one' way this can be done.
Jose Costa Teixeira (Oct 03 2017 at 20:51):
No, i just replied to explain my previous question. My initial purpose was to see whether among all these options (which are clear) there would be something in the standard to support the actual implementation.
Jose Costa Teixeira (Oct 03 2017 at 20:53):
I do have to consider test data in light of a GDPR implementation and will contribute. But for now I am looking at de-identification for large bits of data.
Jose Costa Teixeira (Oct 03 2017 at 20:56):
I don't know if the topic was in this chat stream, but IMO, data that is de-identified to a point that you cannot re-identify it alone, is effectively anonymized, right? In other words, if you de-identify a patient and give the identification key to someone else, then for you, that patient is anonymized, right?
John Moehrke (Dec 12 2017 at 22:40):
The security wg has added a bit of a section on De-Identification / Pseudonymization / anonymization. http://build.fhir.org/secpriv-module.html#deId
Grahame Grieve (Dec 12 2017 at 23:06):
that section should mention text narrative and documents
John Moehrke (Dec 12 2017 at 23:14):
good catch. I don't know how I missed that. It is however in ALL of the further standards.. :-) Can you enter a CR ?
Grahame Grieve (Dec 12 2017 at 23:19):
Alexander Henket (May 13 2018 at 07:52):
Very small followup in this older thread. Suppose I want to send an anonymized Procedure, which has a Procedure.subject 1..1. What do I do? Would this do given that "At least one of reference, identifier and display SHALL be present (unless an extension is provided)."
<subject>
  <extension url="http://hl7.org/fhir/StructureDefinition/iso21090-nullFlavor">
    <valueCode value="MSK"/>
  </extension>
</subject>
Or would it be better to include <display/> just to make sure you don't trip systems?
<subject>
  <extension url="http://hl7.org/fhir/StructureDefinition/iso21090-nullFlavor">
    <valueCode value="MSK"/>
  </extension>
  <display value="masked"/>
</subject>
Lloyd McKenzie (May 13 2018 at 08:20):
The second is better because it provides both a computable representation for those who understand it and something useful for systems that don't recognize the extension.
John Moehrke (May 13 2018 at 16:34):
This is fine if you are masking... If however you are using pseudonymization, then you could put the pseudo identifier in. And since this is a Resource you can put it in as just the identifier. The MSK should also appear in the resource meta to indicate it is not whole.
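Putting the masking advice from this exchange together in JSON form might look like the sketch below: the subject carries the nullFlavor extension plus a display, and the resource meta carries a security label saying the content is not whole. The extension URL and `MSK` come from the thread. The `MASKED` label and its code system URI are assumptions about which security label would be used, and `mask_subject` is a hypothetical helper.

```python
NULL_FLAVOR_URL = "http://hl7.org/fhir/StructureDefinition/iso21090-nullFlavor"
# Assumed code system URI for the security label; check the FHIR version in use.
SECURITY_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"


def mask_subject(resource: dict) -> dict:
    """Return a copy whose subject is masked and whose meta is labelled."""
    masked = dict(resource)
    masked["subject"] = {
        "extension": [{"url": NULL_FLAVOR_URL, "valueCode": "MSK"}],
        "display": "masked",
    }
    meta = dict(resource.get("meta", {}))
    security = list(meta.get("security", []))
    security.append({"system": SECURITY_SYSTEM, "code": "MASKED"})
    meta["security"] = security
    masked["meta"] = meta
    return masked


# Hypothetical input, for illustration only.
procedure = {
    "resourceType": "Procedure",
    "id": "p1",
    "subject": {"reference": "Patient/123"},
}
masked_procedure = mask_subject(procedure)
```

This only touches `subject` and `meta`. As noted earlier in the thread, the resource id and any other references can still lead back to the patient, so masking one element is not de-identification on its own.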
Vikas Mittal (Jul 23 2019 at 06:11):
Created in wrong spot.
Raised new topic here - https://chat.fhir.org/#narrow/stream/179252-IG-creation/topic/Constraints.20on.20Reference/near/171493449
Last updated: Apr 12 2022 at 19:14 UTC