FHIR Chat · PII: Personally Identifying Information · implementers

Stream: implementers

Topic: PII: Personally Identifying Information


Grahame Grieve (Jan 05 2017 at 22:42):

Just reading a post-mortem of an information leak, and found this policy:

  • Regular code audits to identify any places where PII is stored and / or transmitted, and regular review of the necessity of each instance found. If there's a chunk of code that shows you your email address on a route that's no longer used after other changes, get rid of it.
  • Identification of PII in the code base and database, so developers immediately know if the code they're working with stores or transmits PII and precisely the kind of information that needs to be considered.
  • Ensuring that the definition of personally-identifiable information is disambiguated entirely, so that there's no question or subjective interpretation of what should be treated differently at all.

Grahame Grieve (Jan 05 2017 at 22:47):

it made me wonder about FHIR. Take the following statement:

  • PII found in healthcare records is some of the most reliable available and has a good black market value
  • in the FHIR specification, all information that can directly identify individuals is found in the following resources: Patient, RelatedPerson, Practitioner, Person.
  • the personally identifying information can be found both in the resources, and additionally in HTTP logs of FHIR API access
  • Note that for most practitioners, some identifying information is publicly available through national practitioner registries
  • Though the other resources do not contain information that directly identifies an individual, the identity can be determined by matching the information with other databases (including some publicly available ones), so they generally need to be treated with the same care
  • De-identification of healthcare data is difficult and prone to causing data leaks or data quality issues
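On the HTTP log point in the statement above, a minimal sketch of redacting identifying search parameters before a request URL is written to a log. The parameter list and the `redact_for_log` helper are assumptions for illustration, not anything defined by the FHIR specification, and logical ids in the URL path are not handled here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: search parameters treated as identifying for this sketch.
IDENTIFYING_PARAMS = {"name", "family", "given", "birthdate", "identifier",
                      "address", "telecom", "phone", "email"}

def redact_for_log(url: str) -> str:
    """Return the URL with values of identifying search parameters masked."""
    parts = urlsplit(url)
    query = [
        # Strip any search modifier (e.g. "name:exact") before the lookup.
        (key, "REDACTED" if key.split(":")[0] in IDENTIFYING_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))

print(redact_for_log(
    "https://fhir.example.org/Patient?family=Smith&birthdate=1970-01-01&_count=10"
))
```

Search by POST moves the parameters out of the URL, but the request body can still end up in logs, so the same masking would apply there.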

Grahame Grieve (Jan 05 2017 at 22:47):

do people agree with that statement? would it be useful in the spec somewhere?

Richard Townley-O'Neill (Jan 05 2017 at 23:51):

Yes

Lloyd McKenzie (Jan 05 2017 at 23:56):

For the second bullet, is it worth calling out " - including when these are 'contained' within other resources"? As well, this information may show up in narratives, extensions and/or Reference.display elsewhere.
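A rough sketch of how one might scan a FHIR JSON resource for the places mentioned here: contained identifying resources, narrative, extensions and Reference.display. The `find_pii_spots` function is hypothetical and only reports where to look; it does not decide whether a value actually identifies anyone.

```python
IDENTIFYING_TYPES = {"Patient", "RelatedPerson", "Practitioner", "Person"}

def find_pii_spots(node, path=""):
    """Yield paths in a FHIR JSON resource where identifying data may hide."""
    if isinstance(node, dict):
        # A nested node with an identifying resourceType is a contained resource.
        if path and node.get("resourceType") in IDENTIFYING_TYPES:
            yield f"{path} (contained {node['resourceType']})"
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "text" and isinstance(value, dict) and "div" in value:
                yield f"{child}.div (narrative)"
            elif key in ("extension", "modifierExtension"):
                yield f"{child} (extension)"
            elif key == "display" and "reference" in node:
                yield f"{child} (Reference.display)"
            else:
                yield from find_pii_spots(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_pii_spots(item, f"{path}[{index}]")

observation = {
    "resourceType": "Observation",
    "text": {"status": "generated", "div": "<div>Weight of Jane Doe</div>"},
    "contained": [{"resourceType": "Patient", "id": "p1",
                   "name": [{"family": "Doe"}]}],
    "subject": {"reference": "#p1", "display": "Jane Doe"},
    "code": {"text": "Body weight"},
}
for spot in find_pii_spots(observation):
    print(spot)
```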

Grahame Grieve (Jan 05 2017 at 23:58):

yes worth calling out both of those things

Lloyd McKenzie (Jan 05 2017 at 23:58):

Would there be utility in an explicit ability to flag data elements as containing PHI? It should be possible to do this in some cases at the international level and tighten it further in IGs. (Not sure if we should treat Patient information as PHI if it's about a cow or a chicken?)

Lloyd McKenzie (Jan 06 2017 at 00:00):

In fact, we could categorize elements as both PII and PHI, with values of "always", "never" and "potential" - with potential items being constrainable to "always" or "never" in IGs based on the element use.
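As a sketch of the "always" / "never" / "potential" idea, assuming an IG may only tighten a "potential" flag and must leave "always" and "never" alone. The `Sensitivity` enum and `constrain` function are hypothetical names, not part of any FHIR tooling.

```python
from enum import Enum

class Sensitivity(Enum):
    ALWAYS = "always"
    NEVER = "never"
    POTENTIAL = "potential"

def constrain(base: Sensitivity, override: Sensitivity) -> Sensitivity:
    """Apply an IG-level override; only a 'potential' base may be changed."""
    if base is Sensitivity.POTENTIAL or override is base:
        return override
    raise ValueError(f"cannot change '{base.value}' to '{override.value}'")

# A veterinary IG might decide that a 'potential' PHI flag is 'never'.
print(constrain(Sensitivity.POTENTIAL, Sensitivity.NEVER))
```

Trying `constrain(Sensitivity.ALWAYS, Sensitivity.NEVER)` raises a `ValueError`, which is how this sketch models the rule that fixed values cannot be loosened downstream.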

Grahame Grieve (Jan 06 2017 at 00:02):

I think we do it by resource - one of the named resources, a resource in one of their compartments, and other resources
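A sketch of that three-way split by resource. Real compartment membership is defined by the specification's compartment definitions; as a stated simplification, this sketch treats any resource holding a literal reference to one of the four identifying types as being "in a compartment". The function names are hypothetical.

```python
IDENTIFYING_TYPES = {"Patient", "RelatedPerson", "Practitioner", "Person"}

def iter_references(node):
    """Yield every literal Reference.reference string in a FHIR JSON tree."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)

def classify_resource(resource: dict) -> str:
    """Classify a resource as 'identifying', 'compartment' or 'other'."""
    if resource.get("resourceType") in IDENTIFYING_TYPES:
        return "identifying"
    # Simplification: a reference such as "Patient/123" stands in for
    # compartment membership.
    for ref in iter_references(resource):
        if IDENTIFYING_TYPES.intersection(ref.split("/")):
            return "compartment"
    return "other"

print(classify_resource({"resourceType": "Observation",
                         "subject": {"reference": "Patient/123"}}))
```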

John Moehrke (Jan 06 2017 at 12:53):

Grahame, We do express these things on the Security page. I welcome improvements.

John Moehrke (Jan 06 2017 at 12:55):

PII (there are various acronyms) -- They are all defined as Identifiers bound with Information. If you break the bind between information and identifiers then you just have data and identifiers. This is fundamentally what De-Identification does: it breaks the binding. Of course, De-Identification is a process to reduce the risk to a project-defined level, for a project-defined purpose, and a project-defined environment. So the only automatic algorithm for de-identification is | /dev/null

John Moehrke (Jan 06 2017 at 13:02):

Adding to that... note that many elements found in the 'other' resources carry enough information to uniquely identify patients. These are the "Indirect" or "Quasi" identifiers that De-Identification processes will warn you about. This is just a fact of medical data: it is descriptive of the subject. Thus de-identification is a process to reduce risk, one that never gets to zero (exception: /dev/null)

John Moehrke (Jan 06 2017 at 13:02):

This is also why I have tried, however feebly, to get Patient, Practitioner, Person, and RelatedPerson to be thinner, that is, more focused on identification and less on 'information'. It is true that one can remove elements from within a Resource, but it is much easier to kill the Reference to break the bind between identifier and data. --- On this topic, I think these Resources are good. ---- Putting a "Security Considerations" section in the four Resources might be helpful to our readers.
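A minimal sketch of "killing the Reference": return a copy of a resource with every literal reference to one of the four identifying resource types removed. The helper names are hypothetical, and the remaining quasi-identifiers are untouched, so this lowers risk rather than removing it.

```python
IDENTIFYING_TYPES = {"Patient", "RelatedPerson", "Practitioner", "Person"}

def is_identifying_ref(value) -> bool:
    """True for a Reference whose literal reference names an identifying type."""
    return (isinstance(value, dict)
            and isinstance(value.get("reference"), str)
            and bool(IDENTIFYING_TYPES.intersection(value["reference"].split("/"))))

def strip_identifying_references(node):
    """Return a copy of a FHIR JSON tree without identifying References."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if is_identifying_ref(value):
                continue
            cleaned = strip_identifying_references(value)
            if cleaned == [] and value != []:
                continue  # FHIR JSON does not allow empty arrays
            out[key] = cleaned
        return out
    if isinstance(node, list):
        return [strip_identifying_references(item)
                for item in node if not is_identifying_ref(item)]
    return node

observation = {
    "resourceType": "Observation",
    "subject": {"reference": "Patient/123", "display": "Jane Doe"},
    "performer": [{"reference": "Practitioner/9"}],
    "valueQuantity": {"value": 72, "unit": "kg"},
}
print(strip_identifying_references(observation))
```

Dropping the whole Reference also drops its `display`, which matters because the display text often repeats the person's name.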

John Moehrke (Jan 06 2017 at 13:03):

article on the topic https://healthcaresecprivacy.blogspot.com/2015/02/is-it-really-possible-to-anonymize-data.html

John Moehrke (Jan 06 2017 at 13:07):

We might find it useful, and I expected the RDF efforts to help, to tag each element within each Resource in a way that helps with automated De-Identification. The first step of De-Identification is to create a total inventory of all elements, then classify those elements as Direct-Identifier, Quasi-Identifier, or Data. This is the first step because one then knows which Direct-Identifiers must be removed (or changed) and which Quasi-Identifiers present a risk that must be addressed.
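The inventory-then-classify step could look roughly like the sketch below. The classification table is a made-up example, not a published tagging of FHIR elements, and treating unlisted elements as Quasi-Identifiers is a conservative assumption of this sketch.

```python
from enum import Enum

class Kind(Enum):
    DIRECT = "Direct-Identifier"
    QUASI = "Quasi-Identifier"
    DATA = "Data"

# Hypothetical classification table; a real project would define its own.
CLASSIFICATION = {
    "Patient.identifier.value": Kind.DIRECT,
    "Patient.name.family": Kind.DIRECT,
    "Patient.name.given": Kind.DIRECT,
    "Patient.birthDate": Kind.QUASI,
    "Patient.address.postalCode": Kind.QUASI,
    "Patient.active": Kind.DATA,
}

def leaf_paths(node, path):
    """Yield the element path of every leaf value, ignoring list positions."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key != "resourceType":
                yield from leaf_paths(value, f"{path}.{key}")
    elif isinstance(node, list):
        for item in node:
            yield from leaf_paths(item, path)
    else:
        yield path

def classify_elements(resource: dict) -> dict:
    """Inventory all elements present and classify each one."""
    paths = sorted(set(leaf_paths(resource, resource["resourceType"])))
    # Assumption: anything not in the table is treated as a Quasi-Identifier.
    return {path: CLASSIFICATION.get(path, Kind.QUASI) for path in paths}

patient = {
    "resourceType": "Patient",
    "active": True,
    "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
    "birthDate": "1970-01-01",
    "gender": "female",
}
for path, kind in classify_elements(patient).items():
    print(f"{path}: {kind.value}")
```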

John Moehrke (Jan 06 2017 at 13:10):

BUT we must all recognize that medical data is about the subject, so medical data is almost always a Quasi-Identifier. If it was not, then it would not be useful medical data... Somewhat self-defining for healthcare.... SO, Quasi-Identifiers are on a spectrum of risk. For example, body weight is mostly just data, unless it is at the extreme of the spectrum, where it quickly approaches being an identifier.
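One way to make the body weight example concrete is to count how many records share each value range and flag ranges that only a few records fall into. This borrows the general idea behind k-anonymity; the bucket size, the threshold `k` and the sample weights are arbitrary assumptions for illustration.

```python
from collections import Counter

def risky_buckets(values, bucket=10, k=5):
    """Return the lower bounds of value ranges shared by fewer than k records."""
    counts = Counter((value // bucket) * bucket for value in values)
    return sorted(low for low, count in counts.items() if count < k)

# Made-up body weights in kg; most cluster together, one is extreme.
weights = [62, 65, 68, 66, 69, 71, 74, 77, 79, 70, 72, 75,
           80, 83, 85, 88, 81, 210]
print(risky_buckets(weights))
```

Here only the range starting at 210 kg is flagged, since a single record sits in it while the common ranges each hold at least five.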

John Moehrke (Jan 06 2017 at 13:13):

@Lloyd McKenzie We have tags for data that has been de-identified... this is because of the above fact that medical data is by definition always about the subject. Hence why _confidentiality "N" is actually, on an IT security scale, way up in 'highly sensitive' space... We start in the highly sensitive security space, and go higher or lower based on the data, the patient, the situation, etc.

John Moehrke (Jan 06 2017 at 14:32):

Just realized that my words might seem to be discouraging of the original proposal to update the FHIR specification. I didn't mean to discourage improvement. I just get overly worried about specification text that is trying simply to inform and warn, becoming perceived and used as if it was normative. Security and Privacy are hard... (as is safety, treatment, etc)...

Grahame Grieve (Jan 06 2017 at 20:44):

well, when I look at the security page, I don't see it directly addressing this, and my text proposed above would go to the security page. And I agree that it would be useful to reference the note from the 4 identifying resources

Grahame Grieve (Jan 06 2017 at 20:44):

I don't think that any of your comments invalidated the original idea, though they did suggest word-smithing improvements to it

Grahame Grieve (Jan 06 2017 at 20:46):

I'm skeptical that there's value in classifying the elements in regard to "Direct-Identifier, Quasi-Identifier, and Data". I think it's more useful to classify the resources. But if you wanted to go and do that, we certainly can do it, and RDF is the way that this stuff would manifest best (though it would find its way into the structure definitions too)

Mike Robert (Jan 06 2017 at 20:56):

Practitioner does have information that identifies the provider, but since it is not linked to any health information about them as a patient, is it really PII?

Jenni Syed (Jan 06 2017 at 21:08):

PII isn't necessarily healthcare specific. It's any identifying info.

Jenni Syed (Jan 06 2017 at 21:08):

IE: if I had your birthdate and mother's maiden name, I could probably do some wicked social engineering hacks and steal things

Jenni Syed (Jan 06 2017 at 21:09):

The healthcare data (PHI) could be used for other nefarious plans, but that may or may not be the target

Mike Robert (Jan 06 2017 at 21:28):

Good points, thanks.


Last updated: Apr 12 2022 at 19:14 UTC