Stream: implementers
Topic: How do we enforce "need to know" while supporting contextless
Lloyd McKenzie (Oct 15 2018 at 19:04):
One of the attractive things about FHIR REST and layers like SMART and CDS Hooks is that EHRs can just "expose their data" and systems can query what they need. As the needs change, there isn't necessarily a requirement to revamp the EHR interface.
However, in "secondary use" situations for research, public health and claims, there's often policy and/or regulatory reasons to restrict the data shared to the "minimum necessary" to meet a particular objective. The requirement to expose only the minimum necessary is imposed on the data custodians (e.g. EHRs). Traditionally, this was managed through custom interfaces where the specific data elements allowed for a particular secondary use recipient in a particular context were hard-coded into that purpose-specific interface. However, as we move into general purpose interfaces (which provide lots of cost, efficiency and maintainability benefits), the question is how to continue to enforce the limitations so that only "needed" data is provided.
RESTful search and CDS Hooks don't really provide context around why information is being shared or how it's going to be used. It's not clear that OAuth allows establishing that context either. How do EHRs currently expect to meet their regulatory requirements that limit what data can be exposed in the FHIR space?
Kevin Shekleton (Oct 18 2018 at 16:09):
This topic has been raised before, most recently by @Kensaku Kawamoto as a ballot comment to CDS Hooks. As this was broader than CDS Hooks, we referred to this to the security WG.
I've had several conversations over the past couple of years regarding "minimum necessary" and data shared via FHIR (note -- this isn't just a SMART/CDS Hooks concern). On one end of the spectrum, I've heard arguments that the data shared should be literally the "minimum necessary", which would certainly entail custom profiles for every client to tailor the data to just what they need and no more, in addition to a very fine-grained authorization model.
I am not enough of a policy expert in any of this to know what we need to build. We need clarity on that, but regardless: our current approach is likely too coarse-grained, while a very fine-grained approach is not feasible, leaving us somewhere in the middle.
Lloyd McKenzie (Oct 18 2018 at 21:57):
It's not even a question of "profile per client" but "profile per client per business context". The data needed when performing function X may be different than when performing function Y. So it's more complex than just knowing who's asking for the data - you also need to know why they're asking/what they're planning to do. There's also the issue of how to manage changes to the minimum necessary as the client's needs change or the server's capabilities evolve. One possibility would be if the queries could indicate a profile the results are expected to comply with, which could indicate what data should be included. The sender could then trim on the basis of that profile. However, that still leaves the requester in the position of determining what data they want - and some data filtering could be difficult to express using profiles.
Lloyd McKenzie (Oct 18 2018 at 21:58):
Definitely agree that this isn't a CDS-Hooks specific issue.
Grahame Grieve (Oct 18 2018 at 22:42):
I think that in the new world, the API specs challenge the notion of a trusted boundary. In the old world, we divided things neatly into 'completely trusted' and 'not trusted at all'
Grahame Grieve (Oct 18 2018 at 22:43):
now we live in a world where there's multiple useful points on the trust/cost tension
Grahame Grieve (Oct 18 2018 at 22:43):
we haven't dealt with that directly - we assume that it will be handled but haven't tried to make the handling of it interoperable. This might be something we want to take on for R5
Lloyd McKenzie (Oct 18 2018 at 22:48):
It's not even necessarily a question of "trust". The requirements about what may / can't be shared can be defined by policies and regulations that don't necessarily fit neatly into the way FHIR sharing typically works. That's orthogonal to the trust between the parties.
Parnesh Raniga (Oct 19 2018 at 05:33):
Very interesting discussion. Is there a working group on this?
René Spronk (Oct 19 2018 at 07:08):
We also embarked upon this discussion in the v3 days when discussing SMIRFS (roughly equivalent to a FHIR resource: a predefined grouping of v3 classes) and the references between them - but at the time we weren't focused on 'need to know', but on 'patient safety'. If you query for "condition X and all supporting resources related thereto", how do you carve out the "relevant" chunk of a cloud of resources? Do you leave the decision up to the querying party (allow them to specify a profile), or to the server (using a contextual profile, or in future: using AI to determine relevance to avoid data overload), or to the originator/creator of the data which is to be returned?
And no, we didn't end up with a conclusion. This was just prior to the creation of FHIR. http://wiki.hl7.org/index.php?title=Safe_querying_of_a_RIM-based_data_model - related, but not the same as the topic being raised on this thread.
Lloyd McKenzie (Oct 19 2018 at 14:58):
Kevin indicated that the specific CDS Hooks issue was passed on to the Security WG. @John Moehrke Where is the WG at with discussion of this issue?
John Moehrke (Oct 19 2018 at 20:16):
I don't recall any specifics like this being brought to the Security WG. However it is not uncommon for us to worry about this. We have provided the security labeling, services to do the security labeling, services to do authorization decisions, and authentication. These services need policy, which is not a role that HL7 takes on.
John Moehrke (Oct 19 2018 at 20:19):
What I am worried about is that typically in REST each instance of a Resource is either blocked totally or allowed totally... the exception being when de-identification is applied. The REST model works well when the Resources are designed with reasonably anticipated policies in mind. I think we have some Resources that are nicely designed for the primary clinician, but not for all reasonably anticipated policies. Designing for the primary clinician is good for that clinician, but creates very large Resource definitions that are inappropriate when a user who should only see a portion needs access.
John Moehrke (Oct 19 2018 at 20:20):
Thus I am very happy to see some of these concerns come up, and R5 would be a good time to think through some of this. Especially important as Resources approach higher FMM, less important when FMM is zero.
Lloyd McKenzie (Oct 19 2018 at 20:27):
There's three pieces to this:
1. How does a client indicate to a server what sort of explicit filtering it should do (in the situation where the client decides what elements it gets). We could use _elements for this, but that can get clunky fast if you're wanting more than a few elements
2. How does a client indicate the "context/purpose" for which they're accessing information (in the situation where the server decides what elements are provided, but it needs context to determine that). This mechanism can be relevant for other things like logging too
3. How does the server redact and/or anonymize the information (including potentially mandatory elements) and indicate that this has happened
We have pieces of this, but we don't have a comprehensive solution.
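For point 1, FHIR already defines the `_elements` search result parameter, which asks the server to return only the named top-level elements (servers may still include mandatory ones). A minimal sketch of building such a query; the base URL, element list and patient identifier below are made up:

```python
import urllib.parse

# Hypothetical FHIR server base URL.
BASE = "https://ehr.example.org/fhir"

def build_elements_query(resource_type, elements, params=None):
    """Build a FHIR search URL restricted to the named top-level elements.

    _elements is a standard FHIR search result parameter; the server
    returns only the listed elements plus mandatory/modifier ones.
    """
    query = dict(params or {})
    query["_elements"] = ",".join(elements)
    return f"{BASE}/{resource_type}?{urllib.parse.urlencode(query)}"

url = build_elements_query(
    "Patient",
    ["name", "birthDate", "gender"],
    {"identifier": "http://example.org/mrn|12345"},
)
print(url)
```

As Lloyd notes, this gets clunky once more than a few elements are involved, and it puts the burden of knowing what to ask for on the client.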
John Moehrke (Oct 19 2018 at 20:52):
1-2. It is unusual for a client to indicate this. A client indicates their identity and role/purposeOfUse. This is part of SMART-on-FHIR, IUA, and HEART. That drives policy decisions that are either made in one of the OAuth cascades or executed in business logic in the Resource Server.
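John's point that the client asserts identity, role and purposeOfUse (rather than dictating enforcement) can be sketched as a server-side decision table. The purposeOfUse codes (TREAT, HPAYMT, PUBHLTH) come from the HL7 v3 PurposeOfUse vocabulary; the role names and the policy table itself are hypothetical:

```python
# Hypothetical access-control policy keyed on the client's asserted
# role and purposeOfUse. The output (an allowed-element set) is what
# the Resource Server would enforce; the client never chooses it.
POLICY = {
    # (role, purposeOfUse) -> Patient elements the caller may see
    ("payer", "HPAYMT"): {"name", "address", "birthDate", "gender"},
    ("public-health", "PUBHLTH"): {"birthDate", "gender"},
    ("clinician", "TREAT"): None,  # None = no element restriction
}

def allowed_elements(role, purpose_of_use):
    """Return the permitted element set (None = unrestricted)."""
    try:
        return POLICY[(role, purpose_of_use)]
    except KeyError:
        raise PermissionError(f"no policy for {role}/{purpose_of_use}")

print(allowed_elements("payer", "HPAYMT"))
```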
John Moehrke (Oct 19 2018 at 20:52):
3. Redaction is not a very useful method, but if chosen, we have provided guidance on the security pages.
John Moehrke (Oct 19 2018 at 20:58):
In an OAuth environment what you are looking for is expressed in the scopes. In other security environments, such as SAML/XACML it is done differently.
I think what you are looking for is a security/privacy Implementation Guide. We have SMART-on-FHIR. As other use-cases are brought before us, we will have others. We can't predict what the needs will be. Well, we could but we would end up just pointing at OAuth like we do today.
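For the OAuth case John mentions, SMART on FHIR v1 scopes take the form `context/ResourceType.permission` (e.g. `patient/Observation.read`). A toy parser, purely illustrative of how a server might interpret what a client is asking for:

```python
import re

# SMART on FHIR v1 resource scopes: "patient/Observation.read",
# "user/*.write", "system/Condition.read", etc.
SCOPE_RE = re.compile(r"^(patient|user|system)/([A-Za-z]+|\*)\.(read|write|\*)$")

def parse_scope(scope):
    """Split one SMART v1 scope into context, resource type and permission."""
    m = SCOPE_RE.match(scope)
    if not m:
        raise ValueError(f"not a SMART v1 resource scope: {scope}")
    context, resource_type, permission = m.groups()
    return {"context": context, "type": resource_type, "permission": permission}

print(parse_scope("patient/Observation.read"))
```

Note that these scopes are resource-level, which illustrates the granularity problem in this thread: they say nothing about which elements within a resource may be seen.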
Lloyd McKenzie (Oct 19 2018 at 21:20):
The scenarios we're looking at get down to what elements are present in a particular resource, so redaction seems like the only way. In order for the redaction to occur, the server needs to know what to redact - and that means it needs to either be told that explicitly or it needs the context in which the access is occurring so it can apply the appropriate business rules.
John Moehrke (Oct 19 2018 at 21:33):
That is the conclusion you come to. I have tried to express how more thought going into the design of the Resources would make them more capable of surviving reasonable policy...
John Moehrke (Oct 19 2018 at 21:34):
I have expressed how a client can express identity, role, and purposeOfUse... That is the input to an access control decision; the output of that decision is enforced.
John Moehrke (Oct 19 2018 at 21:35):
continuing to insist that the only solution is to redact means that the only solution will be to redact.
John Moehrke (Oct 19 2018 at 21:39):
what we need is for some reasonable stated policies to come forward. From those policies we will see if there is sufficient role vocabulary, purposeOfUse vocabulary, and the like for context definition. From those policies we will see where there is a problem. Where that problem exists we might adjust the Resource boundaries so that the policies can be efficiently enforced. Lacking stated policies, all I can state is that a solution can be designed.
Lloyd McKenzie (Oct 19 2018 at 21:51):
I'm not really talking about "reasonable" policy here. I'm talking about a situation where the payor needs to know your name, address, birth date and gender, but they're not allowed to know your phone number, marital status or next of kin. That problem can't be solved by splitting resources - it has to be handled by filtering the data exposed within a resource. The policy is simply "no more data than necessary to execute the function". That's a relatively sound policy, but it's not going to be managed by resource boundaries.
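Lloyd's payer example amounts to element-level filtering within a single resource. A minimal sketch of that enforcement step (the allowed-element set and the sample Patient are illustrative; real enforcement would also have to handle mandatory elements and nested data):

```python
# Elements the hypothetical payer policy permits, plus structural
# elements that must survive for the resource to remain valid.
ALLOWED = {"name", "address", "birthDate", "gender"}
KEEP_ALWAYS = {"resourceType", "id", "meta"}

def filter_resource(resource, allowed):
    """Drop every top-level element not in the allowed set."""
    return {k: v for k, v in resource.items() if k in allowed | KEEP_ALWAYS}

patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Chalmers", "given": ["Peter"]}],
    "birthDate": "1974-12-25",
    "gender": "male",
    "telecom": [{"system": "phone", "value": "(03) 5555 6473"}],
    "maritalStatus": {"text": "Married"},
}
print(sorted(filter_resource(patient, ALLOWED)))
```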
Grahame Grieve (Oct 19 2018 at 22:27):
more thought going into the design of the Resources would make them more capable of surviving reasonable policy
ok. where's a specific example?
John Moehrke (Oct 22 2018 at 13:52):
We do have guidance in the security pages on redaction. Can you review and describe specific additional questions you have? http://build.fhir.org/secpriv-module.html#deId
John Moehrke (Oct 22 2018 at 13:56):
I don't think this is a case of us defining a policy that enforces... but if you need to redact elements, then we do provide guidance to tag that result as REDACTED, much like we have on _summary and _elements with the recommendation for SUBSETTED. Where SUBSETTED is used to indicate the client requested a subset, this use-case is REDACTED because rules removed (redacted) elements.
John Moehrke (Oct 22 2018 at 14:04):
A big difference is that with SUBSETTED it is clear the client knows they have asked for a subset... with REDACTED the client doesn't know they are not getting the whole Resource, and this brings in risks. Risks that the lack of elements might lead to misunderstandings that the values don't exist. Risks that they should not be told that they have a redacted form. etc... So the access control decision needs to include a policy decision on whether to tag the results as REDACTED or to silently redact without tagging. That is why I am hesitant to define a policy that says you MUST mark the data as REDACTED, and why this ends up at an unsatisfying... It is up to Policy...
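Tagging a trimmed result uses FHIR's meta.security labels; REDACTED and SUBSETTED are codes from the HL7 v3-ObservationValue system referenced on the security pages. A sketch of the tagging step (the helper function is hypothetical):

```python
# Security label code system for integrity labels such as
# REDACTED and SUBSETTED.
SEC_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"

def tag_redacted(resource):
    """Add a REDACTED security label to resource.meta, idempotently."""
    meta = resource.setdefault("meta", {})
    labels = meta.setdefault("security", [])
    if not any(l.get("code") == "REDACTED" for l in labels):
        labels.append({"system": SEC_SYSTEM, "code": "REDACTED"})
    return resource

obs = {"resourceType": "Observation", "id": "bp-1"}
tag_redacted(obs)
print(obs["meta"]["security"][0]["code"])
```

Whether to apply this tag at all is exactly the policy decision John describes: a server may be required to redact silently instead.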
Lloyd McKenzie (Oct 22 2018 at 15:33):
The issue here is more on the other side - how does a system know what they need to subset/redact? It's pretty common for permissions to data access to be driven by what's going on. Public health might not have the right to see identifiable data "in general" but might have the authorization to see identifiable data if they're tracking someone involved in an outbreak. When the public health system queries the EHR, how do they communicate the context in which they're asking for the data (so the EHR knows whether to redact/subset)?
John Moehrke (Oct 22 2018 at 15:57):
That is communicated by their context assertion of role and purposeOfUse. This is already in the OAuth token request. That gets interpreted by the Access Control decision into enforcement actions. The client does not define the enforcement actions. Else every bad-actor would dictate that no enforcement be done...
Grahame Grieve (Oct 22 2018 at 18:58):
I conclude, then, that there's no specific example. I found it an interesting assertion because my view is that generally it's the document-oriented resources where this is most likely to apply. Or maybe EoB, which is too big anyway.
Lloyd McKenzie (Oct 22 2018 at 19:26):
Specific examples are in the payer space. They don't want the fully populated resources. They want de-identified information that contains only what they need to answer the question at hand. Different payers and different types of orders will require different subsets of data to be exposed. The rules can differ depending on the type of coverage the patient has. However, I think John's answer probably meets the need.
John Moehrke (Oct 22 2018 at 19:49):
I simply wish the most common, non-Treatment, use-cases would guide the sizing of our Resources during STU. Redaction should be a non-common solution, not something that is commonly done.
Lloyd McKenzie (Oct 22 2018 at 19:59):
In the "only what's needed" use-case, redaction is the only option. You'd need to remove all data not strictly necessary - stripping comments, stripping authorship, stripping cuff size from a blood pressure, etc. Any data not strictly needed for claim evaluation would need to be removed. That's not a boundary issue.
Parnesh Raniga (Oct 23 2018 at 00:28):
Another use case is the secondary use of data for clinical research (epidemiology and public health). Here a researcher can ask for a subset of data (perhaps conforming to predefined criteria). The returned data should not be identifiable (except in the case of an outbreak, perhaps, as mentioned by @Lloyd McKenzie).
Last updated: Apr 12 2022 at 19:14 UTC