Stream: europe
Topic: data minimization
Thomas Tveit Rosenlund (Dec 03 2020 at 09:24):
After reading the FHIR-GDPR Confluence page, I am left with the feeling that data minimization is not properly discussed. We are having a hard time implementing reasonable data minimization features using the functionality available in the FHIR RESTful API specification. Our main approach is to use the _elements parameter, but this only goes so far: it cannot distinguish between two extensions in a profile, so you either get all extensions returned or none. This makes the data minimization features of our solution questionable at best.
Earlier we were advised to implement GraphQL to solve this, but I feel this is a somewhat cumbersome solution that turns our API into another thing entirely. Has anyone else implemented a good set of data minimization features in their APIs? Should we propose some extended functionality for the FHIR API specification to solve this problem (the FHIR RESTful API is considered normative)?
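To make the limitation concrete, here is a minimal sketch of how a client might build a search request with _elements. The base URL and the helper name `build_search_url` are made up for illustration. Since _elements takes base element names, asking for `extension` requests the whole extension list; there is no standard way to name one extension and leave out another.

```python
from urllib.parse import urlencode


def build_search_url(base_url, resource_type, elements):
    """Build a FHIR search URL that asks for a subset of top-level elements."""
    # _elements is a comma-separated list of base element names.
    query = urlencode({"_elements": ",".join(elements)}, safe=",")
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"


# Hypothetical server address, for illustration only.
url = build_search_url(
    "https://fhir.example.org/base",
    "Patient",
    ["identifier", "name", "extension"],  # "extension" means all extensions
)
print(url)
```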
Jose Costa Teixeira (Dec 03 2020 at 10:35):
We're working on a few things that should be related to this. One is the ability to express, in a FHIR computable way, what the server can or cannot do.
The other is to actually make it work.
Jose Costa Teixeira (Dec 03 2020 at 10:37):
I'm documenting some use cases, like propagation of permissions across a team and the patient's insight into data sharing. My idea is to have better coverage of GDPR in FHIR (by which I mean: I agree we're not there yet).
Jose Costa Teixeira (Dec 03 2020 at 10:37):
I agree that using GraphQL opens up a new range of problems.
René Spronk (Dec 03 2020 at 14:52):
Servers always have the option to minimize data based on the context of a request. This could be implicit, e.g. based on a system ID or OAuth, or it could be explicit, e.g. if one were to include a "_profile=" parameter in a search to define the maximum extent of the data that should be returned by a server. That may be a rather resource-intensive approach, however.
Thomas Tveit Rosenlund (Dec 03 2020 at 18:19):
Thanks for the feedback. Good to know we are not alone in experiencing this.
A different profile for each set of allowed information content seems like a dead end to me. The _elements parameter is much better suited, in my opinion, because it can actually work for most consumers and cover a lot of different use cases. If only one could also express which extensions to include, we would be much closer to a workable solution for flexible data minimization. I do agree that the server can minimize data based on authorization and other security settings, but this does not really address the "data minimization by default" requirement.
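As a rough sketch of what selecting individual extensions could look like on the server side: the function below keeps only the extensions whose `url` the client asked for. This is not part of the FHIR specification; the function name, the way the wanted URLs reach the server, and the example extension URLs are all assumptions for illustration.

```python
import copy


def keep_extensions(resource, wanted_urls):
    """Return a copy of the resource with only the requested extensions."""
    trimmed = copy.deepcopy(resource)
    kept = [
        ext for ext in trimmed.get("extension", [])
        if ext.get("url") in wanted_urls
    ]
    if kept:
        trimmed["extension"] = kept
    else:
        # Drop the element entirely rather than returning an empty list.
        trimmed.pop("extension", None)
    return trimmed


# Hypothetical extension URLs, for illustration only.
patient = {
    "resourceType": "Patient",
    "id": "example",
    "extension": [
        {"url": "http://example.org/fhir/ext/citizenship", "valueCode": "NO"},
        {"url": "http://example.org/fhir/ext/religion", "valueCode": "x"},
    ],
}
result = keep_extensions(patient, {"http://example.org/fhir/ext/citizenship"})
print([ext["url"] for ext in result["extension"]])
```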
Jose Costa Teixeira (Dec 03 2020 at 19:02):
Some thoughts on how I think this should work:
- Not based on search query (_elements) alone - it's the server that needs to determine what can be returned, right?
Jose Costa Teixeira (Dec 03 2020 at 19:03):
- My personal vision is a set of Permission resources associated with profiles, so that each server knows which data can be accessed by whom. The server receives a query, mashes all the Permissions together and decides what can be sent for each specific query. Think of Permission as a way to say
a) "in this resource, this element is sensitive, this one is not"
and
b) "a person with this role can access this data set for this purpose. For another purpose, the data set is different"
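The idea of combining permissions per role and purpose could be sketched roughly as follows. `PermissionRule` is a simplified, made-up structure and not the FHIR Permission resource; the roles, purposes and element names are examples only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionRule:
    """Hypothetical, simplified stand-in for a permission statement."""
    role: str
    purpose: str
    elements: frozenset


def allowed_elements(rules, role, purpose):
    """Union of the elements granted by every rule matching role and purpose."""
    allowed = set()
    for rule in rules:
        if rule.role == role and rule.purpose == purpose:
            allowed |= rule.elements
    return allowed


RULES = [
    PermissionRule("nurse", "treatment", frozenset({"name", "birthDate"})),
    PermissionRule("nurse", "treatment", frozenset({"address"})),
    PermissionRule("nurse", "research", frozenset({"birthDate"})),
]

# Same role, different purpose, different data set.
print(sorted(allowed_elements(RULES, "nurse", "treatment")))
print(sorted(allowed_elements(RULES, "nurse", "research")))
```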
Jens Villadsen (Dec 03 2020 at 23:07):
@Thomas Tveit Rosenlund we do this: we determine at runtime, server-side (as mentioned by @Jose Costa Teixeira), to what extent information should be removed based on your current security token. We also use _elements in other places (mostly internally between FHIR systems) to minimize the data handed over from one process to another. The problem with relying on the _elements parameter is that you lose control. What data you expose from your server should be guarded only by your server, not by client-side behavior.
Thomas Tveit Rosenlund (Dec 04 2020 at 06:12):
I believe data minimization can be regarded as (at least) two use cases.
- The server exposes some information based on the client system's level of authorization
- The server makes it possible for the client to configure what information it needs in a specific case (_elements parameter)
The first use case is a standard authorization use case: ensuring that no client user or system can perform data operations that the client or user is not actually authorized to perform. I believe this use case can be covered the way @Jose Costa Teixeira describes, and it is a server-side operation based on the client's/user's level of authorization.
The other use case should also be addressed according to GDPR. The way I interpret our lawyers working on GDPR interpretation, there is a separate requirement in the GDPR for servers to implement functionality that makes it possible for client systems to collect only the data they need in a specific case. This part of the GDPR requirements is not really connected to the authorization of the client/user or the sensitivity of the data, but only to a client-side evaluation of what the system needs to cover the use cases at hand. Of course, if the system asks for information it is not authorized to read, that information should not be returned.
We believe the second use case calls for another level of functionality and is difficult to solve using profiles based on level of authorization, because the server can never know what the client system or user actually needs, and there would be a large number of combinations that are impractical to express using profiles.
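One way to picture how the two use cases could fit together: the server's authorization decision sets the upper bound, and the client's request can only narrow it further. The sketch below assumes top-level elements only; the `minimize` helper and the set of always-returned keys are illustrative choices, not something defined by the FHIR specification.

```python
# Keys kept regardless of the request (an assumption for this sketch).
ALWAYS_KEPT = {"resourceType", "id", "meta"}


def minimize(resource, authorized, requested=None):
    """Keep only elements that are authorized and, if given, also requested."""
    visible = set(authorized)
    if requested is not None:
        # The client can narrow the result but never widen it.
        visible &= set(requested)
    return {
        key: value for key, value in resource.items()
        if key in ALWAYS_KEPT or key in visible
    }


patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Hansen"}],
    "birthDate": "1980-01-01",
    "address": [{"city": "Oslo"}],
}
authorized = {"name", "birthDate"}

# The client asks for name and address; address is not authorized, so it is dropped.
print(minimize(patient, authorized, requested=["name", "address"]))
```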
Last updated: Apr 12 2022 at 19:14 UTC