Stream: implementers
Topic: Implementing search in heterogenic environment
Mattias Flodin (Dec 28 2017 at 16:47):
I find generic search pretty difficult to implement. Behind my FHIR server I have a REST API that lets me query for objects based on certain filters. My general plan for dealing with an incoming query has been to parse it into an expression tree to represent the query, then find subqueries in the tree that I can make requests for from the REST API. For anything that can't be handled in the REST API, I perform the final joins / filtering in memory. But here's where it gets messy. A chained parameter is essentially a join that imposes a restriction on what referenced objects will be used. So, the global expression has an effect on how the local part of the expression should be executed.
There's probably a lot written about this stuff in database theory, but it's not something I've ever dived deep into. Does anyone know some good reading material to help me out here? Or is there code / libraries that will help me perform this?
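To make the plan above concrete, here is a minimal Python sketch of the "parse into a tree, push what the backend understands, filter the rest in memory" idea. Everything in it is an assumption for illustration: the `Cmp` / `And` node types, the `BACKEND_FILTERS` capability table and the flat-dict record shape are hypothetical, and it only covers conjunctions, not `or` or chained parameters.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Cmp:
    """Leaf node: compare one parameter path against a constant."""
    path: str
    op: str
    value: object


@dataclass(frozen=True)
class And:
    """Conjunction of sub-expressions."""
    parts: tuple


Expr = Union[Cmp, And]

# Hypothetical capability table: (path, op) pairs the backend REST API can filter on.
BACKEND_FILTERS = {("family", "eq"), ("birthdate", "ge")}


def conjuncts(expr):
    """Flatten nested And nodes into a flat list of leaf comparisons."""
    if isinstance(expr, And):
        leaves = []
        for part in expr.parts:
            leaves.extend(conjuncts(part))
        return leaves
    return [expr]


def split(expr):
    """Return (pushable, residual): leaves for the backend, leaves to check in memory."""
    pushable, residual = [], []
    for leaf in conjuncts(expr):
        target = pushable if (leaf.path, leaf.op) in BACKEND_FILTERS else residual
        target.append(leaf)
    return pushable, residual


def matches(record, leaf):
    """Evaluate one leaf in memory against a flat dict record (only eq and ge shown)."""
    actual = record.get(leaf.path)
    if actual is None:
        return False
    if leaf.op == "eq":
        return actual == leaf.value
    if leaf.op == "ge":
        return actual >= leaf.value
    raise ValueError(f"unsupported operator: {leaf.op}")


query = And((
    Cmp("family", "eq", "Smith"),
    And((Cmp("given", "eq", "Ann"), Cmp("birthdate", "ge", "1980-01-01"))),
))
pushable, residual = split(query)

# Pretend these rows came back from the backend after applying the pushable filters.
rows = [
    {"family": "Smith", "given": "Ann", "birthdate": "1985-03-02"},
    {"family": "Smith", "given": "Bob", "birthdate": "1990-07-11"},
]
final = [r for r in rows if all(matches(r, leaf) for leaf in residual)]
```

Here `pushable` would be turned into backend query parameters and `residual` is applied to whatever comes back. A chained parameter would need an extra node type that carries the join, which this sketch leaves out.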
Christiaan Knaap (Dec 28 2017 at 19:25):
This is a problem we aim to help solve with Vonk FHIR Facade. It breaks the search down into its elementary units, and you fill in the details of actually filtering and retrieving your own data (of which you know the structure / API). Breaking down the search and then combining the results is indeed hard, especially if you also want to validate it (valid search parameters for the resource type(s) currently in focus, valid (type) modifiers, comparators and values for the type of search parameter, etc.) and want to return meaningful OperationOutcomes if something is not correct.
While this can help to make the search work, you need more to make it efficient. I don't know of any existing reading on this, but from my experience I can come up with two broad ways of dealing with it:
1. Gain knowledge / statistics about which filters are the most discriminating. Perform those first and use the results for a 'join' on the next REST call. This way you minimize the amount of data you have to retrieve through each call. (This is actually what a relational database does when computing the 'query plan'.)
2. Create a cache of the data 'behind' the REST API, and query that. But cache means cache-invalidation and the complexity of that depends on the data turnover in the source system and the delay that is allowed on the FHIR interface.
Regardless of the approach I think there is no really easy solution to this problem.
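The first approach (most discriminating filter first, then feed the surviving ids into the next call) can be sketched roughly as below. This is a sketch under assumptions, not how Vonk or any particular backend does it: `run_plan`, `fetch_ids`, the `STATS` table and the in-memory `DATA` standing in for the REST API are all hypothetical.

```python
def run_plan(filters, fetch_ids, estimate):
    """Apply filters most-selective-first, passing surviving ids to the next call.

    filters   : list of opaque filter descriptions
    fetch_ids : callable(filter, restrict_to) -> set of matching ids, where
                restrict_to is None or the set of ids still in play
    estimate  : callable(filter) -> estimated match count (lower = more selective)

    Returns the set of ids matching all filters, or None if no filters were given.
    """
    candidates = None
    for f in sorted(filters, key=estimate):
        candidates = fetch_ids(f, candidates)
        if not candidates:
            return set()  # nothing left, skip the remaining backend calls
    return candidates


# Hypothetical stand-in for the data behind the REST API.
DATA = {
    1: {"gender": "female", "city": "Lund"},
    2: {"gender": "female", "city": "Oslo"},
    3: {"gender": "male", "city": "Lund"},
    4: {"gender": "female", "city": "Oslo"},
}

# Hypothetical statistics: estimated number of matches per filter.
STATS = {("gender", "female"): 3, ("city", "Lund"): 2}

calls = []  # records the order and restriction of each simulated backend call


def fetch_ids(f, restrict_to):
    """Simulated REST call: filter on one field, optionally limited to known ids."""
    calls.append((f, restrict_to))
    key, value = f
    ids = DATA.keys() if restrict_to is None else restrict_to
    return {i for i in ids if DATA[i][key] == value}


result = run_plan(
    [("gender", "female"), ("city", "Lund")],
    fetch_ids,
    lambda f: STATS[f],
)
```

In this toy run the `city` filter goes first because its estimate is lower, and the `gender` call only has to look at the two ids that survived. With a real REST API the restriction would have to be expressible in the request (for example as a list of ids), which is itself an assumption about the backend.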
Mattias Flodin (Dec 29 2017 at 09:21):
Hm, `a[b eq C1].c eq C2` should be equivalent to `a.b eq C1 and a.c eq C2`, right? Is there any situation when a filter in the parameter path can't be factored out to remove all filters from the path?
Mattias Flodin (Dec 29 2017 at 10:13):
Seems that if you just use the surrounding path prefix to qualify each parameter path within the nested filter, then you can move it out of the path. So `(PREFIX[expr])` becomes `(PREFIX and expr2)`, where expr2 is expr with every parameter path prefixed by PREFIX.
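As a rough illustration of that rewrite, here is a small Python sketch. The node types (`Cmp`, `And`, `Filtered`) and the function names are made up for the example, and the code only performs the syntactic transformation; it does not check whether the rewritten expression means the same thing when the prefix element can repeat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cmp:
    """Leaf comparison, e.g. b eq C1."""
    path: str
    op: str
    value: object


@dataclass(frozen=True)
class And:
    """Binary conjunction."""
    left: object
    right: object


@dataclass(frozen=True)
class Filtered:
    """prefix[filter].tail, e.g. a[b eq C1].c eq C2 (paths relative to prefix)."""
    prefix: str
    filter: object  # Cmp or And
    tail: Cmp


def qualify(expr, prefix):
    """Return expr with every parameter path prefixed by prefix."""
    if isinstance(expr, Cmp):
        return Cmp(f"{prefix}.{expr.path}", expr.op, expr.value)
    if isinstance(expr, And):
        return And(qualify(expr.left, prefix), qualify(expr.right, prefix))
    raise TypeError(f"unsupported node: {expr!r}")


def factor_out(node):
    """Move the nested filter out of the path as a top-level conjunct."""
    return And(qualify(node.filter, node.prefix), qualify(node.tail, node.prefix))


nested = Filtered("a", Cmp("b", "eq", "C1"), Cmp("c", "eq", "C2"))
flat = factor_out(nested)
```

Here `flat` is the tree for `a.b eq C1 and a.c eq C2`. Nested filters inside the filter expression would need a recursive call to `factor_out`, which is left out to keep the sketch short.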
Christiaan Knaap (Dec 29 2017 at 13:15):
No time to analyze now, but I'd especially check whether your assumption holds for composite search parameters and for search parameters that can have multiple values (like Patient.name).
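A small worked example of why repeating elements matter, assuming the nested form means "both conditions hold on the same entry" and the factored form lets each condition match any entry. The patient data and the two helper functions are hypothetical and only model that reading of the notation.

```python
# A patient with two name entries (Patient.name can repeat).
patient = {
    "name": [
        {"family": "Smith", "given": "Ann"},
        {"family": "Jones", "given": "Bea"},
    ]
}


def nested(p, given, family):
    """name[given eq G].family eq F: both conditions on the same name entry."""
    return any(n["given"] == given and n["family"] == family for n in p["name"])


def factored(p, given, family):
    """name.given eq G and name.family eq F: each condition may hit a different entry."""
    return (
        any(n["given"] == given for n in p["name"])
        and any(n["family"] == family for n in p["name"])
    )


same_entry = nested(patient, "Ann", "Jones")
any_entry = factored(patient, "Ann", "Jones")
```

Under that reading, `same_entry` is false (no single name is "Ann Jones") while `any_entry` is true, so the two forms only agree when the prefix element has at most one value.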
nicola (RIO/SS) (Apr 02 2018 at 05:29):
@Christiaan Knaap Hi Christiaan, would you like to participate in the FHIR storage track? We want to try it in Germany
Grahame Grieve (Apr 02 2018 at 05:32):
what exactly is the track?
Grahame Grieve (Apr 02 2018 at 05:33):
I mean, what do you do to participate?
nicola (RIO/SS) (Apr 02 2018 at 05:33):
Just preparing it - hope we are not too late - https://github.com/fhir-fuel/fhir-storage-and-analytics-track
Grahame Grieve (Apr 02 2018 at 05:34):
so I saw that, but it doesn't tell me what I have to do to be part of it
nicola (RIO/SS) (Apr 02 2018 at 05:35):
I am working on it. Should I use the template?
nicola (RIO/SS) (Apr 02 2018 at 05:36):
I want to find somebody who can help me do it the right way :)
Grahame Grieve (Apr 02 2018 at 05:38):
well, we can get to the template part later. What isn't clear is how it's about 'connecting'... at the moment it looks sort of like an unconference presentation
nicola (RIO/SS) (Apr 02 2018 at 05:42):
Yes, that's true. Do you think this is not a track? I like the discussions during the tracks and miss having one about storage implementation. How should it be organised?
nicola (RIO/SS) (Apr 02 2018 at 05:43):
I think the result could be a report or guidelines for implementers
Grahame Grieve (Apr 02 2018 at 05:49):
we've had tracks like this before... but most people come to make something work ('to connect'). So if you get interest we can do it. But it does need to be clear what your anticipated outcomes are
nicola (RIO/SS) (Apr 02 2018 at 05:52):
Can you point me to similar tracks in order to use them as inspiration?
nicola (RIO/SS) (Apr 02 2018 at 05:58):
Can the outcome be a working prototype of a database schema and a FHIR search implementation? Or experience with existing approaches to doing it?
Christiaan Knaap (Apr 03 2018 at 08:02):
@nicola (RIO/SS) Certainly interesting, thanks. It only depends on the other tracks, since I cannot participate in three at the same time. Is there already an outline for the track, or do you want me to think about that as well?
Grahame Grieve (Apr 03 2018 at 23:27):
so back to this... I don't know whether there's enough interest in a standard database schema that would both support the API, and also enable interoperable queries. I don't know if this is something that could be part of this, and would be the interoperability bit
Last updated: Apr 12 2022 at 19:14 UTC