Stream: implementers
Topic: Aggregating multiple responses (Bundles) into one Bundle
Dheeraj Kumar Pal (Dec 09 2020 at 18:33):
Hi there,
We receive Bundles in response from multiple sources and are trying to aggregate all of the responses into one Bundle, because we have to send this aggregated response on to another server. We rewrite the fullUrl and reference URLs with our own server URL, but we can't find an element in the Bundle where we can specify the original source (name, id, or code). Is there one, so that the receiving service has that information and can send it back to us when resolving those URLs, letting us identify where to re-route the message?
Grahame Grieve (Dec 09 2020 at 19:30):
you need to manage this on the server, since the client doesn't know anything about this.
Grahame Grieve (Dec 09 2020 at 19:30):
from my DevDays presentation about this:
Grahame Grieve (Dec 09 2020 at 19:32):
- Prefix ids with a server id, e.g. s1-[id] (ids max out at 64 chars!)
- Reidentify every resource with a random id & maintain a lookup table
- Use GUIDs & send to all backend servers
Grahame Grieve (Dec 09 2020 at 19:32):
those are the options...
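A minimal Python sketch of the first two options, assuming hypothetical short server ids such as `s1` and `s2`; the function and class names are illustrative, not taken from the presentation. Prefixing has to respect FHIR's 64-character limit on resource ids, while the lookup-table option keeps ids short by minting a UUID and remembering where each one came from (in memory here; a real service would have to decide where that table lives).

```python
import uuid

FHIR_ID_MAX_LEN = 64  # FHIR resource ids are limited to 64 characters


def prefix_id(server_id: str, backend_id: str) -> str:
    """Option 1: prefix the backend's id with a short server id, e.g. 's1-123'."""
    new_id = f"{server_id}-{backend_id}"
    if len(new_id) > FHIR_ID_MAX_LEN:
        raise ValueError(f"prefixed id is longer than {FHIR_ID_MAX_LEN} chars: {new_id!r}")
    return new_id


def split_prefixed_id(prefixed: str) -> tuple:
    """Inverse of prefix_id; assumes server ids never contain '-'."""
    server_id, _, backend_id = prefixed.partition("-")
    return server_id, backend_id


class IdLookupTable:
    """Option 2: give every resource a random id and remember where it came from."""

    def __init__(self) -> None:
        self._by_new_id = {}  # new id -> (server id, backend id)
        self._by_origin = {}  # (server id, backend id) -> new id

    def reidentify(self, server_id: str, backend_id: str) -> str:
        origin = (server_id, backend_id)
        if origin in self._by_origin:  # same backend resource keeps the same new id
            return self._by_origin[origin]
        new_id = str(uuid.uuid4())  # 36 chars, well within the 64-char limit
        self._by_new_id[new_id] = origin
        self._by_origin[origin] = new_id
        return new_id

    def resolve(self, new_id: str) -> tuple:
        """Return (server id, backend id); raises KeyError for unknown ids."""
        return self._by_new_id[new_id]


print(prefix_id("s1", "123"))           # s1-123
print(split_prefixed_id("s1-abc-def"))  # ('s1', 'abc-def')
```

The prefix approach needs no state at all, which matters for a service that is not allowed to store data; the lookup table avoids the length problem but must be kept for as long as clients may come back with those ids.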
Dheeraj Kumar Pal (Dec 09 2020 at 20:13):
Thank you @Grahame Grieve for the quick response! We have currently implemented the first option, i.e. prefixing the server information in the bundle id. One quick question:
The receiver of the response has to use this server name and send it back to us in a JWT token if they want to resolve the URL (we are including this in our IG). Is this OK and aligned with FHIR server implementations, or do we need another approach?
Grahame Grieve (Dec 09 2020 at 20:19):
I don't understand that question. The client doesn't know anything about any of this. It just has a bundle with references that it tries to resolve
Dheeraj Kumar Pal (Dec 09 2020 at 20:32):
We are building a central service to which multiple HIEs are connected to exchange FHIR-based clinical data.
If one of these HIEs queries us for resources from the other HIEs, we pull the resources from all the other connected HIEs, aggregate the responses (the requesting HIE doesn't know anything about the other HIEs), and send them back to the requester. For example, if there are 4 Conditions from 4 different sources, the central service rewrites (overwrites) each source URL with its own and sends the response to the requester.
Now, if the requester picks up the fullUrl of one Condition and sends a request for it back to us, how do we figure out where to send that request, since the request itself doesn't contain the target system? To resolve this, we append the target (the source from which we received the response Bundle) to the id of the Bundle and expect the client to send it back to us in the JWT token payload, which we then use to identify where to redirect the request. Is this a valid implementation?
Michele Mottini (Dec 09 2020 at 20:36):
If you prefixed the ids with the server, you will receive a request with the prefixed id, and you know where to send it from the prefix. No need to muck around with JWTs or anything like that
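Michele's point in code: a sketch that maps an incoming read URL back to a backend purely from the id prefix. `BACKENDS` is a hypothetical registry of server ids to backend base URLs, and the sketch assumes server ids never contain `-`, so the first hyphen separates the prefix from the backend's own id.

```python
from urllib.parse import urlsplit

# Hypothetical registry: server-id prefix -> backend FHIR base URL
BACKENDS = {
    "s1": "https://hie-a.example.org/fhir",
    "s2": "https://hie-b.example.org/fhir",
}


def route_read(request_url: str) -> str:
    """Map an aggregator read such as .../Condition/s1-7 to the backend's own URL."""
    path = urlsplit(request_url).path.rstrip("/")
    resource_type, prefixed_id = path.split("/")[-2:]
    server_id, sep, backend_id = prefixed_id.partition("-")
    if not sep or server_id not in BACKENDS:
        raise LookupError(f"no backend registered for id {prefixed_id!r}")
    return f"{BACKENDS[server_id]}/{resource_type}/{backend_id}"


print(route_read("https://central.example.org/fhir/Condition/s1-7"))
# https://hie-a.example.org/fhir/Condition/7
```

No token, header or side channel is involved: the routing information arrives in the URL the client was already going to call.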
René Spronk (Dec 10 2020 at 07:37):
Another question is: why a bundle of bundles? It could just be a flat list of all resources returned by any backend systems, with OperationOutcomes for those systems that can't be reached or have some other sort of issue to report. What's the additional value of a 'bundle of bundles'?
Dheeraj Kumar Pal (Dec 21 2020 at 18:05):
We are aggregating all the resources returned in the responses into one Bundle; we are not using a bundle of bundles.
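A sketch of the flat aggregation René describes: backend matches are copied into a single `searchset` Bundle, and each unreachable backend contributes an `OperationOutcome` entry with `search.mode` set to `outcome`. The shape of `results` (backend id to a list of entries, or the exception raised) is an assumption made for this illustration.

```python
from datetime import datetime, timezone


def aggregate_flat(results: dict) -> dict:
    """Build one flat searchset Bundle.

    `results` maps a backend id to either its list of Bundle entries or the
    exception raised while querying it (an assumed shape for this sketch).
    """
    entries = []
    for backend_id, outcome in results.items():
        if isinstance(outcome, Exception):
            # Report the failing backend instead of silently dropping it
            entries.append({
                "resource": {
                    "resourceType": "OperationOutcome",
                    "issue": [{
                        "severity": "warning",
                        "code": "transient",
                        "diagnostics": f"backend {backend_id} unavailable: {outcome}",
                    }],
                },
                "search": {"mode": "outcome"},
            })
        else:
            for entry in outcome:
                # Keep the backend's search mode (match/include) if it sent one
                entries.append({**entry, "search": entry.get("search", {"mode": "match"})})
    return {
        "resourceType": "Bundle",
        "type": "searchset",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "entry": entries,
    }
```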
Dheeraj Kumar Pal (Dec 21 2020 at 18:07):
Handling the links returned from multiple sources while aggregating the response: should we include the links from each responding FHIR server (of course after rebranding them), or is there a different way to handle the links?
René Spronk (Dec 21 2020 at 18:25):
Links as in: self link, next link?
Dheeraj Kumar Pal (Dec 22 2020 at 09:10):
Yes: previous, self, and next links.
René Spronk (Dec 22 2020 at 11:37):
In the aggregate Bundle, you'll have to include URLs created/faked by the aggregator service. The links required for the backend systems should be persisted somehow by the aggregator, in case the client requests the next page. You'll run into problems especially if the aggregator supports _count: caching results and doing repagination in the aggregator is not always possible, especially not if you also allow for _(rev)includes.
Dheeraj Kumar Pal (Dec 22 2020 at 11:55):
Each FHIR server's response will have its own links. Should we create/fake all the links and maintain them in the aggregated response with the FHIR server ID appended to them (just in case the receiver queries back using one of the links)?
Lloyd McKenzie (Dec 22 2020 at 14:13):
Your aggregator needs to behave as though it's a single endpoint with a single set of ordered pages. You'll have to manage the pages returned from the source systems and generate your own links.
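One way to generate your own links, sketched under assumptions: each aggregator page gets an opaque random token, and that token maps (in memory) to the backend `next` links still outstanding. The `_page` parameter name and the `PageLinkRegistry` class are hypothetical; FHIR leaves the format of paging links up to the server. This covers only the links; with a requested sort order, the rows themselves must also be interleaved across backends.

```python
import secrets

AGGREGATOR_BASE = "https://central.example.org/fhir"  # hypothetical


class PageLinkRegistry:
    """In-memory map from opaque aggregator page tokens to backend 'next' links."""

    def __init__(self) -> None:
        self._pending = {}

    def register(self, backend_next_links: dict) -> str:
        token = secrets.token_urlsafe(16)  # unguessable, reveals nothing about backends
        self._pending[token] = dict(backend_next_links)
        return token

    def lookup(self, token: str) -> dict:
        """Backend next links to call when the client follows this token."""
        return self._pending[token]


def page_links(registry: PageLinkRegistry, resource_type: str,
               current_token: str, backend_next_links: dict) -> list:
    """Build Bundle.link entries that point only at the aggregator."""
    links = [{"relation": "self",
              "url": f"{AGGREGATOR_BASE}/{resource_type}?_page={current_token}"}]
    # Only backends that still have more pages need to be remembered
    remaining = {sid: url for sid, url in backend_next_links.items() if url}
    if remaining:
        token = registry.register(remaining)
        links.append({"relation": "next",
                      "url": f"{AGGREGATOR_BASE}/{resource_type}?_page={token}"})
    return links
```

The client never sees a backend URL; when it follows `next`, the aggregator looks up the token and calls the stored backend links itself.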
Dheeraj Kumar Pal (Dec 22 2020 at 19:07):
We are pretty clear on creating a single endpoint that points to our service instead of the actual source.
Is there any example or profile we can refer to?
Lloyd McKenzie (Dec 22 2020 at 19:57):
Not really, because any example would be indistinguishable from a regular search interface. A system querying your interface should be completely unaware that there are multiple independently queried systems behind the scenes. The only possible difference you might see is that the 'source' element of the resources might indicate what server they came from. The client won't see the page links from any of the source systems. (And, in some cases, some of the source systems might have different paging sizes or even not support paging at all.) When the client hits your 'generated' link for a new page, you'll have to look up the cached results from the different source queries and possibly just send more rows from the data you already have or, if necessary, request new pages from one or more of the sources, interpolating results as per the requested sort order.
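The interpolation Lloyd describes amounts to a k-way merge over per-backend buffers, sketched below. It assumes every backend honours the same sort order and that each backend is wrapped in a fetcher callable returning `(rows, next_cursor)`; `MergedPager` and `make_fetcher` are hypothetical names, and `make_fetcher` only fakes paged backends for testing. A backend page is requested only when that backend's buffer runs dry, and the buffers live in memory for the life of the search.

```python
from collections import deque


class MergedPager:
    """Serve pages in global sort order from several paged, individually sorted backends.

    `fetchers` maps a backend id to a callable taking a cursor (None for the
    first page, e.g. a backend next URL afterwards) and returning
    (rows, next_cursor); next_cursor is None on the last page.
    """

    def __init__(self, fetchers: dict, sort_key):
        self.fetchers = fetchers
        self.sort_key = sort_key
        self.buffers = {sid: deque() for sid in fetchers}
        self.cursor = {sid: None for sid in fetchers}
        self.done = {sid: False for sid in fetchers}

    def _refill(self, sid) -> None:
        # Only call a backend again once everything already fetched from it is used
        while not self.buffers[sid] and not self.done[sid]:
            rows, next_cursor = self.fetchers[sid](self.cursor[sid])
            self.buffers[sid].extend(rows)
            self.cursor[sid] = next_cursor
            self.done[sid] = next_cursor is None

    def next_page(self, count: int) -> list:
        page = []
        while len(page) < count:
            for sid in self.fetchers:
                self._refill(sid)
            # Compare the head row of every backend that still has data
            heads = [(self.sort_key(buf[0]), sid) for sid, buf in self.buffers.items() if buf]
            if not heads:
                break  # all backends exhausted
            _, sid = min(heads)
            page.append(self.buffers[sid].popleft())
        return page


def make_fetcher(pages: list):
    """Fake a paged backend for testing; the cursor is the next page index."""
    def fetch(cursor):
        i = 0 if cursor is None else cursor
        next_cursor = i + 1 if i + 1 < len(pages) else None
        return pages[i], next_cursor
    return fetch


pager = MergedPager(
    {"s1": make_fetcher([["Adams", "Baker"], ["Young"]]),
     "s2": make_fetcher([["Allen", "Zane"]])},
    sort_key=lambda name: name,
)
print(pager.next_page(2))  # ['Adams', 'Allen']
print(pager.next_page(2))  # ['Baker', 'Young']
print(pager.next_page(2))  # ['Zane']
```

Note that "Young" comes from the second backend page of `s1` but still precedes "Zane" from `s1`'s neighbour, which is why the merge has to look at every backend's head before emitting a row.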
Dheeraj Kumar Pal (Dec 22 2020 at 21:09):
Thank you @Lloyd McKenzie! We are a pass-through central service connecting multiple HIEs and are not authorized to cache the data; that's where the challenge is. We are brainstorming this and will come back to the group with specific questions.
@John Moehrke and @David Pyke, any insight on this from an HIE perspective?
Lloyd McKenzie (Dec 22 2020 at 21:12):
You have to cache or you can't amalgamate. The best you could do is have someone send you a batch of queries and let you route them to their respective targets and then consolidate the responses into a batch response.
John Moehrke (Dec 22 2020 at 22:23):
I don't know of any real-world experience with this yet. As Lloyd points out, this is not going to be easy if paging is needed while caching is forbidden. The only possibility I can think of is an intermediary for 10 backends that takes 10 results from each of the 10 backends and adds them together into one 100-entry Bundle that you give back to the client. You would still need to cache the next links from each of the 10 backends in order to support a 'next' from your client. The paging mechanism does give the server (you, in the eyes of your client) the ability to ignore the client's request for paging... but so too might your backends do that to you. Being robust to all of these design challenges is... a design challenge... good fun if you ask me. I see many other failure modes related to these design constraints.
Lloyd McKenzie (Dec 22 2020 at 23:55):
Even that would be misleading if you're sorting. For example, if you query for patients and sort by name, the second 10 patients from endpoint 1 could fall alphabetically between some of the patients in the first 10 patients from endpoint 2.
Lloyd McKenzie (Dec 22 2020 at 23:56):
If you only return un-ordered results, you might be able to make it work, but paging with unordered results is pretty useless.
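Lloyd's counterexample made concrete, using made-up family names: taking a fixed slice from each endpoint and sorting only that slice (`naive_page`, a hypothetical name) lets "Davis" onto page 1 even though "Chen", from endpoint 1's second slice, sorts before it. Merging the already-sorted sources before slicing (`merged_page`, built on the standard library's `heapq.merge`) gives the correct global order, at the cost of needing all rows up front.

```python
import heapq

# Made-up family names, each backend already sorted by name
endpoint1 = ["Adams", "Brown", "Chen", "Evans"]
endpoint2 = ["Baker", "Davis", "Ford", "Gray"]


def naive_page(sources, page_index, per_source):
    """Take `per_source` rows from every backend and sort only those rows."""
    start, stop = page_index * per_source, (page_index + 1) * per_source
    rows = [row for source in sources for row in source[start:stop]]
    return sorted(rows)


def merged_page(sources, page_index, page_size):
    """Merge the sorted backends into one global order, then slice a page."""
    merged = list(heapq.merge(*sources))
    return merged[page_index * page_size:(page_index + 1) * page_size]


print(naive_page([endpoint1, endpoint2], 0, 2))   # ['Adams', 'Baker', 'Brown', 'Davis']
print(merged_page([endpoint1, endpoint2], 0, 4))  # ['Adams', 'Baker', 'Brown', 'Chen']
```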
René Spronk (Dec 23 2020 at 08:03):
I'm aware of other implementations of aggregators, but I'm not quite sure how they've solved this. Easiest would be for the aggregator to fetch the full result set from all backend systems, cache it (perhaps only in memory), and respond to paging requests using the cache. Anything else would lead to significant implementation challenges.
John Moehrke (Dec 23 2020 at 13:02):
Indeed, adding the demand for ANY kind of sort is also highly complicated. I was responding on the presumption that sort was not supported by your system. The thinking is: what constraints can you place on your API to clients that make satisfying all of these other constraints possible?
John Moehrke (Dec 23 2020 at 13:04):
As to caching... what is their definition of the kind of caching that is forbidden? I could certainly see a definition of caching that would not even allow you to decompose the backend results in order to reassemble them into your results. That instant in time could be considered a cache. If that instant in time isn't a cache, then is the few instants it takes for the client to request 'next' short enough to not be a cache?
David Pyke (Dec 23 2020 at 14:13):
I think we're dealing with the problem of what caching is. If you're not allowed to store it long term, that's fine. However, if you're not allowed to hold onto it for even 1 minute, then you're going to have serious problems. I would suggest that you take the "caching" to mean "hold until end of transaction", meaning you can sort, or page it as needed.
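A sketch of the "hold until end of transaction" reading of caching: aggregated results live only in process memory and expire after a short time-to-live (60 seconds here, matching David's "1 minute" example but otherwise arbitrary). `TransientResultStore` and its injectable `clock` are hypothetical; whether even this counts as caching is exactly the policy question raised in this thread, so treat it as an illustration, not a compliance answer.

```python
import secrets
import time


class TransientResultStore:
    """Hold aggregated result sets in memory only, for a short time-to-live."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # token -> (expiry time, rows)

    def put(self, rows) -> str:
        self._evict()
        token = secrets.token_urlsafe(16)
        self._items[token] = (self.clock() + self.ttl, list(rows))
        return token

    def page(self, token: str, offset: int, count: int) -> list:
        self._evict()
        entry = self._items.get(token)
        if entry is None:
            raise KeyError("result set expired or unknown; client must re-run the search")
        _, rows = entry
        return rows[offset:offset + count]

    def discard(self, token: str) -> None:
        """Drop a result set as soon as the client's transaction is over."""
        self._items.pop(token, None)

    def _evict(self) -> None:
        now = self.clock()
        for token in [t for t, (expiry, _) in self._items.items() if expiry <= now]:
            del self._items[token]
```

Nothing is written to disk, and a client that pages too slowly simply gets an error and re-runs the search.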
Dheeraj Kumar Pal (Dec 23 2020 at 15:03):
We are not allowed to store the data. We can hold it in memory, and that's how we collect responses from all sources and aggregate them for the requester. Links are very specific to the source generating the response, and we are already seeing different types of links in responses from different sources. We were wondering whether there are similar initiatives, and also wanted to understand whether the standard defines something. I think in the HIE world, this could be a key issue. There are a couple of options we are exploring to map the different sources' links to the central service; we will post them here for feedback and critique. Please share any additional information anyone has on related topics.
Lloyd McKenzie (Dec 23 2020 at 17:34):
The options available are:
- receive a Bundle of search requests and return a Bundle of search responses (using 'batch') - i.e. the client is aware of all the systems you're hitting and you just act as a router
- aggregate all of the data from the different sources and return it without any paging (possibly using bulk data)
- persist the page references in memory across calls and resolve your internally generated page link to the cached set of page links and previously retrieved results to retrieve additional data as necessary and figure out what the pages should look like
- create a custom operation that does whatever you like - but that most systems won't support
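The first option (acting as a router for a `batch`) could look like the sketch below. It assumes, as a convention of this illustration rather than anything the spec requires, that clients put absolute URLs of known backends in `entry.request.url`; `route_batch` is a hypothetical name and `send` a hypothetical HTTP callable returning `(status, resource)`. Responses are returned in request order, as a `batch-response` Bundle requires.

```python
def route_batch(batch: dict, allowed_bases: list, send) -> dict:
    """Forward each batch entry to the backend its URL names; collect a batch-response.

    Assumes entry.request.url is an absolute URL on one of `allowed_bases`,
    and `send(method, url)` (hypothetical) returns (http_status, resource).
    """
    if batch.get("type") != "batch":
        raise ValueError("expected a Bundle of type 'batch'")
    responses = []
    for entry in batch.get("entry", []):
        method, url = entry["request"]["method"], entry["request"]["url"]
        target = next((b for b in allowed_bases if url.startswith(b.rstrip("/") + "/")), None)
        if target is None:
            # Refuse to act as an open proxy for servers we don't know
            responses.append({"response": {
                "status": "400 Bad Request",
                "outcome": {"resourceType": "OperationOutcome", "issue": [{
                    "severity": "error",
                    "code": "not-supported",
                    "diagnostics": f"unknown target server for {url}",
                }]},
            }})
            continue
        status, resource = send(method, url)
        responses.append({"resource": resource, "response": {"status": str(status)}})
    # batch-response entries must be in the same order as the request entries
    return {"resourceType": "Bundle", "type": "batch-response", "entry": responses}
```

Here paging stays the backends' problem: each entry's searchset carries its own backend links, which is acceptable precisely because the client knows which systems it is querying.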
k connor (Dec 23 2020 at 18:51):
Another vector of challenges is how the aggregator will assign a Bundle-level high-water-mark security label based on any labeled resources.
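One possible high-water-mark rule, sketched under assumptions: only codes from the HL7 v3 Confidentiality code system are considered, ordered from least to most restrictive as U, L, M, N, R, V; resources with no confidentiality label are treated as `N` (normal), which is a policy assumption, not a rule from the thread. The Bundle gets the most restrictive code found, and any other security labels already on the Bundle are preserved. `resource_level` and `label_bundle` are hypothetical names.

```python
CONFIDENTIALITY_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-Confidentiality"
# v3 Confidentiality codes from least to most restrictive
RANK = {code: i for i, code in enumerate(["U", "L", "M", "N", "R", "V"])}


def resource_level(resource: dict, default: str = "N") -> str:
    """Most restrictive confidentiality code on one resource (default if unlabeled)."""
    codes = [c["code"] for c in resource.get("meta", {}).get("security", [])
             if c.get("system") == CONFIDENTIALITY_SYSTEM and c.get("code") in RANK]
    return max(codes, key=RANK.__getitem__) if codes else default


def label_bundle(bundle: dict, default: str = "N") -> dict:
    """Stamp the Bundle with the high-water mark of its entries' labels."""
    levels = [resource_level(e["resource"], default)
              for e in bundle.get("entry", []) if "resource" in e]
    mark = max(levels, key=RANK.__getitem__) if levels else default
    meta = bundle.setdefault("meta", {})
    # Keep any non-confidentiality labels already on the Bundle
    others = [c for c in meta.get("security", []) if c.get("system") != CONFIDENTIALITY_SYSTEM]
    meta["security"] = others + [{"system": CONFIDENTIALITY_SYSTEM, "code": mark}]
    return bundle
```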
Last updated: Apr 12 2022 at 19:14 UTC