FHIR Chat · how to handle very large result sets · implementers

Stream: implementers

Topic: how to handle very large result sets


view this post on Zulip Peter Scholz (Apr 14 2016 at 11:25):

It's a question which popped up during the current IHE Connectathon:

When performing a search request without a _count option that results in a very large result set (this time it was AuditEvent with more than 40k results), we either ran out of resources or the request took so long that a timeout occurred on the client side.

view this post on Zulip Peter Scholz (Apr 14 2016 at 11:25):

....
is it possible to limit the size of the result set without paging mode?

view this post on Zulip Grahame Grieve (Apr 14 2016 at 12:04):

why wouldn't paging mode work?

view this post on Zulip Peter Scholz (Apr 14 2016 at 12:30):

3 reasons:
1st - the client did not request paging
2nd - I do paging by producing snapshots, which means before producing a single result page I have to run the full search to populate all result pages (see the sketch below)
3rd - to support correct links and Bundle.total I would have to produce the complete result set (at least if I want to provide the "last" link and the Bundle.total property)
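
A minimal sketch of why that snapshot approach is expensive (hypothetical names, not the actual server code): Bundle.total and the "last" link can only be filled in after the complete query has run, even if the client never looks past page one.

```python
# Hypothetical sketch of snapshot-style paging, not any particular server's code.
import math
import uuid

SNAPSHOTS = {}  # snapshot_id -> full result list, kept so later pages can be served


def first_page(execute_full_query, page_size, base_url):
    """Build page 0 of a snapshot-style search response."""
    results = execute_full_query()                 # the whole query must run up front...
    total = len(results)                           # ...otherwise Bundle.total is unknown
    snapshot_id = str(uuid.uuid4())
    SNAPSHOTS[snapshot_id] = results
    last_page = max(math.ceil(total / page_size) - 1, 0)

    return {
        "resourceType": "Bundle",
        "type": "searchset",
        "total": total,                            # needs the complete result set
        "link": [
            {"relation": "self", "url": f"{base_url}?_snapshot={snapshot_id}&_page=0"},
            {"relation": "last", "url": f"{base_url}?_snapshot={snapshot_id}&_page={last_page}"},
        ],
        "entry": [{"resource": r} for r in results[:page_size]],
    }
```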

view this post on Zulip Grahame Grieve (Apr 14 2016 at 12:34):

have you seen the error 'too costly'? that's the first line of defense here
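
For reference, a sketch of that first line of defense, assuming the server can estimate the match count up front; the threshold, function name, and the 403 status are illustrative choices, while the "too-costly" issue code itself comes from the standard FHIR issue-type value set.

```python
# Sketch only: refuse an overly broad search with an OperationOutcome whose issue
# code is "too-costly". Threshold and status code are illustrative assumptions.
TOO_COSTLY_LIMIT = 10_000  # hypothetical server-specific threshold


def reject_too_costly(estimated_matches):
    """Return (http_status, OperationOutcome) when the search would be too expensive."""
    if estimated_matches <= TOO_COSTLY_LIMIT:
        return None
    outcome = {
        "resourceType": "OperationOutcome",
        "issue": [{
            "severity": "error",
            "code": "too-costly",   # standard FHIR issue-type code
            "diagnostics": "The search matched too many resources; "
                           "please narrow the query or use _count.",
        }],
    }
    return 403, outcome
```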

view this post on Zulip Peter Scholz (Apr 14 2016 at 12:40):

thanks for the hint, that looks good.

the other solution might be the error "incomplete", if it is allowed to add the partial results to the bundle in addition to an OperationOutcome entry

view this post on Zulip James Agnew (Apr 14 2016 at 14:41):

The server is also allowed to provide its own default page size even if the client doesn't request one. I think no matter what this is a sensible thing to do.

view this post on Zulip James Agnew (Apr 14 2016 at 14:42):

IMO you should also enforce an upper limit on _count in the server and ignore (or reduce) client _count requests bigger than that, in order to prevent DoS attacks
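
A sketch of both suggestions together (the server parses _count itself; the default and maximum below are arbitrary example values): fall back to a server default when the client sends no _count, and clamp anything larger than the cap.

```python
# Illustrative values only; pick limits appropriate for your server.
DEFAULT_COUNT = 50
MAX_COUNT = 500


def effective_count(requested_count):
    """Resolve the page size the server will actually use."""
    if requested_count is None:
        return DEFAULT_COUNT                                  # client did not ask for a page size
    return min(max(int(requested_count), 0), MAX_COUNT)      # clamp into [0, MAX_COUNT]


# e.g. effective_count(None) -> 50, effective_count("10000") -> 500
```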

view this post on Zulip Peter Scholz (Apr 14 2016 at 15:06):

the default page size is quite OK, but that only limits the number of results on one page.
As figured out earlier, the problem arises from the long-running queries needed to populate all pages, since I'm doing snapshots and provide links with relation="last" as well as the total property of the Bundle resource to represent the total number of results.

So Grahame's info about an OperationOutcome resource and either the "too costly" or the "incomplete" error will be perfect for me.

Either I will return an error with an OperationOutcome of "too costly", or (what I prefer) return a Bundle containing a limited result set, with an OperationOutcome entry carrying the "incomplete" error as the first entry, followed by the partial results (possibly paged too)
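
A sketch of that preferred Bundle shape (field values are illustrative): a searchset whose first entry is an OperationOutcome with the "incomplete" issue code and search.mode set to "outcome", followed by whatever partial matches the server did compute.

```python
# Illustrative only: truncated searchset with an "incomplete" OperationOutcome first.
def incomplete_searchset(partial_results, base_url):
    outcome_entry = {
        "resource": {
            "resourceType": "OperationOutcome",
            "issue": [{
                "severity": "warning",
                "code": "incomplete",
                "diagnostics": "The result set was truncated because the full "
                               "search would have been too expensive.",
            }],
        },
        "search": {"mode": "outcome"},   # marks this entry as not a match
    }
    return {
        "resourceType": "Bundle",
        "type": "searchset",
        # total is omitted: the server never computed the complete result set
        "link": [{"relation": "self", "url": base_url}],
        "entry": [outcome_entry] + [
            {"resource": r, "search": {"mode": "match"}} for r in partial_results
        ],
    }
```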

view this post on Zulip James Agnew (Apr 14 2016 at 15:07):

Ahhhh I get it. Yeah, that's a separate issue.

view this post on Zulip Brian Postlethwaite (Apr 15 2016 at 04:49):

+1 for servers doing a default _count, and capping the size for the reasons already provided.

view this post on Zulip Brian Postlethwaite (Apr 15 2016 at 05:06):

Our implementation doesn't need to save snapshots, as we just page through the index from a defined point in time (when the search started).
This works because no data is changed going back in time; all changes are new versions (including deletes).

view this post on Zulip Brian Postlethwaite (Apr 15 2016 at 05:08):

The back/next links encode this info.
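
As I read it, that works roughly like the sketch below; the _asof and _offset parameters and the function names are my own invention, the point being only that the point-in-time cursor travels in the paging links rather than in server-side state.

```python
# Hypothetical sketch of point-in-time paging links; parameter names are invented.
from urllib.parse import urlencode


def page_links(base_url, search_started_at, offset, page_size, returned):
    """Build self/next/previous links that carry the point-in-time cursor."""
    def link(rel, off):
        params = urlencode({
            "_asof": search_started_at,   # versions newer than this instant are ignored
            "_offset": off,
            "_count": page_size,
        })
        return {"relation": rel, "url": f"{base_url}?{params}"}

    links = [link("self", offset)]
    if returned == page_size:             # a full page suggests more results may exist
        links.append(link("next", offset + page_size))
    if offset > 0:
        links.append(link("previous", max(offset - page_size, 0)))
    return links
```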


Last updated: Apr 12 2022 at 19:14 UTC