FHIR Chat · Size limit · bulk data

Stream: bulk data

Topic: Size limit


dsh (Jul 26 2021 at 02:24):

Hi, I am new to bulk data export, so this may be a naive question. Our server has close to 500K Condition resources, and I want to fetch all of them to figure out the population distribution across Condition.code. But when I issue this bulk data request

/$export?_outputFormat=ndjson&_type=Condition&_since=2020-01-01T00:00:00.000Z

I get only 200 Condition resources, with no ability to paginate to the next 200. Is there a way to get all Condition resources in one bulk data export?
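For reference, the bulk data flow is asynchronous: the kick-off request returns a polling URL in the Content-Location header, and the finished export is a JSON manifest whose output array links to one or more NDJSON files. A minimal Python sketch of that flow, assuming a hypothetical base URL and no authentication:

```python
import time

import requests

BASE = "https://example.org/fhir"  # hypothetical server base URL

# Kick-off: system-level $export is asynchronous, so the Bulk Data
# spec requires the Prefer: respond-async header on the request.
kickoff = requests.get(
    f"{BASE}/$export",
    params={
        "_outputFormat": "ndjson",
        "_type": "Condition",
        "_since": "2020-01-01T00:00:00.000Z",
    },
    headers={"Accept": "application/fhir+json", "Prefer": "respond-async"},
)
kickoff.raise_for_status()
status_url = kickoff.headers["Content-Location"]  # polling URL

# Poll: the server answers 202 while the export runs and
# 200 with a JSON manifest once it is complete.
while True:
    status = requests.get(status_url, headers={"Accept": "application/json"})
    if status.status_code == 200:
        manifest = status.json()
        break
    time.sleep(5)

# The manifest's "output" array carries one link per NDJSON file;
# a large export is split across many files rather than paginated.
for entry in manifest.get("output", []):
    print(entry["type"], entry["url"])
```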

dsh (Jul 26 2021 at 02:27):

I don't think it makes sense to change max_page_size: 200 in the server's application.yml just to get the bulk data export to export everything :anguished:

dsh (Jul 26 2021 at 08:47):

Any ideas?

Vassil Peytchev (Jul 26 2021 at 12:09):

Which server implementation are you using?

dsh (Jul 26 2021 at 16:02):

Vassil Peytchev said:

Which server implementation are you using?

JPA server 5.3.0

Vladimir Ignatov (Jul 26 2021 at 16:16):

That is probably a HAPI server then? Assuming your page size is 200, you should get multiple files with 200 conditions in each of them.

Or perhaps only 200 have been modified since 2020-01-01? Anyhow, if you want to get all the conditions, then try removing the _since parameter.
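Once the export completes, each manifest entry is a separate NDJSON file to download; the Condition.code distribution can then be tallied across all of them. A rough Python sketch, assuming a hypothetical polling URL (the Content-Location from the kick-off, as in the sketch above) and unauthenticated file access:

```python
import json
from collections import Counter

import requests

# Hypothetical: the polling URL returned in the kick-off's
# Content-Location header.
STATUS_URL = "https://example.org/fhir/bulk-status/job-1"

manifest = requests.get(STATUS_URL, headers={"Accept": "application/json"}).json()

code_counts = Counter()
for entry in manifest.get("output", []):
    # Each output entry is one NDJSON file: one resource per line.
    resp = requests.get(entry["url"], headers={"Accept": "application/fhir+ndjson"})
    resp.raise_for_status()
    for line in resp.text.splitlines():
        if not line.strip():
            continue
        condition = json.loads(line)
        codings = condition.get("code", {}).get("coding", [])
        if codings:  # tally the first coding of Condition.code
            code_counts[(codings[0].get("system"), codings[0].get("code"))] += 1

# Print the ten most common codes.
for (system, code), count in code_counts.most_common(10):
    print(count, system, code)
```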

dsh (Jul 26 2021 at 16:19):

First I tried without _since, but it didn't work, and there are about 500K Condition resources modified since 2020-01-01 ... then, out of frustration, I increased the max_page_size parameter to 1 million to fetch all Conditions

dsh (Jul 26 2021 at 16:21):

I am not sure if this is a bug, but if others can confirm and/or tell me what I might be doing wrong, that would help

Vladimir Ignatov (Jul 26 2021 at 16:22):

Were you only getting a single file link in your export manifest (before increasing the limit)?

dsh (Jul 26 2021 at 16:25):

Vladimir Ignatov said:

Were you only getting a single file link in your export manifest (before increasing the limit)?

Yes, and that was the weird part.

dsh (Jul 26 2021 at 16:27):

After I increased the limit, I got about 418 links.

Vladimir Ignatov (Jul 26 2021 at 16:34):

  1. With 500K records and a limit of 200, you should have gotten 2.5K links (if they fit within the manifest response size limit)
  2. With 500K records and a limit of 1M, you should have gotten 1 link to a file with 500K rows

That is just the simple math, without the internal implementation details that could affect how pagination really works. I would suggest going to the HAPI GitHub and searching for this issue (and maybe posting a new one if you don't find anything).
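As a back-of-the-envelope check of that math: the 418 observed links would put roughly 1,200 records in each file, which would point at an internal export batch size rather than max_page_size (an assumption, not something confirmed in this thread). A trivial sketch:

```python
import math

total = 500_000  # approximate Condition count from this thread

# Naive model: one NDJSON file per page of results.
print(math.ceil(total / 200))        # 2500 files expected at page size 200
print(math.ceil(total / 1_000_000))  # 1 file expected at page size 1M

# Observed instead: ~418 links, i.e. about 1196 records per file,
# suggesting the server splits files by an internal batch size,
# not by max_page_size (an assumption, not confirmed here).
print(round(total / 418))
```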

dsh (Jul 26 2021 at 16:36):

Vladimir Ignatov said:

  1. With 500K records and a limit of 200, you should have gotten 2.5K links (if they fit within the manifest response size limit)
  2. With 500K records and a limit of 1M, you should have gotten 1 link to a file with 500K rows

That is just the simple math, without the internal implementation details that could affect how pagination really works. I would suggest going to the HAPI GitHub and searching for this issue (and maybe posting a new one if you don't find anything).

Your logic makes sense ... but that's not how the server behaved ... this may be a bug ... I will search on GitHub

Robert Scanlon (Jul 26 2021 at 16:56):

You could also try asking this question over in the #hapi stream (if that is what you are using), which may catch the attention of someone who knows the specifics of its bulk data implementation.


Last updated: Apr 12 2022 at 19:14 UTC