FHIR Chat · bulk data api client · bulk data

Stream: bulk data

Topic: bulk data api client


view this post on Zulip Vasyl Herman (Jan 20 2021 at 16:16):

Hello,
I am looking for a JS or Python implementation of a bulk data client that pulls data from a FHIR bulk API and saves it locally. I would appreciate any help finding such an implementation. Could anybody help?

view this post on Zulip Vladimir Ignatov (Jan 20 2021 at 16:18):

Here is a JS client you can try: https://github.com/smart-on-fhir/sample-apps-stu3/tree/master/fhir-downloader

view this post on Zulip Vasyl Herman (Jan 20 2021 at 20:12):

Thanks! Could you please help me figure out why it gives me an error:
413 Payload Too Large
Too many files

PS: I've found this link
https://bulk-data.smarthealthit.org/?m=10&stu=4
It works if I set Database Size to 1,000 Patients; however, if I set it any higher, it throws the same error.
Should I increase some system limits, or is it a server-side limit? Could you help, please?

view this post on Zulip Vladimir Ignatov (Jan 20 2021 at 21:01):

That is specific to that server only. To avoid it, restrict the export by passing a _since and/or _type parameter. To do so with this client you can pass the -T and -s options like so:

node . -T Patient,Observation -s 2020-01-01T00:00:00
This basically means "Give me only patients and observations modified since 2020".

There are too many variables involved in this. It might be a good idea to play a little with https://bulk-data.smarthealthit.org/sample-app/index.html?server=https%3A%2F%2Fbulk-data.smarthealthit.org%2FeyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9%2Ffhir. Once you are satisfied with what gets exported, you can "translate" the used parameters to the CLI command.
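
For anyone scripting this in Python rather than through the CLI, the same restriction can be expressed directly on the kick-off request. This is only a minimal sketch, assuming a system-level `$export` endpoint; the helper name `build_export_request` and the base URL are hypothetical, and the timestamp is written with a `Z` suffix on the assumption that the server expects a full FHIR instant.

```python
from urllib.parse import urlencode


def build_export_request(base_url, types=None, since=None):
    """Return (url, headers) for a Bulk Data kick-off request.

    build_export_request is a hypothetical helper, not part of fhir-downloader.
    """
    params = {}
    if types:
        # _type takes a comma-separated list of resource types
        params["_type"] = ",".join(types)
    if since:
        # _since limits the export to resources modified after this instant
        params["_since"] = since
    url = base_url.rstrip("/") + "/$export"
    if params:
        # keep commas and colons readable instead of percent-encoding them
        url += "?" + urlencode(params, safe=",:")
    headers = {
        "Accept": "application/fhir+json",
        "Prefer": "respond-async",  # bulk export is asynchronous
    }
    return url, headers


if __name__ == "__main__":
    # Placeholder base URL; substitute the server you are exporting from.
    url, headers = build_export_request(
        "https://example.org/fhir",
        types=["Patient", "Observation"],
        since="2020-01-01T00:00:00Z",
    )
    print(url)
    print(headers)
```

The returned URL and headers can be handed to whatever HTTP client you prefer; polling the status endpoint and downloading the files are left out of this sketch.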

view this post on Zulip Vasyl Herman (Jan 20 2021 at 21:07):

@Vladimir Ignatov Thank you!!!

view this post on Zulip Vasyl Herman (Jan 21 2021 at 09:32):

I am wondering if there is a way of converting ndjson files to FHIR Bundles. I am going to use the cql-exec-fhir library to compute healthcare quality measures. The FHIR Data Source (for the cql-exec-fhir library) expects each patient to be represented as a single FHIR Bundle containing all of the patient's relevant data, but the output of fhir-downloader is .ndjson files.

view this post on Zulip Josh Mandel (Jan 21 2021 at 14:04):

The main challenge here isn't ndjson->bundle; it's bringing together all data about a patient (spread across multiple ndjson files) into a single file. It's easy enough at small scale... you can pretty much do this in a single pass over the ndjson files, scanning for resources linked to your "target patient ID".
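
A rough Python sketch of that small-scale approach, under some loud assumptions: it holds everything in memory, it only recognizes links through a `subject` or `patient` reference of the form `Patient/<id>` (real exports link some resource types through other fields), and it emits `collection` Bundles, which may or may not be what a given consumer expects. The function names are hypothetical.

```python
import json
from collections import defaultdict


def patient_id_of(resource):
    """Best-effort lookup of the patient a resource belongs to (assumption:
    only `subject` and `patient` references are checked)."""
    if resource.get("resourceType") == "Patient":
        return resource.get("id")
    for field in ("subject", "patient"):
        ref = resource.get(field)
        if isinstance(ref, dict):
            target = ref.get("reference", "")
            if target.startswith("Patient/"):
                return target.split("/", 1)[1]
    return None


def bundles_by_patient(ndjson_lines):
    """Group ndjson lines (one resource per line) into one Bundle per patient."""
    grouped = defaultdict(list)
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        resource = json.loads(line)
        pid = patient_id_of(resource)
        if pid is not None:
            grouped[pid].append(resource)
    return {
        pid: {
            "resourceType": "Bundle",
            "type": "collection",  # assumption: the consumer accepts collection Bundles
            "entry": [{"resource": r} for r in resources],
        }
        for pid, resources in grouped.items()
    }
```

To cover several downloaded files in one pass, the open file objects can be chained with `itertools.chain` and passed in as `ndjson_lines`. Resources that cannot be linked to a patient are silently skipped here, which is worth revisiting for real data.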

view this post on Zulip Richard Stanley (Jan 21 2021 at 14:27):

Hi @Josh Mandel I had a look at the Bulk Data API early on and thought I saw that each record for a resource was an individual file. Now I notice that in the Synthea generator output, all resources of one type are in one file. This means that, for example, Patient.ndjson has many patients in it. I'm curious about this structure. I use Go a bit, and "everything in one file" per resource type creates additional compute steps to unmarshal into structs, compared with each individual resource being in its own file.

view this post on Zulip Josh Mandel (Jan 21 2021 at 14:56):

The motivation for "each file has resources of a specific type" was to optimize loading into databases and analysis tools. We don't optimize for "ease of querying data about a single patient" because bulk data is explicitly focused on larger populations.

view this post on Zulip Richard Stanley (Jan 21 2021 at 15:07):

Ok, great. Thanks. I'm trying to push into a FHIR server efficiently. So, the current practice to load bulk data is to POST the entire file, rather than PUT (which requires a resource ID in path)?

view this post on Zulip Richard Stanley (Jan 21 2021 at 15:19):

Sorry, I'm trying to understand and will try to be clearer. With everything in one file, the assumption seems to be that it's for POSTing new resources to the target server. Otherwise you have to parse the bulk data file and get the IDs for a PUT (which requires service_url/resource/id in the path). This means that updating resources with PUT is not an anticipated use case.

view this post on Zulip Josh Mandel (Jan 21 2021 at 15:26):

This is out of scope for what we've tried to specify, so I don't have precise guidance for you. Specific servers often have "fast path" capabilities for bringing in large data sets all at once (e.g. this capability in GCP) but this is "below the abstraction barrier" of the FHIR API.

view this post on Zulip Josh Mandel (Jan 21 2021 at 15:27):

More broadly: you can decide whether you want to preserve IDs when you import if your server supports this (in which case, you can use PUTs, or batch bundles of PUTs, or transaction bundles of PUTs) or not (in which case, you might need to remap IDs to match the constraints of your server).
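
As an illustration of the ID-preserving option, here is a minimal Python sketch that turns ndjson lines into a single batch Bundle of PUTs. It assumes every resource carries an `id` and that the target server accepts client-assigned IDs; the function name is hypothetical, and splitting large files into several smaller Bundles is left out.

```python
import json


def bundle_of_puts(ndjson_lines, bundle_type="batch"):
    """Build a Bundle whose entries PUT each resource to ResourceType/id.

    bundle_type can be "batch" or "transaction".
    """
    entries = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        resource = json.loads(line)
        entries.append(
            {
                "resource": resource,
                "request": {
                    "method": "PUT",
                    # preserves the exported ID on the target server
                    "url": f"{resource['resourceType']}/{resource['id']}",
                },
            }
        )
    return {"resourceType": "Bundle", "type": bundle_type, "entry": entries}
```

The resulting Bundle would be POSTed to the server's base URL. Whether a batch or a transaction is the better fit depends on how the server should treat partial failures.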

view this post on Zulip Vasyl Herman (Jan 24 2021 at 17:31):

I am trying to export bulk data from a HAPI FHIR Server built from master using the fhir-downloader. It seems to work fine, but it gives me back ndjson files named in a weird way. How is this controlled? I expected to see [Resource-name].ndjson files.
bulk-data.jpg

view this post on Zulip Michele Mottini (Jan 24 2021 at 17:35):

Servers can name the files whatever they like.

view this post on Zulip Vasyl Herman (Jan 24 2021 at 17:39):

@Michele Mottini Thanks! I just thought there was a way to set up naming through application.yaml or something.


Last updated: Apr 12 2022 at 19:14 UTC