Stream: bulk data
Topic: bulk data api client
Vasyl Herman (Jan 20 2021 at 16:16):
Hello,
I am looking for a JS or Python implementation of a bulk data client that pulls data from a FHIR Bulk Data API and saves it locally. I would appreciate any help finding such an implementation. Could anybody help?
Vladimir Ignatov (Jan 20 2021 at 16:18):
Here is a JS client you can try: https://github.com/smart-on-fhir/sample-apps-stu3/tree/master/fhir-downloader
Vasyl Herman (Jan 20 2021 at 20:12):
Thanks! Could you please help me figure out why it gives me an error:
413 Payload Too Large
Too many files
PS: I've found this link
https://bulk-data.smarthealthit.org/?m=10&stu=4
It works if I set Database Size to 1,000 Patients; however, if I set it higher, it throws the same error.
Should I increase some system limits, or is it a server-side limit? Could you help, please?
Vladimir Ignatov (Jan 20 2021 at 21:01):
That is specific to that server only. To avoid it, restrict the export by passing a _since and/or _type parameter. To do so with this client you can pass the -T and -s options like so:
node . -T Patient,Observation -s 2020-01-01T00:00:00
This basically means "Give me only patients and observations modified since 2020".
There are too many variables involved in this. It might be a good idea to play a little with https://bulk-data.smarthealthit.org/sample-app/index.html?server=https%3A%2F%2Fbulk-data.smarthealthit.org%2FeyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9%2Ffhir. Once you are satisfied with what gets exported, you can "translate" the used parameters to the CLI command.
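For anyone scripting this in Python rather than using the CLI, here is a minimal sketch of how the _type and _since restrictions end up on the export kick-off URL. The base URL `https://example.org/fhir` and the helper name `build_export_url` are hypothetical; only the `$export` operation and the two parameter names come from the discussion above.

```python
from urllib.parse import urlencode


def build_export_url(base_url, types=None, since=None):
    """Build a system-level $export kick-off URL.

    types: optional list of resource type names, sent as _type
    since: optional timestamp string, sent as _since
    """
    params = {}
    if types:
        params["_type"] = ",".join(types)
    if since:
        params["_since"] = since
    # Keep commas and colons readable instead of percent-encoding them.
    query = urlencode(params, safe=",:")
    url = base_url.rstrip("/") + "/$export"
    return url + "?" + query if query else url


# Hypothetical server; mirrors "-T Patient,Observation -s 2020-01-01T00:00:00".
print(build_export_url(
    "https://example.org/fhir",
    types=["Patient", "Observation"],
    since="2020-01-01T00:00:00",
))
```

The sketch only builds the URL. Sending the request, polling for completion, and downloading the files are left to whatever client you use.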
Vasyl Herman (Jan 20 2021 at 21:07):
@Vladimir Ignatov Thank you!!!
Vasyl Herman (Jan 21 2021 at 09:32):
I am wondering if there is a way of converting ndjson files to FHIR Bundles. I am going to use the cql-exec-fhir library to compute healthcare quality measures. The FHIR Data Source (for the cql-exec-fhir library) expects each patient to be represented as a single FHIR Bundle containing all of the patient's relevant data, but we have .ndjson files as the output of fhir-downloader.
Josh Mandel (Jan 21 2021 at 14:04):
The main challenge here isn't ndjson->bundle; it's bringing together all data about a patient (spread across multiple ndjson files) into a single file. It's easy enough at small scale... you can pretty much do this in a single pass over the ndjson files, scanning for resources linked to your "target patient ID".
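A rough Python sketch of the single-pass idea, at small scale and fully in memory. It assumes resources point at their patient through a `subject` or `patient` element holding a relative reference such as `Patient/123`; other link paths and absolute references are not handled. The function names and the choice of a `collection` Bundle are illustrative assumptions, not something cql-exec-fhir is confirmed to require.

```python
import json
from collections import defaultdict


def patient_id_of(resource):
    """Return the id of the patient a resource belongs to, or None."""
    if resource.get("resourceType") == "Patient":
        return resource.get("id")
    # Assumption: the patient link lives in `subject` or `patient`.
    for key in ("subject", "patient"):
        value = resource.get(key)
        if isinstance(value, dict):
            ref = value.get("reference", "")
            if ref.startswith("Patient/"):
                return ref.split("/", 1)[1]
    return None


def bundles_by_patient(ndjson_lines):
    """Group ndjson lines (one resource per line) into one Bundle per patient."""
    grouped = defaultdict(list)
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        resource = json.loads(line)
        pid = patient_id_of(resource)
        if pid is not None:
            grouped[pid].append(resource)
    return {
        pid: {
            "resourceType": "Bundle",
            "type": "collection",
            "entry": [{"resource": r} for r in resources],
        }
        for pid, resources in grouped.items()
    }


sample = [
    '{"resourceType": "Patient", "id": "p1"}',
    '{"resourceType": "Observation", "id": "o1", "subject": {"reference": "Patient/p1"}}',
]
for pid, bundle in bundles_by_patient(sample).items():
    print(pid, len(bundle["entry"]))
```

To cover several downloaded files, pass the lines of all of them in one go, for example with `itertools.chain` over the open file objects.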
Richard Stanley (Jan 21 2021 at 14:27):
Hi @Josh Mandel, I had a look at the Bulk Data API early on and I thought I saw that each record for a resource was an individual file. Now I notice that in the Synthea generator output all resources of one type are in one file. This means that, for example, Patient.ndjson has many patients in it. I'm curious about this structure. I use Go a bit, and the "everything in one file" approach per resource type creates additional compute steps to unmarshal into structs, versus each individual resource in its own file.
Josh Mandel (Jan 21 2021 at 14:56):
The motivation for "each file has resources of a specific type" was to optimize loading into databases / analysis tools. We don't optimize for "ease of querying data about a single patient" because bulk data is explicitly focused on larger populations.
Richard Stanley (Jan 21 2021 at 15:07):
Ok, great. Thanks. I'm trying to push into a FHIR server efficiently. So, the current practice to load bulk data is to POST the entire file, rather than PUT (which requires a resource ID in path)?
Richard Stanley (Jan 21 2021 at 15:19):
Sorry, I'm trying to understand and will try to be clearer. With everything in one file, the assumption seems to be that it's for POSTing new resources to the target server. Otherwise you have to parse the bulk data file and get IDs for a PUT (which requires service_url/resource/id in the path). This means that updating resources with PUT is not an anticipated use case.
Josh Mandel (Jan 21 2021 at 15:26):
This is out of scope for what we've tried to specify, so I don't have precise guidance for you. Specific servers often have "fast path" capabilities for bringing in large data sets all at once (e.g. this capability in GCP) but this is "below the abstraction barrier" of the FHIR API.
Josh Mandel (Jan 21 2021 at 15:27):
More broadly: you can decide whether you want to preserve IDs when you import if your server supports this (in which case, you can use PUTs, or batch bundles of PUTs, or transaction bundles of PUTs) or not (in which case, you might need to remap IDs to match the constraints of your server).
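As an illustration of the "batch bundles of PUTs" option, here is a minimal Python sketch that turns ndjson lines into a batch Bundle in which each entry is a PUT to `ResourceType/id`, so the original IDs are preserved. It assumes every resource carries an `id` and that the target server accepts client-assigned IDs; the function name `batch_of_puts` is made up for this example.

```python
import json


def batch_of_puts(ndjson_lines):
    """Wrap ndjson resources in a batch Bundle of PUT requests."""
    entries = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        resource = json.loads(line)
        entries.append({
            "resource": resource,
            "request": {
                "method": "PUT",
                # PUT needs the id in the path: ResourceType/id
                "url": f"{resource['resourceType']}/{resource['id']}",
            },
        })
    return {"resourceType": "Bundle", "type": "batch", "entry": entries}


sample = ['{"resourceType": "Patient", "id": "p1"}']
print(json.dumps(batch_of_puts(sample), indent=2))
```

The resulting Bundle would be POSTed to the server's base URL. Switching `type` to `transaction` gives the all-or-nothing variant, and large files would likely need to be split into several bundles.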
Vasyl Herman (Jan 24 2021 at 17:31):
I am trying to export bulk data from a HAPI FHIR Server built from master using the fhir-downloader. It seems to work fine, but it gives me back ndjson files with odd names. How is this controlled? I expected to see [Resource-name].ndjson files.
bulk-data.jpg
Michele Mottini (Jan 24 2021 at 17:35):
Servers can name the files whatever they like.
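Since file names are up to the server, a client can rely on the export completion manifest instead, where each `output` entry carries a `type` and a `url`. A small Python sketch that groups download URLs by resource type; the manifest content and URLs below are invented for illustration.

```python
from collections import defaultdict


def urls_by_type(manifest):
    """Group the manifest's output URLs by the resource type they contain."""
    grouped = defaultdict(list)
    for item in manifest.get("output", []):
        grouped[item["type"]].append(item["url"])
    return dict(grouped)


# Hypothetical manifest with server-chosen, opaque file names.
manifest = {
    "output": [
        {"type": "Patient", "url": "https://example.org/files/a1b2"},
        {"type": "Observation", "url": "https://example.org/files/c3d4"},
        {"type": "Observation", "url": "https://example.org/files/e5f6"},
    ]
}
for resource_type, urls in urls_by_type(manifest).items():
    print(resource_type, len(urls))
```

With this mapping a client could save files locally under any naming scheme it prefers, such as one based on the resource type.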
Vasyl Herman (Jan 24 2021 at 17:39):
@Michele Mottini Thanks! I just thought there was a way to set up naming through application.yaml or something.
Last updated: Apr 12 2022 at 19:14 UTC