Stream: bulk data
Topic: Changes to Bulk Data Proposal
Dan Gottlieb (Jan 27 2018 at 20:56):
Two changes to bulk data proposal based on our breakout session:
Dan Gottlieb (Jan 27 2018 at 20:57):
1. On the initial kick-off request, the server should accept an "output-format" parameter indicating the format for the bulk data files. Currently, this must be "application/fhir+ndjson". The Accept header will indicate the format for an OperationOutcome response to the kick-off request itself (e.g., in the case of a missing parameter).
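A minimal sketch of how a client might build the kick-off request described in point 1, with "output-format" on the query string and the Accept header reserved for the format of any OperationOutcome. The function name, the example base URL, and the "application/fhir+json" Accept value are assumptions for illustration, not part of the proposal text.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_kickoff_request(base_url, resource_types,
                          output_format="application/fhir+ndjson"):
    """Build (but do not send) a bulk data kick-off request.

    base_url and the Accept value below are illustrative assumptions.
    """
    # "output-format" travels on the query string, next to _type.
    query = urlencode(
        {"_type": ",".join(resource_types), "output-format": output_format},
        safe=",",  # keep the comma-separated _type list readable
    )
    url = f"{base_url}/Patient/$everything?{query}"
    # The Accept header only describes the format of an error
    # (OperationOutcome) returned for the kick-off request itself.
    return Request(url, headers={"Accept": "application/fhir+json"}, method="GET")
```

For example, `build_kickoff_request("http://example.org/fhir", ["Patient", "Observation"])` yields a GET request whose URL ends in `?_type=Patient,Observation&output-format=application%2Ffhir%2Bndjson`.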
Dan Gottlieb (Jan 27 2018 at 20:57):
2. On the final status request (response type of 200), return a body with the following json structure:
{
  "transactionTime": "[instant]", // the server's time when the query is run (no resources that have a modified date after this instant should be in the response)
  "request": "Patient/$everything?_type=Patient,Observation", // GET request that kicked off the bulk data response
  "secure": true, // authentication is required to retrieve the files
  "output": [{
    "type": "Patient", // resource type contained in the file
    "url": "http://serverpath2/patient_file_1.ndjson"
  }, {
    "type": "Patient",
    "url": "http://serverpath2/patient_file_2.ndjson"
  }, {
    "type": "Observation",
    "url": "http://serverpath2/observation_file_1.ndjson"
  }]
}
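A rough sketch of how a client could consume the status body above, grouping the file URLs by resource type. The helper name `files_by_type` is hypothetical, and the sketch assumes a real response is plain JSON without the explanatory `//` comments shown in the example.

```python
import json
from collections import defaultdict


def files_by_type(status_body: str) -> dict:
    """Group the "output" entries of a final status response by resource type."""
    doc = json.loads(status_body)
    grouped = defaultdict(list)
    for entry in doc.get("output", []):
        # Each entry carries the resource type and the URL of one ndjson file.
        grouped[entry["type"]].append(entry["url"])
    return dict(grouped)
```

With the three entries from the example, this returns two URLs under "Patient" and one under "Observation", so a client can decide which files to fetch without opening them.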
Nagesh Bashyam (Jan 27 2018 at 21:37):
Dan - I am assuming the output-format is a parameter to the operation and is not part of the header ?
Nagesh Bashyam (Jan 27 2018 at 21:39):
Also, for the next breakout, it might be worth talking about a polling interval for the status request. I heard a few people mention last week at ONC that, since it is an async request, it might take more than a few minutes to get the data ready. So it might be good not to poll continuously, but to poll at a regular interval that the server specifies in its initial response along with the content location.
Grahame Grieve (Jan 27 2018 at 21:39):
thanks @Dan Gottlieb - when did we decide to make those changes?
Dan Gottlieb (Jan 27 2018 at 21:41):
@Nagesh Bashyam Yup, on the query string. Also, it would be good to discuss the polling. My initial thought is that we probably want to recommend https://en.wikipedia.org/wiki/Exponential_backoff
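To make the exponential backoff idea concrete, here is a small client-side sketch. It is only one possible shape: `check_status` is a hypothetical callable that returns `None` while the export is still pending and the final status body once it is ready, and the initial delay, factor, and cap are arbitrary assumptions rather than anything agreed in the proposal.

```python
import time


def backoff_delays(initial=1.0, factor=2.0, max_delay=60.0):
    """Yield wait times that grow geometrically up to max_delay (seconds)."""
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


def poll_until_ready(check_status, sleep=time.sleep, max_attempts=10):
    """Call check_status() until it returns a non-None result.

    check_status is a hypothetical hook wrapping the status request.
    """
    for _, delay in zip(range(max_attempts), backoff_delays()):
        result = check_status()
        if result is not None:
            return result
        sleep(delay)  # wait longer after every unsuccessful poll
    raise TimeoutError("export not ready after %d attempts" % max_attempts)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting. A server-suggested interval, if one were added to the initial response, could replace the generator.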
Dan Gottlieb (Jan 27 2018 at 21:43):
@Grahame Grieve asap, but while leaving the link header in place and continuing to accept ndjson in the Accept header so they're non-breaking for now
Grahame Grieve (Jan 27 2018 at 21:43):
I'm not sure the change is non-breaking for me
Dan Gottlieb (Jan 27 2018 at 21:44):
In what way?
Grahame Grieve (Jan 27 2018 at 21:44):
the accept header change
Grahame Grieve (Jan 27 2018 at 21:44):
I'm going to investigate
Vladimir Ignatov (Jan 27 2018 at 22:34):
The server at https://bulk-data.smarthealthit.org was updated to implement these changes. Feedback is appreciated
Dan Gottlieb (Jan 27 2018 at 22:35):
:thumbs_up:
Josh Mandel (Jan 27 2018 at 22:54):
Having a response body with clear type info for each resulting file is working really nicely here.
Jason Walonoski (Jan 27 2018 at 23:28):
Exponential backoff is fine as a recommendation, but I don't think this should be part of the spec. Different clients in different environments will poll their own way.
Danielle Friend (Jan 27 2018 at 23:29):
@Vladimir Ignatov how do you handle multiple requests for $everything? When I make the $everything request multiple times (minutes apart or in quick succession), the response for each returns the same content-location. Are these generated per request/on the fly?
Vladimir Ignatov (Jan 27 2018 at 23:42):
@Danielle Friend Yes. This is a demo server. It does not really generate any files on its file system. It just makes you wait a while and then gives you a list of files to download... In other words, multiple calls to $everything with the same parameters will result in the same content-location
Dan Gottlieb (Jan 27 2018 at 23:45):
@Danielle Friend @Vladimir Ignatov - it seems like this may be the correct approach since on subsequent requests you're asking for files that have already been generated...
Nagesh Bashyam (Jan 27 2018 at 23:47):
I don't think that would work, because in the case of a targeted extract each request may be for a different set of patients. So some kind of request/response tracking is necessary on the server side, which is what we implemented. Even in the case of Patient/$everything, new patients may be added or their observations may change in the system, so the previously generated data would no longer be appropriate, unless I'm misunderstanding.
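As an illustration of the request/response tracking described above (not the actual implementation referred to in this message), a server could hand out a fresh status URL for every kick-off, even when the parameters are identical. The class name, the in-memory dictionary, and the status URL layout are all assumptions made for the sketch.

```python
import uuid
from datetime import datetime, timezone


class ExportJobRegistry:
    """Track each kick-off request as its own export job (in memory only)."""

    def __init__(self, status_base_url):
        self.status_base_url = status_base_url.rstrip("/")
        self.jobs = {}

    def kick_off(self, request_url):
        """Register a new job and return its unique content-location."""
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {
            "request": request_url,
            # Snapshot time for this job; later changes belong to a later job.
            "transactionTime": datetime.now(timezone.utc).isoformat(),
        }
        return f"{self.status_base_url}/{job_id}"
```

Two calls with the same `request_url` return two different locations, so data generated for an earlier request is never silently reused for a later one.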
Nagesh Bashyam (Jan 27 2018 at 23:48):
Quick comment on the Root URL recommended by Jason : This is how we had defined it in DAF-Research, before bulk-api came into being. Also defined on the Root URL.
http://hl7.org/fhir/us/daf-research/STU2/OperationDefinition-daf-extract.html
Nagesh Bashyam (Jan 28 2018 at 15:19):
I updated my server with the changes recommended...Feel free to give it a shot.
http://52.70.192.201/open-fhir/fhir/Patient/$everything
Toby Hu (Jan 28 2018 at 19:33):
@Nagesh Bashyam for the initial query, I think the expected response status code should be 202, but your server returns 200.
Nagesh Bashyam (Jan 28 2018 at 19:39):
Toby, I looked into it and I will have to get a fix from James (for HAPI libraries) to override that status. I will take care of it on the next version.
Toby Hu (Jan 28 2018 at 19:40):
Sounds good. Thanks.
Toby Hu (Jan 28 2018 at 19:45):
I registered my client at http://snapp.clinfhir.com:4000/ and code can be found at https://github.com/toby-hu/test/tree/master/client .
Would any server like to try connecting?
Last updated: Apr 12 2022 at 19:14 UTC