Stream: connectathon mgmt
Topic: Bulk Data
Grahame Grieve (Dec 23 2017 at 21:42):
@Josh Mandel @Dan Gottlieb I think that my server is good to go for this track. Do you know of any issues?
Josh Mandel (Dec 23 2017 at 21:48):
We discussed last week -- we still plan to review and share any issues, but there are no known issues currently. SMART will bring a simple server, too, as well as a CLI and Web UI client.
Josh Mandel (Jan 10 2018 at 22:35):
@Grahame Grieve Getting around to testing again (cc @Dan Gottlieb) and I'm seeing the same (old, surprising) behavior from your server.
Josh Mandel (Jan 10 2018 at 22:36):
curl -vvv -H "Accept: application/fhir+ndjson" -H "Prefer: respond-async" "http://test.fhir.org/r3/Patient/$everything"
still kicks off a job that only produces Patient outputs (not other resource types) and only 50 of them.
Josh Mandel (Jan 10 2018 at 22:37):
Whereas I'm expecting to see resource-type-specific .ndjson output files (e.g. Patient, Observation, Immunization, etc) with many resources in each file.
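A rough sketch of the client flow being tested here: kick off the export, follow Content-Location, and poll until the Link header shows up. The 202 kick-off response, the Content-Location header and the final 200 with a Link header match what appears in this thread; treating 202 as "still running" on the status URL is an assumption, and `kick_off_and_wait` and `http_get` are hypothetical names (the HTTP call is injected so the sketch runs offline).

```python
import time
from typing import Callable, Dict, Tuple

# A GET helper returning (status code, response headers). Injected by the caller,
# so this sketch makes no network calls on its own.
HttpGet = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str]]]


def kick_off_and_wait(base_url: str, http_get: HttpGet,
                      poll_seconds: float = 5.0, max_polls: int = 120) -> str:
    """Start an async $everything export and return the final Link header value."""
    kickoff_url = base_url.rstrip("/") + "/Patient/$everything"
    status, headers = http_get(kickoff_url, {
        "Accept": "application/fhir+ndjson",
        "Prefer": "respond-async",
    })
    if status != 202:
        raise RuntimeError(f"expected 202 Accepted from kick-off, got {status}")
    status_url = headers["Content-Location"]

    for _ in range(max_polls):
        status, headers = http_get(status_url, {"Accept": "application/fhir+ndjson"})
        if status == 200:
            # Job finished: the output files are listed in the Link header.
            return headers.get("Link", "")
        if status != 202:  # assumption: 202 means the job is still running
            raise RuntimeError(f"unexpected status {status} while polling")
        time.sleep(poll_seconds)
    raise TimeoutError("export did not finish within the polling budget")
```

Because the URL is built inside Python, the `$everything` segment is never seen by a shell.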
Grahame Grieve (Jan 11 2018 at 04:44):
@Josh Mandel I don't understand. I get 100s of results of all different types returned by that query
Grahame Grieve (Jan 11 2018 at 04:44):
just like what you're expecting
Dan Gottlieb (Jan 11 2018 at 17:16):
@Josh Mandel - it looks to be working for me when testing from postman (for example, I get 161 Observation resources as one of the files). I haven't tried using authentication yet, since I'm in a meeting and am supposed to be paying attention :)
Dan Gottlieb (Jan 11 2018 at 18:06):
@Grahame Grieve We'll need to define what's returned if there are no resources that match the query. Currently your server is returning a link header with empty bundle and a zip file of some sort (I can't open it). On the SMART bulk data server, we're returning a 204 "No Content" response.
<http://test.fhir.org/r3/task/f69d2261-3d71-4fdb-971e-a847860c2488/Bundle.ndjson>;rel=item,
<http://test.fhir.org/r3/task/f69d2261-3d71-4fdb-971e-a847860c2488.zip>;rel=collection
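For anyone writing a client against this, a minimal sketch of splitting a Link header like the one above into its rel=item and rel=collection URLs. `parse_link_header` is a hypothetical helper; it only handles the simple `<url>;rel=name` shape seen in this thread, not every form a Link header can take.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches one entry of the form <url>;rel=name (rel value optionally quoted).
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="?([^",;]+)"?')


def parse_link_header(value: str) -> Dict[str, List[str]]:
    """Group the URLs in a Link header value by their rel type."""
    links: Dict[str, List[str]] = defaultdict(list)
    for url, rel in _LINK_ENTRY.findall(value):
        links[rel.strip()].append(url)
    return dict(links)
```

With the header above, the `item` list holds the Bundle.ndjson URL and the `collection` list holds the .zip URL; an empty header gives an empty dict.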
Grahame Grieve (Jan 11 2018 at 20:21):
we do need to agree about that. technically there's always a bundle, which is empty. I always return the bundle, btw, when there's data, since the bundle threads all the data together - though it's not clear how often that's relevant - it depends on what you're going to do with the data
Grahame Grieve (Jan 11 2018 at 20:22):
I suppose you could argue that if there's no data, there should be no data, though there's one important piece of information in the bundle that .. might not ... be anywhere else - the server date to use in the next query if checking for changes
Dan Gottlieb (Jan 11 2018 at 20:40):
But shouldn't the client that initiated the request already know the start date it asked for (or the request timestamp if no start date is included)?
Grahame Grieve (Jan 11 2018 at 21:32):
well, yes, and no. if you're asking for changes since the last time... you should use the server's nominated transaction time to ask for that
Dan Gottlieb (Jan 11 2018 at 21:45):
Got it - could we return that in a header instead of making the client download and parse a bundle?
Grahame Grieve (Jan 11 2018 at 21:58):
well, the header contains a date (usually) but it wouldn't be the date of the transaction. I think. So it would have to be a custom header
Dan Gottlieb (Jan 11 2018 at 22:01):
Yup - I created an issue to track this question: https://github.com/smart-on-fhir/fhir-bulk-data-docs/issues/4
Grahame Grieve (Jan 11 2018 at 22:52):
so do we want to define a custom header?
Josh Mandel (Jan 12 2018 at 00:58):
I'm curious why I was seeing different behavior. I was using public access, with the same curl queries I showed previously.
Grahame Grieve (Jan 12 2018 at 01:03):
I think you had a caching issue. is it sorted now?
Dan Gottlieb (Jan 12 2018 at 13:40):
If we end up moving the link list to the body (https://github.com/smart-on-fhir/fhir-bulk-data-docs/issues/1 ) , we could put the timestamp there, but otherwise I think a header on the last status response would make sense. Do you think using last-modified would be confusing? Otherwise, I suppose we could do something custom like x-transaction-date.
Grahame Grieve (Jan 12 2018 at 21:43):
what would last modified refer to in this case?
Dan Gottlieb (Jan 12 2018 at 22:11):
The FHIR data in the ndjson files linked in the response header (which is why it might be confusing)
Grahame Grieve (Jan 12 2018 at 22:15):
wouldn't last-modified have to change as you varied the response during the life-time of the task?
Dan Gottlieb (Jan 12 2018 at 22:22):
I think we'd only want to return it with the link header in the final status response
Grahame Grieve (Jan 12 2018 at 22:29):
then it's not the standard last-modified header. I think
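To make the two positions above concrete, a small sketch of how a client might pick the start time for its next "changes since" query. It assumes the custom header floated above (x-transaction-date) is sent on the final status response; that header is only a proposal in this thread, its value format is undefined, and `next_start_time` is a hypothetical helper.

```python
from typing import Mapping

# Hypothetical header name, taken from the suggestion in this thread; not a standard.
TRANSACTION_HEADER = "x-transaction-date"


def next_start_time(final_status_headers: Mapping[str, str],
                    client_request_time: str) -> str:
    """Prefer the server's transaction time; fall back to the client's own timestamp."""
    for name, value in final_status_headers.items():
        if name.lower() == TRANSACTION_HEADER:  # header names are case-insensitive
            return value
    return client_request_time
```

The fallback is the "client already knows when it asked" approach; the header, if present, wins because the server's clock is what decides which changes were included.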
Josh Mandel (Jan 16 2018 at 13:38):
My issue definitely wasn't with caching since I was just running curl. Or if there was caching going on, it could have been on the fhir server side. Is that what you're suggesting Grahame? I will take a look again today :simple_smile:
Josh Mandel (Jan 16 2018 at 19:54):
Here's what I'm seeing -- I'm issuing a "get everything" bulk data request and getting just a few resources and only 49 lines per output file. Am I doing something silly @Grahame Grieve ?
jmandel@morel:~$ curl -vvv -H "Accept: application/fhir+ndjson" -H "Prefer: respond-async" "http://test.fhir.org/r3/Patient/$everything" 2>&1
* Trying 104.154.149.246...
* TCP_NODELAY set
* Connected to test.fhir.org (104.154.149.246) port 80 (#0)
> GET /r3/Patient/ HTTP/1.1
> Host: test.fhir.org
> User-Agent: curl/7.52.1
> Accept: application/fhir+ndjson
> Prefer: respond-async
>
< HTTP/1.1 202 Accepted
< Connection: close
< Content-Type: text/plain; charset=ISO-8859-1
< Content-Length: 0
< Cache-control: public, max-age=600
< Date: Tue, 16 Jan 2018 19:51:12 GMT
< X-Request-Id: 32-2274
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Content-Location, Location
< Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE
< Content-Location: http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10
< Server: Health Intersections FHIR Server
<
* Curl_http_done: called premature == 0
* Closing connection 0
jmandel@morel:~$ curl -H "Accept: application/fhir+ndjson" -v http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10
* Trying 104.154.149.246...
* TCP_NODELAY set
* Connected to test.fhir.org (104.154.149.246) port 80 (#0)
> GET /r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10 HTTP/1.1
> Host: test.fhir.org
> User-Agent: curl/7.52.1
> Accept: application/fhir+ndjson
>
< HTTP/1.1 200 OK
< Connection: close
< Content-Type: text/plain; charset=ISO-8859-1
< Content-Length: 0
< Cache-control: public, max-age=600
< Date: Tue, 16 Jan 2018 19:51:44 GMT
< X-Request-Id: 32-2277
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Content-Location, Location
< Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE
< Link: <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Bundle.ndjson>;rel=item, <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson>;rel=item, <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10.zip>;rel=collection
< Server: Health Intersections FHIR Server
<
* Curl_http_done: called premature == 0
* Closing connection 0
jmandel@morel:~$ curl -H "Accept: application/fhir+ndjson" -v http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Trying 104.154.149.246...
* TCP_NODELAY set
* Connected to test.fhir.org (104.154.149.246) port 80 (#0)
> GET /r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson HTTP/1.1
> Host: test.fhir.org
> User-Agent: curl/7.52.1
> Accept: application/fhir+ndjson
>
< HTTP/1.1 200 OK
< Connection: close
< Content-Type: application/x-ndjson
< Content-Length: 31032
< Cache-control: public, max-age=600
< Date: Tue, 16 Jan 2018 19:52:11 GMT
< X-Request-Id: 32-2278
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Content-Location, Location
< Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE
< Server: Health Intersections FHIR Server
<
{ [1374 bytes data]
* Curl_http_done: called premature == 0
100 31032  100 31032    0     0   144k      0 --:--:-- --:--:-- --:--:--  144k
* Closing connection 0
49
jmandel@morel:~$
Josh Mandel (Jan 16 2018 at 22:41):
The specific surprises are:
1. Expected many more resource types, and didn't expect a zip file:
Link: <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Bundle.ndjson>;rel=item, <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson>;rel=item, <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10.zip>;rel=collection
2. Expected many more rows in GET /r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson but only found 49.
Grahame Grieve (Jan 17 2018 at 11:36):
I don't understand this at all. I won't use curl because it has a stupid dependency list on windows. but when I do the same request, I get a redirect, to http://test.fhir.org/r3/task/cf913aff-ad0d-4bde-b0c2-1cd2194316b0. If I wait a couple of minutes, the file http://test.fhir.org/r3/task/cf913aff-ad0d-4bde-b0c2-1cd2194316b0/Patient.ndjson is available and has 341247 bytes. I don't know why you're getting the same response as last time you used curl... that doesn't make any sense, because I fixed this weeks ago
Josh Mandel (Jan 17 2018 at 19:12):
@Dan Gottlieb Are you able to reproduce my findings above?
Dan Gottlieb (Jan 17 2018 at 19:58):
@Josh Mandel I just tested it and get the same response when using curl, but get the full set of data when I use postman. I'm not enough of a curl user to debug, but the issue seems to be client specific...
Josh Mandel (Jan 17 2018 at 20:17):
Fascinating. I'll compare headers (e.g. user-agent).
Josh Mandel (Jan 17 2018 at 21:46):
Ok, the issue was shell variable interpolation: $everything gets resolved by the shell (as an empty string), so the initial request went to /Patient instead of /Patient/$everything. Silly me indeed :-)
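For the record, a short sketch of two ways around this kind of interpolation: pass curl its arguments as a list so no shell ever parses them, or let `shlex.quote` produce a shell-safe command line (it wraps the URL in single quotes, which stops the shell from expanding `$everything`). `curl_argv` and `curl_command_line` are hypothetical helper names.

```python
import shlex
from typing import List

URL = "http://test.fhir.org/r3/Patient/$everything"


def curl_argv(url: str) -> List[str]:
    """Argument list for curl; usable with subprocess.run(argv) with no shell involved."""
    return [
        "curl",
        "-H", "Accept: application/fhir+ndjson",
        "-H", "Prefer: respond-async",
        url,
    ]


def curl_command_line(url: str) -> str:
    """The same command as one string that is safe to paste into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in curl_argv(url))
```

Here `curl_command_line(URL)` ends with the URL in single quotes. Typing the command by hand, single quotes around the URL or a backslash before the `$` have the same effect.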
Dan Gottlieb (Jan 17 2018 at 22:27):
Ha, there's an obvious thing we both missed :)
Grahame Grieve (Jan 18 2018 at 10:16):
woah. that's tricky.
Grahame Grieve (Jan 18 2018 at 10:16):
though you should get 50 lines, not 49 lines
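One possible explanation for 49 versus 50, not confirmed anywhere in this thread: `wc -l` counts newline characters, so a file whose last record has no trailing newline reports one line fewer than the number of records it holds. A small sketch of the difference, with hypothetical helper names:

```python
def count_newlines(data: bytes) -> int:
    """What `wc -l` reports: the number of newline characters."""
    return data.count(b"\n")


def count_ndjson_records(data: bytes) -> int:
    """The number of non-empty lines, i.e. ndjson records."""
    return sum(1 for line in data.split(b"\n") if line.strip())


# 50 records separated by newlines, with no newline after the last one.
sample = b"\n".join(b'{"resourceType":"Patient"}' for _ in range(50))
```

For `sample`, the first function gives 49 and the second gives 50.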
Jenni Syed (Jan 27 2018 at 15:47):
The internet should be an interesting challenge for this track
Jenni Syed (Jan 27 2018 at 15:48):
One of the things raised internally was if we should require chunking to be supported - on our pop health side, we found that it wasn't uncommon for a client to drop connection or have an issue midway for large file downloads. Chunking would let you start from last chunk rather than all over
Dennis Patterson (Jan 27 2018 at 16:03):
I added a server column to the tracking spreadsheet if any others want to add where they have something clients can start working with
Josh Mandel (Jan 27 2018 at 16:43):
@Jenni Syed there's chunking and also https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests -- I'm not familiar with how a "restart from last chunk" works; would be good if we can catch up so you can educate me on this.
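As a reference point for the Range request option, a minimal sketch of resuming an interrupted download. It assumes the server supports byte ranges and that the file has not changed between attempts; neither is established in this thread. `resume_download` is a hypothetical helper, and the `urlopen` parameter exists only so the sketch can be exercised without a network.

```python
import os
import urllib.request
from typing import Callable, Dict


def range_header(offset: int) -> Dict[str, str]:
    """Ask for everything from `offset` onward; no header when starting from zero."""
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def resume_download(url: str, dest_path: str,
                    urlopen: Callable = urllib.request.urlopen,
                    chunk_size: int = 65536) -> int:
    """Continue a partial download into dest_path and return the final file size."""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    request = urllib.request.Request(url, headers=range_header(offset))
    with urlopen(request) as response:
        # 206 means the server honoured the range; anything else is treated as
        # a full response, so the partial file is overwritten.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest_path)
```

A server that ignores Range simply answers 200 with the whole file, which this sketch handles by starting over.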
Josh Mandel (Jan 27 2018 at 16:45):
(Also note that we've got the #bulk data channel for spec discussion.)
Dennis Patterson (May 12 2018 at 08:04):
I created the bulk data track on - http://conman.fhir.org/connectathon.html?event=cologne2018 . Check the server tab for Cerner's FHIR server base url
Danielle Friend (May 12 2018 at 08:08):
The Epic FHIR server is also up - let me know if you'd like to test and I'll give you a group ID to test $export with.
Julie Maas (May 12 2018 at 13:18):
@Danielle Friend I would like to try to test thanks!
Danielle Friend (May 12 2018 at 14:10):
@Julie Maas Awesome - you can test with this URL: https://connectathon.epic.com/Interconnect-Fhir-Bulk/api/FHIR/STU3/Group/ebUKRzWjCOkHtNr8R-zHgzc4TdaZx9-TOvXKIeP0bLnM3/$export
You'll also need to specify the following header to your requests:
Epic-Client-ID: 37c67e7b-b694-4fe8-8d46-7c1d27b1c206
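A sketch of building that request from Python, which also sidesteps the shell treating `$export` as a variable. The URL and the Epic-Client-ID value are the ones given above; the Accept and Prefer headers are copied from the curl commands earlier in this thread and are an assumption for this server. `build_export_request` is a hypothetical helper and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import urllib.request

EXPORT_URL = (
    "https://connectathon.epic.com/Interconnect-Fhir-Bulk/api/FHIR/STU3/"
    "Group/ebUKRzWjCOkHtNr8R-zHgzc4TdaZx9-TOvXKIeP0bLnM3/$export"
)


def build_export_request(url: str = EXPORT_URL) -> urllib.request.Request:
    """Prepare (but do not send) the $export kick-off request."""
    return urllib.request.Request(url, headers={
        "Epic-Client-ID": "37c67e7b-b694-4fe8-8d46-7c1d27b1c206",
        # Assumption: same async headers as the earlier curl examples.
        "Accept": "application/fhir+ndjson",
        "Prefer": "respond-async",
    })
```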
Last updated: Apr 12 2022 at 19:14 UTC