FHIR Chat · Bulk Data · connectathon mgmt

Stream: connectathon mgmt

Topic: Bulk Data


view this post on Zulip Grahame Grieve (Dec 23 2017 at 21:42):

@Josh Mandel @Dan Gottlieb I think that my server is good to go for this track. Do you know of any issues?

view this post on Zulip Josh Mandel (Dec 23 2017 at 21:48):

We discussed last week -- we still plan to review and share any issues, but there are no known issues currently. SMART will bring a simple server, too, as well as a CLI and Web UI client.

view this post on Zulip Josh Mandel (Jan 10 2018 at 22:35):

@Grahame Grieve Getting around to testing again (cc @Dan Gottlieb) and I'm seeing the same (old, surprising) behavior from your server.

view this post on Zulip Josh Mandel (Jan 10 2018 at 22:36):

curl -vvv -H "Accept: application/fhir+ndjson" -H "Prefer: respond-async" "http://test.fhir.org/r3/Patient/$everything"

still kicks off a job that only produces Patient outputs (not other resource types) and only 50 of them.

view this post on Zulip Josh Mandel (Jan 10 2018 at 22:37):

Whereas I'm expecting to see resource-type-specific .ndjson output files (e.g. Patient, Observation, Immunization, etc) with many resources in each file.

view this post on Zulip Grahame Grieve (Jan 11 2018 at 04:44):

@Josh Mandel I don't understand. I get 100s of results of all different types returned by that query

view this post on Zulip Grahame Grieve (Jan 11 2018 at 04:44):

just like what you're expecting

view this post on Zulip Dan Gottlieb (Jan 11 2018 at 17:16):

@Josh Mandel - it looks to be working for me when testing from postman (for example, I get 161 Observation resources as one of the files). I haven't tried using authentication yet, since I'm in a meeting and am supposed to be paying attention :)

view this post on Zulip Dan Gottlieb (Jan 11 2018 at 18:06):

@Grahame Grieve We'll need to define what's returned if there are no resources that match the query. Currently your server is returning a link header with empty bundle and a zip file of some sort (I can't open it). On the SMART bulk data server, we're returning a 204 "No Content" response.

<http://test.fhir.org/r3/task/f69d2261-3d71-4fdb-971e-a847860c2488/Bundle.ndjson>;rel=item, <http://test.fhir.org/r3/task/f69d2261-3d71-4fdb-971e-a847860c2488.zip>;rel=collection
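A minimal sketch of how a client might split a Link header like the one above into per-rel URL lists. It works on the header value as a plain string (no network call). The helper name `links_with_rel` is made up for illustration, and the parsing assumes only the simple `<url>;rel=value` form seen in this thread, not the full Link header grammar.

```shell
# Link header value copied from the status response above.
link='<http://test.fhir.org/r3/task/f69d2261-3d71-4fdb-971e-a847860c2488/Bundle.ndjson>;rel=item, <http://test.fhir.org/r3/task/f69d2261-3d71-4fdb-971e-a847860c2488.zip>;rel=collection'

# links_with_rel REL: print each URL whose rel parameter equals REL.
# Hypothetical helper; assumes no commas inside URLs and a single,
# unquoted rel parameter per entry.
links_with_rel() {
  printf '%s\n' "$link" |
    tr ',' '\n' |
    sed -n "s/^ *<\([^>]*\)>;rel=$1 *\$/\1/p"
}

item_urls=$(links_with_rel item)
collection_urls=$(links_with_rel collection)

printf 'item:       %s\n' "$item_urls"
printf 'collection: %s\n' "$collection_urls"
```

With an empty result set, as in this post, the only `item` entry is the Bundle file, so a client that iterates over `item` links would still download one (empty) file.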

view this post on Zulip Grahame Grieve (Jan 11 2018 at 20:21):

we do need to agree about that. technically there's always a bundle, which is empty. I always return the bundle, btw, when there's data, since the bundle threads all the data together - though it's not clear how often that's relevant - it depends on what you're going to do with the data

view this post on Zulip Grahame Grieve (Jan 11 2018 at 20:22):

I suppose you could argue that if there's no data, there should be no data, though there's one important piece of information in the bundle that .. might not ... be anywhere else - the server date to use in the next query if checking for changes

view this post on Zulip Dan Gottlieb (Jan 11 2018 at 20:40):

But shouldn't the client that initiated the request already know the start date it asked for (or the request timestamp if no start date is included)?

view this post on Zulip Grahame Grieve (Jan 11 2018 at 21:32):

well, yes, and no. if you're asking for changes since the last time... you should use the server's nominated transaction time to ask for that

view this post on Zulip Dan Gottlieb (Jan 11 2018 at 21:45):

Got it - could we return that in a header instead of making the client download and parse a bundle?

view this post on Zulip Grahame Grieve (Jan 11 2018 at 21:58):

well, the header contains a date (usually) but it wouldn't be the date of the transaction. I think. So it would have to be a custom header

view this post on Zulip Dan Gottlieb (Jan 11 2018 at 22:01):

Yup - I created an issue to track this question: https://github.com/smart-on-fhir/fhir-bulk-data-docs/issues/4

view this post on Zulip Grahame Grieve (Jan 11 2018 at 22:52):

so do we want to define a custom header?

view this post on Zulip Josh Mandel (Jan 12 2018 at 00:58):

I'm curious why I was seeing different behavior. I was using public access, with the same curl queries I showed previously.

view this post on Zulip Grahame Grieve (Jan 12 2018 at 01:03):

I think you had a caching issue. is it sorted now?

view this post on Zulip Dan Gottlieb (Jan 12 2018 at 13:40):

If we end up moving the link list to the body (https://github.com/smart-on-fhir/fhir-bulk-data-docs/issues/1 ) , we could put the timestamp there, but otherwise I think a header on the last status response would make sense. Do you think using last-modified would be confusing? Otherwise, I suppose we could do something custom like x-transaction-date.

view this post on Zulip Grahame Grieve (Jan 12 2018 at 21:43):

what would last modified refer to in this case?

view this post on Zulip Dan Gottlieb (Jan 12 2018 at 22:11):

The FHIR data in the ndjson files linked in the response header (which is why it might be confusing)

view this post on Zulip Grahame Grieve (Jan 12 2018 at 22:15):

wouldn't last-modified have to change as you varied the response during the life-time of the task?

view this post on Zulip Dan Gottlieb (Jan 12 2018 at 22:22):

I think we'd only want to return it with the link header in the final status response

view this post on Zulip Grahame Grieve (Jan 12 2018 at 22:29):

then it's not the standard last-modified header. I think

view this post on Zulip Josh Mandel (Jan 16 2018 at 13:38):

My issue definitely wasn't with caching since I was just running curl. Or if there was caching going on, it could have been on the fhir server side. Is that what you're suggesting Grahame? I will take a look again today :simple_smile:

view this post on Zulip Josh Mandel (Jan 16 2018 at 19:54):

Here's what I'm seeing -- issuing a "get everything" bulk data request and getting just a few resources and only 49 lines per output file. Am I doing something silly @Grahame Grieve ?

jmandel@morel:~$ curl -vvv -H "Accept: application/fhir+ndjson" -H "Prefer: respond-async" "http://test.fhir.org/r3/Patient/$everything" 2>&1
*   Trying 104.154.149.246...
* TCP_NODELAY set
* Connected to test.fhir.org (104.154.149.246) port 80 (#0)
> GET /r3/Patient/ HTTP/1.1
> Host: test.fhir.org
> User-Agent: curl/7.52.1
> Accept: application/fhir+ndjson
> Prefer: respond-async
>
< HTTP/1.1 202 Accepted
< Connection: close
< Content-Type: text/plain; charset=ISO-8859-1
< Content-Length: 0
< Cache-control: public, max-age=600
< Date: Tue, 16 Jan 2018 19:51:12 GMT
< X-Request-Id: 32-2274
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Content-Location, Location
< Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE
< Content-Location: http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10
< Server: Health Intersections FHIR Server
<
* Curl_http_done: called premature == 0
* Closing connection 0
jmandel@morel:~$ curl -H "Accept: application/fhir+ndjson" -v  http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10
*   Trying 104.154.149.246...
* TCP_NODELAY set
* Connected to test.fhir.org (104.154.149.246) port 80 (#0)
> GET /r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10 HTTP/1.1
> Host: test.fhir.org
> User-Agent: curl/7.52.1
> Accept: application/fhir+ndjson
>
< HTTP/1.1 200 OK
< Connection: close
< Content-Type: text/plain; charset=ISO-8859-1
< Content-Length: 0
< Cache-control: public, max-age=600
< Date: Tue, 16 Jan 2018 19:51:44 GMT
< X-Request-Id: 32-2277
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Content-Location, Location
< Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE
< Link: <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Bundle.ndjson>;rel=item, <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson>;rel=item, <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10.zip>;rel=collection
< Server: Health Intersections FHIR Server
<
* Curl_http_done: called premature == 0
* Closing connection 0
jmandel@morel:~$ curl -H "Accept: application/fhir+ndjson" -v  http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 104.154.149.246...
* TCP_NODELAY set
* Connected to test.fhir.org (104.154.149.246) port 80 (#0)
> GET /r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson HTTP/1.1
> Host: test.fhir.org
> User-Agent: curl/7.52.1
> Accept: application/fhir+ndjson
>
< HTTP/1.1 200 OK
< Connection: close
< Content-Type: application/x-ndjson
< Content-Length: 31032
< Cache-control: public, max-age=600
< Date: Tue, 16 Jan 2018 19:52:11 GMT
< X-Request-Id: 32-2278
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Content-Location, Location
< Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE
< Server: Health Intersections FHIR Server
<
{ [1374 bytes data]
* Curl_http_done: called premature == 0
100 31032  100 31032    0     0   144k      0 --:--:-- --:--:-- --:--:--  144k
* Closing connection 0
49
jmandel@morel:~$
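The session above follows a kick-off, then status, then download pattern: the 202 response carries the status URL in Content-Location, and the later 200 response lists the files in Link. As a rough sketch of the first hop only, the snippet below pulls the status URL out of a saved copy of the 202 headers. The header text is abridged from the log, and no request is made.

```shell
# Headers from the 202 kick-off response in the log above (abridged).
headers='HTTP/1.1 202 Accepted
Content-Length: 0
Content-Location: http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10
Server: Health Intersections FHIR Server'

# Header names are case-insensitive, so compare in lower case.
# tr strips the carriage return that real HTTP header lines end with.
status_url=$(printf '%s\n' "$headers" |
  awk 'tolower($1) == "content-location:" { print $2 }' |
  tr -d '\r')

printf 'poll: %s\n' "$status_url"
```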

view this post on Zulip Josh Mandel (Jan 16 2018 at 22:41):

The specific surprises are:

1. Expected many more resource types, and didn't expect a zip file:

Link: <http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Bundle.ndjson>;rel=item,
<http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson>;rel=item,
<http://test.fhir.org/r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10.zip>;rel=collection

2. Expected many more rows in GET /r3/task/3484a0d1-b268-4909-b241-1b2cb8cefe10/Patient.ndjson but only found 49.

view this post on Zulip Grahame Grieve (Jan 17 2018 at 11:36):

I don't understand this at all. I won't use curl because it has a stupid dependency list on windows. but when I do the same request, I get a redirect, to http://test.fhir.org/r3/task/cf913aff-ad0d-4bde-b0c2-1cd2194316b0. If I wait a couple of minutes, the file http://test.fhir.org/r3/task/cf913aff-ad0d-4bde-b0c2-1cd2194316b0/Patient.ndjson is available and has 341247 bytes. I don't know why you're getting the same response as last time you used curl... that doesn't make any sense, because I fixed this weeks ago

view this post on Zulip Josh Mandel (Jan 17 2018 at 19:12):

@Dan Gottlieb Are you able to reproduce my findings above?

view this post on Zulip Dan Gottlieb (Jan 17 2018 at 19:58):

@Josh Mandel I just tested it and get the same response when using curl, but get the full set of data when I use postman. I'm not enough of a curl user to debug, but the issue seems to be client specific...

view this post on Zulip Josh Mandel (Jan 17 2018 at 20:17):

Fascinating. I'll compare headers (e.g. user-agent).

view this post on Zulip Josh Mandel (Jan 17 2018 at 21:46):

Ok, the issue was shell variable interpolation: $everything gets resolved by the shell (as an empty string) so the initial request went to /Patient instead of /Patient/$everything. Silly me indeed :-)
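A small sketch of the quoting problem described here. It only builds the URL strings the shell would hand to curl, so nothing is sent over the network. It assumes a POSIX-like shell in which no variable named `everything` is set.

```shell
# Make sure the variable is unset, as it would be in a fresh session.
unset everything

# Double quotes: the shell expands $everything (to nothing) before curl runs.
double_quoted="http://test.fhir.org/r3/Patient/$everything"

# Single quotes: no expansion, so the literal $everything reaches curl.
single_quoted='http://test.fhir.org/r3/Patient/$everything'

# A backslash inside double quotes also keeps the literal dollar sign.
escaped="http://test.fhir.org/r3/Patient/\$everything"

printf '%s\n' "$double_quoted" "$single_quoted" "$escaped"
```

The first line printed ends in `/Patient/`, matching the `GET /r3/Patient/` request line in the verbose curl output earlier in the thread. The other two keep `$everything` intact.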

view this post on Zulip Dan Gottlieb (Jan 17 2018 at 22:27):

Ha, there's an obvious thing we both missed :)

view this post on Zulip Grahame Grieve (Jan 18 2018 at 10:16):

woah. that's tricky.

view this post on Zulip Grahame Grieve (Jan 18 2018 at 10:16):

though you should get 50 lines, not 49 lines
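One possible explanation for 49 versus 50, offered as an assumption since the thread does not confirm it: `wc -l` counts newline characters, so an NDJSON file whose last record has no trailing newline reports one line fewer than it has records. A sketch with three made-up records:

```shell
# Three NDJSON records; the last one has no trailing newline.
tmp=$(mktemp)
printf '{"id":"1"}\n{"id":"2"}\n{"id":"3"}' > "$tmp"

# wc -l counts newline characters only.
newline_count=$(wc -l < "$tmp" | tr -d ' ')

# grep -c '' counts lines, including a final line with no newline.
record_count=$(grep -c '' "$tmp")

printf 'wc -l: %s, records: %s\n' "$newline_count" "$record_count"
rm -f "$tmp"
```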

view this post on Zulip Jenni Syed (Jan 27 2018 at 15:47):

The internet should be an interesting challenge for this track

view this post on Zulip Jenni Syed (Jan 27 2018 at 15:48):

One of the things raised internally was if we should require chunking to be supported - on our pop health side, we found that it wasn't uncommon for a client to drop connection or have an issue midway for large file downloads. Chunking would let you start from last chunk rather than all over

view this post on Zulip Dennis Patterson (Jan 27 2018 at 16:03):

I added a server column to the tracking spreadsheet if any others want to add where they have something clients can start working with

view this post on Zulip Josh Mandel (Jan 27 2018 at 16:43):

@Jenni Syed there's chunking and also https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests -- I'm not familiar with how a "restart from last chunk" works; would be good if we can catch up so you can educate me on this.
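For the range-request option, a rough sketch of how a client could resume: look at how many bytes are already on disk and ask for the rest. The helper name `range_header_for` is made up for illustration, and the approach assumes the server honours Range and answers 206 Partial Content, which nothing in this thread confirms for any of the servers involved.

```shell
# range_header_for FILE: print a Range header asking for the bytes that
# come after what FILE already holds (bytes=0- if FILE does not exist).
# Hypothetical helper, not part of any spec or tool.
range_header_for() {
  have=0
  if [ -f "$1" ]; then
    have=$(wc -c < "$1" | tr -d ' ')
  fi
  printf 'Range: bytes=%s-\n' "$have"
}

# Simulate a download that stopped after 5 bytes.
partial=$(mktemp)
printf 'abcde' > "$partial"
resume_header=$(range_header_for "$partial")
printf '%s\n' "$resume_header"
rm -f "$partial"

# Possible use (not run here; $file_url is a placeholder):
#   curl -H "$(range_header_for Patient.ndjson)" "$file_url" >> Patient.ndjson
```

Appending is only safe when the response is a 206. A server that ignores Range replies 200 with the whole file, which would duplicate data. curl's own `-C -` option handles resuming in a similar way.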

view this post on Zulip Josh Mandel (Jan 27 2018 at 16:45):

(Also note that we've got the #bulk data channel for spec discussion.)

view this post on Zulip Dennis Patterson (May 12 2018 at 08:04):

I created the bulk data track on - http://conman.fhir.org/connectathon.html?event=cologne2018 . Check the server tab for Cerner's FHIR server base url

view this post on Zulip Danielle Friend (May 12 2018 at 08:08):

The Epic FHIR server is also up - let me know if you'd like to test and I'll give you a group ID to test $export with.

view this post on Zulip Julie Maas (May 12 2018 at 13:18):

@Danielle Friend I would like to try to test thanks!

view this post on Zulip Danielle Friend (May 12 2018 at 14:10):

@Julie Maas Awesome - you can test with this URL: https://connectathon.epic.com/Interconnect-Fhir-Bulk/api/FHIR/STU3/Group/ebUKRzWjCOkHtNr8R-zHgzc4TdaZx9-TOvXKIeP0bLnM3/$export

You'll also need to specify the following header to your requests:
Epic-Client-ID: 37c67e7b-b694-4fe8-8d46-7c1d27b1c206


Last updated: Apr 12 2022 at 19:14 UTC