FHIR Chat · Shouldn't bulk data API be RESTful? · bulk data

Stream: bulk data

Topic: Shouldn't bulk data API be RESTful?


Ed Martin (Jan 29 2018 at 18:15):

As a first-time participant at a Connectathon, I'd like to thank all participants, and especially those who helped me. It was an eye-opening experience, and I look forward to future participation. Please forgive me if this is a newbie question that’s been previously discussed, but after I left the Connectathon, I had this question.

Why isn’t the proposed Bulk Data API RESTful? If we’re supposed to be following the FHIR API model, couldn’t there be a BulkDataQuery resource that we can POST, GET, PUT and DELETE from the FHIR Server?

Something like this:

BulkDataQuery Resource

  • Id - FHIR resource id
  • QueryParameters - e.g. Patient/$export or whatever gets decided
  • Status - done, not done or perhaps Queued, Completed, InProgress, Cancelled
  • InitialRequestDateTime
  • CompletedDateTime
  • Results - List of URIs
  • ResultsExpirationDateTime - the date/time when the results will no longer be available.

POST - initiates a bulk data request, creates a new BulkDataQuery resource on the server
GET - gets the BulkDataQuery resource via id (and support some search parameters)
DELETE - cancels the BulkDataQuery request
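
To make the proposal above concrete, here is a minimal sketch of what such a resource and its three verbs might look like. Everything in it is hypothetical: `BulkDataQuery` is not an existing FHIR resource, the field names simply mirror the bullet list, and `BulkDataQueryStore` is an in-memory stand-in for a server, not a real API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional
import uuid


@dataclass
class BulkDataQuery:
    """Hypothetical resource mirroring the fields proposed in the post."""
    id: str
    query_parameters: str                       # e.g. "Patient/$export"
    status: str = "Queued"                      # Queued, InProgress, Completed, Cancelled
    initial_request_date_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    completed_date_time: Optional[datetime] = None
    results: List[str] = field(default_factory=list)   # URIs of the result files
    results_expiration_date_time: Optional[datetime] = None


class BulkDataQueryStore:
    """In-memory stand-in for the server side of the proposal (assumption)."""

    def __init__(self) -> None:
        self._queries: Dict[str, BulkDataQuery] = {}

    def post(self, query_parameters: str) -> BulkDataQuery:
        """POST: create a new BulkDataQuery, which queues the export."""
        query = BulkDataQuery(id=str(uuid.uuid4()),
                              query_parameters=query_parameters)
        self._queries[query.id] = query
        return query

    def get(self, query_id: str) -> BulkDataQuery:
        """GET: read a BulkDataQuery by id (raises KeyError if unknown)."""
        return self._queries[query_id]

    def delete(self, query_id: str) -> BulkDataQuery:
        """DELETE: cancel the request; the record is kept, marked Cancelled."""
        query = self._queries[query_id]
        query.status = "Cancelled"
        return query
```

Keeping the cancelled record rather than removing it is one possible reading of "DELETE cancels the request"; a server could equally discard it.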

Josh Mandel (Jan 30 2018 at 19:41):

This is an interesting suggestion. Overall it introduces some indirection (i.e., rather than asking for an async response to your query, you're talking to an entirely different endpoint) and some more functionality (e.g., it implies that a client should be able to receive some interim results, or search for previously generated exports, etc -- which could either be a benefit or a needless complexity, depending on perspective).

Grahame Grieve (Jan 30 2018 at 19:42):

it overlaps with the Task resource

Josh Mandel (Jan 30 2018 at 19:46):

Really? (I mean, on some level: what doesn't — but I never thought of Task as an infrastructure resource.)

Jenni Syed (Jan 30 2018 at 19:46):

The async response pattern we're doing is a REST approach (if that's the concern). I.e., I could GET all Patients where DOB is after 1970 asynchronously. I wonder if it feels more "funky" because of the Operation? But then this would be a concern with operations in general.

Jenni Syed (Jan 30 2018 at 19:49):

If the concern is specifically with the async nature, then it wouldn't just be a Bulk Data concern - it would apply to all the resources you could select this way.

Josh Mandel (Jan 30 2018 at 19:50):

I think the pattern we're describing generalizes to any call (which came up in discussion here), so it'd be sensible to think of it for non-bulk-export use cases like:

1.

GET /Observation?code=123
Prefer: respond-async

(Or whatever).

This feels more natural to me than:

2.

POST /AsyncJob
content-type: application/json

{
  "query": "Observation?code=123",
  "outputFormat": "application/fhir+ndjson"
}

Josh Mandel (Jan 30 2018 at 19:51):

Now we could say that requests like #1 tie into #2 by returning results like

Content-location: https://server/fhir/AsyncJob/123

... which acts like a FHIR resource. But I'm not sure what problem this solves.
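
As a rough illustration of how a client might handle style #1 and then follow the returned location, here is a small sketch. It makes several assumptions that are not settled in this thread: that the server signals acceptance with HTTP 202, that the status URL arrives in a `Content-Location` header, and that `https://server/fhir` is the base URL. The function names are hypothetical, and no network call is made.

```python
from typing import Dict, Mapping, Optional


def build_async_request(query: str,
                        base_url: str = "https://server/fhir") -> Dict[str, object]:
    """Describe a style #1 request: a normal GET plus 'Prefer: respond-async'."""
    return {
        "method": "GET",
        "url": f"{base_url}/{query}",
        "headers": {"Prefer": "respond-async"},
    }


def status_url_from_kickoff(status_code: int,
                            headers: Mapping[str, str]) -> Optional[str]:
    """Return the URL to poll, or None if the server did not go async.

    Assumes 202 Accepted means "accepted for async processing"; any other
    status is treated as an ordinary synchronous response.
    """
    if status_code != 202:
        return None
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-location":
            return value
    return None
```

Whether the returned URL behaves like a FHIR resource (an `AsyncJob`) or an opaque status endpoint makes no difference to this client code, which is arguably the point being made above.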

Josh Mandel (Jan 30 2018 at 19:52):

(I wrote AsyncJob here where @Ed Martin wrote BulkDataQuery -- you should read these as equivalent.)

Grahame Grieve (Jan 30 2018 at 20:01):

a resource would potentially mean that you could find each other's requests etc. That's probably not a desired feature. (though that's not something it has to mean)

Bas van den Heuvel (Jan 30 2018 at 20:17):

Hiding responses from other users is certainly something that we need to support.

Michael Donnelly (Jan 30 2018 at 20:18):

I agree, @Josh Mandel - your GET example is significantly more straightforward.

Grahame Grieve (Jan 30 2018 at 20:20):

my server already does GET that way

Nagesh Bashyam (Jan 31 2018 at 00:01):

I am thinking out loud: what use case would use the async capability on regular FHIR APIs? (Publish-subscribe type use cases?) Doesn't async make the backend more complicated, since it has to spin off requests, keep track of them, create some kind of bundle that has to persist for some time, and then clean up? Is that really required for base FHIR use cases? I can see that complexity being necessary for bulk APIs, or even for jobs: tasks that are workflow dependent (i.e., needing human intervention) and/or complex and time-consuming in nature (e.g., complex queries using some query language, or execution of jobs).

Josh Mandel (Jan 31 2018 at 02:15):

I think there's value in saying "here's a consistent way to request data, whether it's in a giant bundle or in an async pile of ndjson files". That said: there's no expectation that most servers would support this async model on "normal" requests; it's only important on requests that require large responses.

Grahame Grieve (Jan 31 2018 at 02:54):

it is more complicated for server and client but that doesn't mean it isn't functional

Vladimir Ignatov (Jan 31 2018 at 18:34):

How about a note in the spec saying that servers may send a Retry-After header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After)? I think in some cases this can lead to better polling than using the back-off algorithm.
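
A minimal sketch of how a client might combine the two ideas: honor Retry-After when the server sends it, and fall back to exponential back-off otherwise. The function name `next_poll_delay` and the `base`/`cap` defaults are assumptions chosen for illustration, not anything taken from the spec. Retry-After may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def next_poll_delay(retry_after: Optional[str],
                    attempt: int,
                    now: Optional[datetime] = None,
                    base: float = 1.0,
                    cap: float = 60.0) -> float:
    """Seconds to wait before polling the status URL again.

    retry_after: raw Retry-After header value, or None if absent.
    attempt:     zero-based count of polls made so far (used for back-off).
    """
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            # Delta-seconds form, e.g. "120".
            return float(value)
        try:
            # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # Unparseable hint: ignore it and back off instead.
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())
    # No usable hint from the server: capped exponential back-off.
    return min(cap, base * (2 ** attempt))
```

The server's hint takes priority here because the server knows how far along the export is, which a client-side back-off schedule can only guess at.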

Josh Mandel (Jan 31 2018 at 19:11):

I think that's a super helpful suggestion for hinting clients when to poll.

Michael Donnelly (Jan 31 2018 at 19:49):

This is a solid strategy.

Nagesh Bashyam (Jan 31 2018 at 20:00):

Agreed

Bas van den Heuvel (Feb 01 2018 at 04:51):

Besides large requests, it is also useful for requests that potentially take a lot of time, e.g. evaluation of a population measure.

Dan Gottlieb (Feb 01 2018 at 17:30):

I added a note about the Retry-After header to the write-up: https://github.com/smart-on-fhir/fhir-bulk-data-docs/blob/master/README.md


Last updated: Apr 12 2022 at 19:14 UTC