FHIR Chat · Shouldn't bulk data API be RESTful?

Stream: bulk data

Topic: Shouldn't bulk data API be RESTful?

Ed Martin (Jan 29 2018 at 18:15):

As a first-time participant at a Connectathon, thanks to all participants, and especially those who provided help to me. It was an eye-opening experience, and I look forward to future participation. Please forgive me if this is a newbie question that's been previously discussed, but after I left the Connectathon, I had this question:

Why isn’t the proposed Bulk Data API RESTful? If we’re supposed to be following the FHIR API model, couldn’t there be a BulkDataQuery resource that we can POST, GET, PUT and DELETE from the FHIR Server?

Something like this:

BulkDataQuery Resource

Id - FHIR resource id
QueryParameters - e.g. Patient/$export or whatever gets decided
Status - done, not done or perhaps Queued, Completed, InProgress, Cancelled
InitialRequestDateTime
CompletedDateTime
Results - List of URI’s
ResultsExpirationDateTime - the date/time when the results will no longer be available.

POST - initiates a bulk data request, creates a new BulkDataQuery resource on the server
GET - gets the BulkDataQuery resource via id (and support some search parameters)
DELETE - cancels the BulkDataQuery request
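
A minimal sketch of the resource proposed above, assuming the field names and status values listed in this message. The BulkDataQueryStore class and its method names are hypothetical stand-ins for a server's POST/GET/DELETE handlers, not part of any FHIR specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List, Optional
import uuid


class Status(Enum):
    QUEUED = "Queued"
    IN_PROGRESS = "InProgress"
    COMPLETED = "Completed"
    CANCELLED = "Cancelled"


@dataclass
class BulkDataQuery:
    id: str                                    # FHIR resource id
    query_parameters: str                      # e.g. "Patient/$export"
    status: Status = Status.QUEUED
    initial_request_datetime: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    completed_datetime: Optional[datetime] = None
    results: List[str] = field(default_factory=list)   # list of URIs
    results_expiration_datetime: Optional[datetime] = None


class BulkDataQueryStore:
    """Hypothetical in-memory stand-in for the server side."""

    def __init__(self) -> None:
        self._jobs: Dict[str, BulkDataQuery] = {}

    def post(self, query_parameters: str) -> BulkDataQuery:
        # POST: initiate a request and create a new BulkDataQuery
        job = BulkDataQuery(id=str(uuid.uuid4()),
                            query_parameters=query_parameters)
        self._jobs[job.id] = job
        return job

    def get(self, job_id: str) -> BulkDataQuery:
        # GET: fetch the BulkDataQuery by id
        return self._jobs[job_id]

    def delete(self, job_id: str) -> BulkDataQuery:
        # DELETE: cancel the request (the record is kept, marked Cancelled)
        job = self._jobs[job_id]
        job.status = Status.CANCELLED
        return job
```

Keeping the cancelled record instead of removing it is one possible design choice; a server could equally drop it.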

Josh Mandel (Jan 30 2018 at 19:41):

This is an interesting suggestion. Overall it introduces some indirection (i.e., rather than asking for an async response to your query, you're talking to an entirely different endpoint) and some more functionality (e.g., it implies that a client should be able to receive some interim results, or search for previously generated exports, etc. -- which could be either a benefit or a needless complexity, depending on perspective).

Grahame Grieve (Jan 30 2018 at 19:42):

it overlaps with the Task resource

Josh Mandel (Jan 30 2018 at 19:46):

Really? (I mean, on some level: what doesn't — but I never thought of Task as an infrastructure resource.)

Jenni Syed (Jan 30 2018 at 19:46):

The asynch response pattern we're doing is a REST approach (if that's the concern). I.e., I could GET all Patients where DOB is after 1970 asynchronously. I wonder if it feels more "funky" because of the Operation? But then this would be a concern with operations.

Jenni Syed (Jan 30 2018 at 19:49):

If the concern is specifically with the asynch nature, then it wouldn't just be the Bulk Data concern - it would be with all the resources you could select this way.

Josh Mandel (Jan 30 2018 at 19:50):

I think the pattern we're describing generalizes to any call (which came up in discussion here), so it'd be sensible to think of it for non-bulk-export use cases like:

GET /Observation?code=123
Prefer: respond-async

(Or whatever).

This feels more natural to me than:

POST /AsyncJob
content-type: application/json

{
  "query": "Observation?code=123",
  "outputFormat": "application/fhir+ndjson"
}

Josh Mandel (Jan 30 2018 at 19:51):

Now we could say that requests like the first example (the GET with Prefer: respond-async) tie into the second (the POST to /AsyncJob) by returning results like

Content-location: https://server/fhir/AsyncJob/123

... which acts like a FHIR resource. But I'm not sure what problem this solves.
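
A rough sketch of how a client might handle either style: send the ordinary request with Prefer: respond-async, then look for a Content-Location header pointing at a status URL. The 202 Accepted status code and the function name are assumptions for illustration; this thread only mentions the Prefer and Content-Location headers.

```python
from typing import Mapping, Optional

# Request header used to ask for an asynchronous response
ASYNC_REQUEST_HEADERS = {"Prefer": "respond-async"}


def async_status_url(status_code: int,
                     headers: Mapping[str, str]) -> Optional[str]:
    """Return the polling URL if the request was accepted asynchronously.

    Assumes the server answers 202 Accepted with a Content-Location header;
    returns None for any other response (e.g. a normal synchronous 200).
    """
    if status_code != 202:
        return None
    # HTTP header names are case-insensitive
    for name, value in headers.items():
        if name.lower() == "content-location":
            return value
    return None
```

Whether the returned URL behaves like a FHIR resource (AsyncJob/123) or an opaque status endpoint makes no difference to this client logic.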

Josh Mandel (Jan 30 2018 at 19:52):

(I wrote AsyncJob here where @Ed Martin wrote BulkDataQuery -- you should read these as equivalent.)

Grahame Grieve (Jan 30 2018 at 20:01):

a resource would potentially mean that you could find each other's requests etc. That's probably not a desired feature. (though that's not something it has to mean)

Bas van den Heuvel (Jan 30 2018 at 20:17):

Hiding responses from other users is certainly something that we need to support.

Michael Donnelly (Jan 30 2018 at 20:18):

I agree, @Josh Mandel - your GET example is significantly more straightforward.

Grahame Grieve (Jan 30 2018 at 20:20):

my server already does GET that way

Nagesh Bashyam (Jan 31 2018 at 00:01):

I am thinking out loud... What use case would need the async capability on regular FHIR APIs? (Publish-subscribe type use cases?) Doesn't async make the backend more complicated, since it has to spin off requests, keep track of them, create some kind of bundle that has to persist for some time, and then clean up? Is that really required for base FHIR use cases? I can see that complexity being necessary for bulk APIs, or even for jobs, i.e. tasks that are workflow dependent (human intervention) and/or complex and time consuming in nature (complex queries using some query language, or execution of jobs).

Josh Mandel (Jan 31 2018 at 02:15):

I think there's value in saying "here's a consistent way to request data, whether it's in a giant bundle or in an async pile of ndjson files". That said: there's no expectation that most servers would support this async model on "normal" requests; it's only important on requests that require large responses.

Grahame Grieve (Jan 31 2018 at 02:54):

it is more complicated for server and client but that doesn't mean it isn't functional

Vladimir Ignatov (Jan 31 2018 at 18:34):

How about a note in the spec saying that servers may send a Retry-After header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After)? I think in some cases this can lead to better polling than using the back-off algorithm.
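
A minimal sketch of how a polling client might honor Retry-After and fall back to exponential back-off when the header is missing or unparseable. The function name, the base delay and the cap are assumptions for illustration; Retry-After itself may carry either a number of seconds or an HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def next_poll_delay(retry_after: Optional[str], attempt: int,
                    now: Optional[datetime] = None,
                    base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next status poll."""
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            # Retry-After: <delay-seconds>
            return float(value)
        try:
            # Retry-After: <http-date>
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable server hint: exponential back-off, capped (assumed policy)
    return min(cap, base * (2 ** attempt))
```

With a server hint the client waits exactly as long as asked; without one it doubles the delay on each attempt up to the cap.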