FHIR Chat · Error Handling Questions

Stream: subscriptions

Topic: Error Handling Questions

adam strickland (Jul 26 2021 at 22:18):

Hi there, I've been reviewing the error handling sections of the backport IG, and one question I have is how servers (publishers) are supposed to know to set the status of a subscription to error.

In both docs we stress how error handling should be simple for the server, but right now we only have the plumbing in place for errors that both clients (subscribers) and servers could know about. In cases where the client has an issue internally that stops them from processing the notification, but otherwise correctly acknowledges the notification to the server, the server has no specified way of re-exposing that notification to the client. $status for instance will only expose notifications that the server knows have failed.

To better allow clients to recover from errors, we propose there should be a $events operation on a Subscription resource (similar to $status) where clients can retrieve ALL events since a given point. An _eventsSinceSubscriptionStart parameter could be used to limit the response to just the most recent events, based off the last eventsSinceSubscriptionStart value the client received. For instance, if a client has not received any notifications since event 150, they could query periodically to ensure they haven't missed 150+ like so:

Subscription/[ID]/$events?_eventsSinceSubscriptionStart=150

This method has the added benefit of enabling efficient polling for clients that don't want to receive notifications. A client could sign up for a subscription without notifications, and just query periodically to see which resources have been updated.
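As a rough illustration of the client side of this proposal, here is a minimal Python sketch. The function names, the base URL, and the reading of _eventsSinceSubscriptionStart as "return events numbered after this value" are assumptions made for the example; none of this is defined by the IG.

```python
from urllib.parse import urlencode


def events_url(base_url: str, subscription_id: str, last_seen: int) -> str:
    """Build the proposed $events query (hypothetical helper).

    Assumes the parameter means "events after `last_seen`"; the thread
    does not settle whether the bound is inclusive or exclusive.
    """
    query = urlencode({"_eventsSinceSubscriptionStart": last_seen})
    return f"{base_url.rstrip('/')}/Subscription/{subscription_id}/$events?{query}"


def missing_events(last_seen: int, server_count: int) -> range:
    """Event numbers the server has counted that the client has not processed."""
    return range(last_seen + 1, server_count + 1)
```

A client that last processed event 150 and learns that the server is now at 153 would need to recover events 151 through 153, and could request them with the URL built by events_url.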

Any thoughts on this $events operation?

Lloyd McKenzie (Jul 26 2021 at 22:50):

@Gino Canessa

Gino Canessa (Jul 27 2021 at 16:25):

adam strickland said:

In both docs we stress how error handling should be simple for the server, but right now we only have the plumbing in place for errors that both clients (subscribers) and servers could know about. In cases where the client has an issue internally that stops them from processing the notification, but otherwise correctly acknowledges the notification to the server, the server has no specified way of re-exposing that notification to the client. $status for instance will only expose notifications that the server knows have failed.

Yes, that is beyond the scope of anything we have defined. If a client acknowledges receipt of a notification, then it is the client's responsibility from there out. This feels like a reasonable boundary.

To better allow clients to recover from errors, we propose there should be a $events operation on a Subscription resource (similar to $status) where clients can retrieve ALL events since a given point. An _eventsSinceSubscriptionStart parameter could be used to limit the response to just the most recent events, based off the last eventsSinceSubscriptionStart value the client received. For instance, if a client has not received any notifications since event 150, they could query periodically to ensure they haven't missed 150+ like so:

Subscription/[ID]/$events?_eventsSinceSubscriptionStart=150

This method has the added benefit of enabling efficient polling for clients that don't want to receive notifications. A client could sign up for a subscription without notifications, and just query periodically to see which resources have been updated.

Any thoughts on this $events operation?

I'm good with defining a way to query events (missed or otherwise), with explicit acknowledgement that it is optional for servers to implement and given that it has plenty of leeway in those implementations. While some servers will store events forever, others may want to cap based on number of events (e.g., the last 10), time (e.g., flushed weekly), some other parameters that are internal to the server (e.g., depending on the user/customer creating the subscription), or not want to store them at all.
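To make the retention leeway concrete, here is a minimal Python sketch of a server-side event log that can be capped by count, by age, or both. The Event and EventLog names and their fields are hypothetical, invented for this example; a real server would be free to choose any policy, including storing nothing.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Event:
    number: int        # eventsSinceSubscriptionStart value when the event was sent
    timestamp: float   # seconds since the epoch
    focus: str         # reference to the focus resource, e.g. "Patient/A"


class EventLog:
    """Keeps past events under an optional count cap and an optional age cap."""

    def __init__(self, max_events: Optional[int] = None,
                 max_age_seconds: Optional[float] = None):
        # maxlen=None stores everything; maxlen=0 stores nothing at all.
        self._events: Deque[Event] = deque(maxlen=max_events)
        self._max_age = max_age_seconds

    def record(self, event: Event) -> None:
        # When the count cap is reached, the oldest event is dropped.
        self._events.append(event)

    def since(self, last_seen: int, now: float) -> List[Event]:
        """Return retained events numbered after `last_seen` that are not too old."""
        return [
            e for e in self._events
            if e.number > last_seen
            and (self._max_age is None or now - e.timestamp <= self._max_age)
        ]
```

With max_events=10 the log behaves like the "last 10" example, and max_age_seconds set to one week approximates a weekly flush.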

I am... uncomfortable? discussing the use case of a polling-only client in R4 right now. I am hesitant to touch polling-based mechanisms since there is already a well-established way to poll a server (RESTful query). I see plenty of design space where this could be useful (a mechanism for clients to create custom change-feeds), but we don't have experience with it and I don't think we can afford to delay the R4 IG publication to get that experience.

Overall, I support the idea of getting an operation in now that has a shape and a low bar for implementation. We can then use the time between R4B publication and R5 to gain more experience and see if there are changes to be made.

If you could file a ticket against the backport IG, since that is a lower bar for additions than R4B, we can (hopefully =) get a vote at the next FHIR-I call and go from there.

Thoughts (Adam or anyone else)?

adam strickland (Jul 27 2021 at 18:28):

Thanks Gino, created a Jira ticket with more details.

adam strickland (Aug 03 2021 at 18:49):

Last week we left off on (among other things) the question of whether $events should be re-triggered (based on current data) or re-sent (based on the table at initial notification time). I think this will be related to the adjacent topic on tagging batched/included resources.

My pitch is that the resource/id content should be treated like a re-send (edited: I did not mean re-trigger), but that for brevity we could coalesce all appearances of a resource into a single entry that inherits all of its tags.

For example, take 2 events sent separately with id-only payloads. Event 1 has PatientA as the focus, and Event 2 has PatientA as an included reference. If you call $events asking for events 1-2, we could return a single entry for PatientA that is tagged as the focus of Event 1 and included by Event 2.
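The coalescing in this example can be sketched in a few lines of Python. The tuple layout and the role names "focus" and "include" are assumptions for illustration only; the thread has not settled how such tags would actually be represented.

```python
from typing import Dict, List, Tuple

# (event number, resource reference, role), where role is "focus" or "include".
Appearance = Tuple[int, str, str]


def coalesce(appearances: List[Appearance]) -> Dict[str, List[Tuple[int, str]]]:
    """Merge every appearance of a resource into one entry carrying all its tags."""
    merged: Dict[str, List[Tuple[int, str]]] = {}
    for event_number, reference, role in appearances:
        merged.setdefault(reference, []).append((event_number, role))
    return merged


entries = coalesce([
    (1, "Patient/A", "focus"),    # Event 1: PatientA is the focus
    (2, "Patient/A", "include"),  # Event 2: PatientA is an included reference
])
# entries == {"Patient/A": [(1, "focus"), (2, "include")]}
```

The single Patient/A entry keeps both tags, so a caller can still tell which event each role came from.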

This is assuming we go forward with tags like this. Definitely interested to hear about AuditEvent and how that could encapsulate this.

Gino Canessa (Aug 03 2021 at 18:59):

I think that makes sense. I believe I left some additional notes there as I was processing last week, but as I was working through it I figured it makes sense to also have a parameter on $events that allows a caller to change the payload type. E.g., even though the subscription is normally id-only, when I'm doing a synchronous query, it is inefficient to have the server build out the graph of ids while knowing that the next request will be to resolve those ids. In that case, you can do something like /Subscription/id/$events?...&content=full-resource to hint to the server that it should return the resources (if the server can/wants to).
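A minimal Python sketch of how a caller might add that hint follows. The events_query helper is hypothetical, the content parameter name is taken from the example URL above, and the allowed values are assumed to match the subscription payload content codes (empty, id-only, full-resource).

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed to mirror the subscription payload content codes.
PAYLOAD_TYPES = {"empty", "id-only", "full-resource"}


def events_query(subscription_id: str, last_seen: int,
                 content: Optional[str] = None) -> str:
    """Build a relative $events query, optionally hinting a payload type."""
    params = {"_eventsSinceSubscriptionStart": last_seen}
    if content is not None:
        if content not in PAYLOAD_TYPES:
            raise ValueError(f"unknown payload type: {content}")
        # Only a hint: the server may ignore it and keep the subscription default.
        params["content"] = content
    return f"/Subscription/{subscription_id}/$events?{urlencode(params)}"
```

Leaving content unset keeps the subscription's normal payload type, so existing callers would be unaffected.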

Last updated: Apr 12 2022 at 19:14 UTC
