FHIR Chat · Weak ETags · implementers


view this post on Zulip Grahame Grieve (Jul 11 2019 at 20:44):

http://community.fhir.org/t/weak-etags-and-the-if-match-header/1420

view this post on Zulip Grahame Grieve (Jul 11 2019 at 20:46):

The relevant words are

An origin server MUST use the strong comparison function when comparing entity-tags for If-Match

view this post on Zulip Grahame Grieve (Jul 11 2019 at 20:47):

I'm not sure what to think about this; if-match can't work for RESTful interfaces that use mime types properly....

view this post on Zulip Grahame Grieve (Jul 11 2019 at 20:47):

We discussed the use of weak and strong ETags with the IETF HTTP WG, but did not discuss the use of If-Match, and I think we missed the sentence I quoted above.

view this post on Zulip Alexander Kiel (Jul 15 2019 at 11:27):

I'm not sure what to think about this; if-match can't work for RESTful interfaces that use mime types properly....

I don't see why If-Match shouldn't work with RESTful interfaces that use MIME types (content negotiation). One important thing to remember is that ETags are about the representations and not only about the resource state.

I'm quite new to FHIR but I have a strong background in REST. As I approached FHIR, I always wondered why the term representation isn't used in the spec. The wording is always about exchanging resources, but REST is about exchanging representations of resources. That's why REST is called REpresentational State Transfer.

So in REST the resource state isn't observable directly. The resource state is only observable through representations. In HTTP - Managing Resource Contention the wording is:

If provided, the value of the ETag SHALL match the value of the version id for the resource.

The version id however only changes with the resource state, not with different representations. In FHIR representations can vary not only by MIME type (JSON, XML, RDF), they can also vary by subsetting through _summary or _elements. At least I would see subsetting as a variance in the representation and not as a different resource. So in my opinion, the version id isn't sufficient for an ETag. We have to put all the variables in the ETag which could lead to different representations.

Also, using a weak ETag with only the version id in it isn't really in line with the HTTP spec. In RFC 7232 - Weak versus Strong, the following paragraph can be found:

[...] or a desire of the resource owner to group representations by some self-determined set of equivalency rather than unique sequences of data. An origin server SHOULD change a weak entity-tag whenever it considers prior representations to be unacceptable as a substitute for the current representation. In other words, a weak entity-tag ought to change whenever the origin server wants caches to invalidate old responses.

Reading this carefully, an XML representation can never be a substitute for, say, a JSON representation. So weak ETags should be used for grouping equivalent representations. The weather report example in the next paragraph of the RFC also shows this very well. Strong ETags, on the other hand, are required to change with every change to the representation.

In the end, the only way I see that conditional updates using the If-Match header could be implemented in line with REST and the HTTP spec is to move to strong ETags.

I'm also curious about, what was the reason behind not using the term representation in FHIR?

view this post on Zulip Lloyd McKenzie (Jul 15 2019 at 14:51):

The expectation is that the information conveyed is identical for all representations and it should be possible to read using JSON and then write using XML. That's what creates ETag issues

view this post on Zulip Alexander Kiel (Jul 15 2019 at 15:07):

But if you consider caching, a cache could return an XML representation when a client really asked for a JSON one, because the ETag would be the same and the cache can’t tell the difference between XML and JSON.

view this post on Zulip Alexander Kiel (Jul 15 2019 at 15:15):

For writes, it’s still possible to use both XML and JSON, because a resource can have multiple ETags, one for each representation.

view this post on Zulip Lloyd McKenzie (Jul 15 2019 at 15:16):

We're treating ETag as being representation-independent. It equates to the 'version' of the resource. A given version can be represented using multiple syntaxes. The syntax doesn't change the meaning of the content.

view this post on Zulip Alexander Kiel (Jul 15 2019 at 16:30):

You are right that all available representations (syntaxes, as you say) should, or in FHIR even must, convey the same content, or let's say the same state of a resource. And it's also right that the identifier of such a state, the version id, should be sufficient to prevent losing edits (conditional update).

But, maybe sadly, HTTP doesn't work this way. HTTP considers resources as black boxes and doesn't allow their state to be observed directly. Instead HTTP defines Representations that reflect the state of the resource but aren't the state itself. So HTTP has no concept of an internal version id.

Instead the headers ETag and Last-Modified, also called the Validator Header Fields are about the selected representation:

Validator header fields convey metadata about the selected representation.

So according to the HTTP spec, the ETag header can't be representation-independent. If FHIR wants to use such a version id with its own semantics, custom header fields would be the right way to do it. Using ETag, If-None-Match and If-Match with a representation-independent ETag would break intermediaries like caches.

And now the funny part: I just discovered that an old HTTP RFC, RFC 2068, had a Content-Version and a Derived-From header with exactly the semantics FHIR would like to use. But they weren't carried over to RFC 2616 because they weren't widely implemented.

view this post on Zulip Michael Lawley (Jul 15 2019 at 20:01):

We deal with caching and the MIME (and other similar) issues where representation varies outside of URL changes through the Vary header.

view this post on Zulip Grahame Grieve (Jul 15 2019 at 22:34):

I'm also curious about, what was the reason behind not using the term representation in FHIR?

Because it never became a useful way to talk about the issues we have faced in practice. It's certainly true that you only deal with representations, and we don't have any disagreement with that.

view this post on Zulip Grahame Grieve (Jul 15 2019 at 22:36):

I didn't follow all you said because you weren't differentiating between weak and strong etags. I think that most of your claims were about strong ETags and that's why we don't use them. I did run the logic on this by @Eric Prud'hommeaux and MNot when we were first working on it. We wanted to use strong etags but can't. But I don't see (then or now) why weak etags are wrong

view this post on Zulip Alexander Kiel (Jul 16 2019 at 12:53):

We deal with caching and the MIME (and other similar) issues where representation varies outside of URL changes through the Vary header.

@Michael Lawley Ok that works according to Calculating Secondary Keys with Vary. I have also tested this with nginx as caching reverse proxy.

view this post on Zulip Alexander Kiel (Jul 16 2019 at 13:31):

I didn't follow all you said because you weren't differentiating between weak and strong etags. I think that most of your claims were about strong ETags and that's why we don't use them. I did run the logic on this by Eric Prud'hommeaux and MNot when we were first working on it. We wanted to use strong etags but can't. But I don't see (then or now) why weak etags are wrong

@Grahame Grieve Ok, with Mark Nottingham and Eric Prud'hommeaux the right people have already looked at the issue, and I have to admit, the FHIR API is one of the best REST APIs I have seen.

I was referring to strong and weak ETags. With weak ETags the origin server claims that representations with the same weak ETag can be considered the same regarding caching:

In other words, a weak entity-tag ought to change whenever the origin server wants caches to invalidate old responses.

I tested the caching with the following nginx config: https://gist.github.com/alexanderkiel/fbcebc83dfb337929d0420adab726f13

I used Postman with the Accept header application/fhir+json to access a Patient through http://localhost:8081/baseR4/Patient/22783. After that I repeated the request with the Accept header application/fhir+xml and got the JSON representation from the cache, while the cache was revalidating using If-None-Match with the weak ETag and the HAPI server was responding with a 304.

As @Michael Lawley noted, one can solve the caching problem using the Vary header. In the Gist, the configuration of the Vary header is commented out. If the origin server sends a Vary header that lists at least Accept, the cache can't reuse a stored representation that was selected with a different Accept value in the first place. So it is not revalidating; instead it makes a normal GET request to the origin server.

So maybe, if I'm right here, we should recommend that FHIR servers send the Vary header if they support multiple representations.
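
To make the Vary behaviour described above concrete, here is a minimal sketch of how a cache might pick a stored response using a secondary key built from the headers named in Vary. It is only an illustration: `TinyCache` and its methods are hypothetical names, not the nginx implementation, and the sketch ignores freshness and revalidation entirely.

```python
from typing import Dict, List, Optional, Tuple

Headers = Dict[str, str]


def _lower(headers: Headers) -> Headers:
    """Header names are case-insensitive, so normalise them once."""
    return {name.lower(): value for name, value in headers.items()}


class TinyCache:
    """Toy shared cache (hypothetical): primary key is the URL, secondary key comes from Vary."""

    def __init__(self) -> None:
        # url -> list of (header names listed in Vary, their request values, body)
        self._entries: Dict[str, List[Tuple[Tuple[str, ...], Tuple[str, ...], str]]] = {}

    def store(self, url: str, request_headers: Headers, vary: str, body: str) -> None:
        names = tuple(n.strip().lower() for n in vary.split(",") if n.strip())
        req = _lower(request_headers)
        values = tuple(req.get(n, "") for n in names)
        self._entries.setdefault(url, []).append((names, values, body))

    def lookup(self, url: str, request_headers: Headers) -> Optional[str]:
        req = _lower(request_headers)
        for names, values, body in self._entries.get(url, []):
            # A stored response is reusable only if every header named in its
            # Vary field has the same value in the new request.
            if tuple(req.get(n, "") for n in names) == values:
                return body
        return None
```

With an empty Vary value the secondary key is empty, so a request with `Accept: application/fhir+xml` is answered with whatever was stored for the JSON request. With `Vary: Accept` the same lookup is a miss and the request goes to the origin server.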

Regarding conditional updates (the root issue of this thread) using If-Match: HTTP is very clear about the comparison function:

An origin server MUST use the strong comparison function when comparing entity-tags for If-Match (Section 2.3.2), since the client intends this precondition to prevent the method from being applied if there have been any changes to the representation data.

The strong comparison function never matches on weak ETags. So I really don't see how Managing Resource Contention can work in line with the HTTP spec. But maybe I'm wrong here.
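
The two comparison functions from RFC 7232 section 2.3.2 can be sketched as follows. The function names are my own and the parsing is deliberately simplistic (it does not validate the characters inside the quotes), so treat it as an illustration of the rule rather than a complete header parser.

```python
from typing import Tuple


def parse_etag(value: str) -> Tuple[bool, str]:
    """Split an entity-tag into (is_weak, opaque_tag)."""
    value = value.strip()
    is_weak = value.startswith("W/")
    if is_weak:
        value = value[2:]
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        raise ValueError(f"malformed entity-tag: {value!r}")
    return is_weak, value[1:-1]


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag may be weak and the opaque tags must be equal."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the opaque tags must be equal; the W/ prefix is ignored."""
    _, tag_a = parse_etag(a)
    _, tag_b = parse_etag(b)
    return tag_a == tag_b
```

So `strong_match('W/"3"', 'W/"3"')` is false even though the two tags are character-for-character identical, which is why an If-Match carrying a FHIR weak ETag can never succeed on a server that follows the RFC literally.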

view this post on Zulip Grahame Grieve (Jul 26 2019 at 20:49):

I'll add investigating this to my todo list

view this post on Zulip Michael Lawley (Jul 27 2020 at 09:30):

12 months later - any results of the investigation?
I really want to implement this (If-Match with weak etags), but if there's a better approach I'm all ears (eyes?)

view this post on Zulip Michael Lawley (Jul 27 2020 at 09:31):

@Grahame Grieve

view this post on Zulip Grahame Grieve (Jul 27 2020 at 21:07):

I didn't make any progress with this. I don't know how to resolve it and be conformant with the HTTP spec; we can't use strong tags but we can't use weak tags either

view this post on Zulip Michael Lawley (Jul 27 2020 at 23:42):

Which, I think, pushes us down the path of custom headers?
I'm not sure what that really buys us other than to-the-letter compliance with the HTTP spec. I think the real question is whether breaking this part of the spec is likely to have consequences when interacting with other pieces of common web infrastructure like proxies, caches, and API gateways.

view this post on Zulip Joe Lamy (Apr 19 2021 at 18:49):

https://jira.hl7.org/browse/FHIR-31925

view this post on Zulip Josh Mandel (Apr 22 2021 at 12:04):

@Paul Church @Caitlin Voegele @James Agnew @Christiaan Knaap @Lee Surprenant I wonder if you could comment on how your servers support If-Match on updates to guard against clobbering content that a client is unaware of. Do you support this use case with weak ETags, and would you prefer FHIR to change to a custom header (for purity of compliance with RFC 7232) or leave things as they stand?

view this post on Zulip Lee Surprenant (Apr 22 2021 at 13:41):

Hi Josh. The IBM FHIR Server supports If-Match with weak ETags set to the resource versionId as outlined in the spec. We’d prefer not to change this, but could possibly be convinced otherwise.

view this post on Zulip Eric Prud'hommeaux (Apr 22 2021 at 13:56):

(late to the party) What are the issues with strong and weak ETag conformance?

view this post on Zulip Josh Mandel (Apr 22 2021 at 14:38):

FHIR-31925 has a succinct distillation thanks to @Joe Lamy; quoting here:

The current specification defines usage of the HTTP If-Match header to manage resource contention on updates. However, as noted in this chat, this usage is in conflict with the underlying RFC 7232, section 3.1. Simply stated, if servers obey the RFC, then conditional update using an ETag and If-Match should never work because it is a weak ETag.

If servers are successfully making use of If-Match, and we are ok with a processing model that violates the RFC, then section 3.1.0.5 should call out the issue explicitly and specify the modified processing model.

view this post on Zulip Paul Church (Apr 22 2021 at 17:46):

Google supports If-Match with weak ETags as per the spec. This functionality is very useful and I'd like to expand it to PATCH/DELETE, and also If-Match: * and If-None-Match: * on methods where those make sense.

Because there is no obvious way to change it to align with the RFC, and it is useful, I think we should keep it. Strict conformance with the HTTP spec would be purity at the expense of functionality.

Maybe we should ask the IETF to create "medium ETags" and migrate to those.

view this post on Zulip Caitlin Voegele (Apr 22 2021 at 21:09):

Adding @Brendan Kowitz

We do support it. Every time you create, update or get a resource it returns an e-tag. After this if the client wanted to perform an update and ensure the update will only apply to the current state of the database they can pass back the If-Match header with the etag they have when fetching. If someone else has already updated the resource the etag will be different and the second update will return a conflict.
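
The flow described above can be sketched with a small in-memory store. This is a hypothetical illustration, not any vendor's implementation: `ResourceStore` is an invented name, the ETag is the weak form `W/"<versionId>"` that the FHIR spec describes, and a mismatch is reported as 412 Precondition Failed, which is the status the spec's Managing Resource Contention section calls for. Note that the check is a plain equality test on weak tags, which is exactly the deviation from the RFC's strong comparison discussed in this thread.

```python
from typing import Dict, Optional, Tuple


class ResourceStore:
    """Toy FHIR-like store (hypothetical) with version-based weak ETags."""

    def __init__(self) -> None:
        # resource id -> (version number, resource content)
        self._data: Dict[str, Tuple[int, dict]] = {}

    def etag(self, resource_id: str) -> str:
        version, _ = self._data[resource_id]
        return f'W/"{version}"'

    def create(self, resource_id: str, body: dict) -> str:
        self._data[resource_id] = (1, body)
        return self.etag(resource_id)

    def update(self, resource_id: str, body: dict,
               if_match: Optional[str] = None) -> Tuple[int, str]:
        version, _ = self._data[resource_id]
        current = self.etag(resource_id)
        # Reject the write when the client's tag is not the current version.
        if if_match is not None and if_match.strip() != current:
            return 412, current
        self._data[resource_id] = (version + 1, body)
        return 200, self.etag(resource_id)
```

Two clients that both read version 1 and then both send `If-Match: W/"1"` get different outcomes: the first update succeeds and bumps the version, the second is rejected because the stored version has moved on.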

view this post on Zulip Brendan Kowitz (Apr 22 2021 at 22:53):

Agree, I think this is meeting the use-cases I've seen and looks to have consistent behavior across servers. I could also be convinced to change if there was a more compelling solution, but I don't see it as urgent either

view this post on Zulip Christiaan Knaap (Apr 26 2021 at 11:09):

I agree with the comments above: We support it in Firely Server as specified in the FHIR Spec. Changing the behaviour would not be a priority to us.

view this post on Zulip Lloyd McKenzie (Apr 26 2021 at 17:25):

Perhaps the appropriate outcome then is to explicitly document in the standard our awareness that "yes, we're not technically conformant with the HTTP spec, but we don't care because the functionality needed can't be achieved using a technically conformant mechanism".

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 10:33):

I spoke with Yves Lafon and we came to the conclusion that FHIR can appear to use strong ETags.

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 10:35):

specifically, FHIR can define a structured ETag (reminiscent of https://datatracker.ietf.org/doc/html/draft-ietf-http-negotiation-00#section-9.2) which captures the version number and the specific representation.

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 10:37):

GET /Obs/obs1 Accept: application/json -> ETag: "v123;json"
PUT /Obs/obs1 C-Type: text/turtle, If-Match: "v123;json" -> 204 ETag: "v124;ttl"
PUT /Obs/obs1 C-Type: application/json, If-Match: "v123;json" -> 412

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 10:41):

That 1st PUT worked because the server split the ETag on the ';', got the version number, saw that it was current, and updated the Resource.
The 2nd failed 'cause the version number was no longer current.
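
A minimal sketch of that server-side logic, assuming the `"version;representation"` layout from the example above. The class and function names are hypothetical, and the 204 and 412 codes simply mirror the exchange shown.

```python
from typing import Optional, Tuple


def split_structured_etag(etag: str) -> Tuple[str, str]:
    """Return (version, representation part) of a strong, structured ETag."""
    tag = etag.strip()
    if tag.startswith("W/") or len(tag) < 2 or tag[0] != '"' or tag[-1] != '"':
        raise ValueError(f"not a strong entity-tag: {etag!r}")
    # Only the first ';' separates the version from the representation flags.
    version, _, representation = tag[1:-1].partition(";")
    return version, representation


class Observation:
    """Toy resource (hypothetical) that starts at version 123, as in the example."""

    def __init__(self) -> None:
        self.version = 123

    def put(self, if_match: str, representation: str) -> Tuple[int, Optional[str]]:
        version, _ = split_structured_etag(if_match)
        # Only the version segment decides whether the update may proceed.
        if version != f"v{self.version}":
            return 412, None
        self.version += 1
        return 204, f'"v{self.version};{representation}"'
```

Caches and clients never call `split_structured_etag`; they treat the whole quoted string as opaque. Only the server looks inside it.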

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 10:41):

From the perspective of caches, the ETag looks like any opaque, strong ETag (I like my ETags strong and opaque)

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 10:47):

Adding a Vary header on ETag will allow a proxy to honor byte-range requests because "v123;json" and "v124;ttl" both map to individual cache entries (i.e. docs that it's seen go by and has indexed by URL and ETag).

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 13:04):

With this structured ETag, any cache will see the Vary on ETag and the opaque value of the ETag and consider it a cache miss (i.e. distinct resource).

view this post on Zulip Lloyd McKenzie (Jun 02 2021 at 13:40):

Can you expand on byte-range requests? There's no expectation that the expressed XML and JSON are canonicalized. Is there a risk if the serialization changes at all even if formally the data in the resource is identical? E.g. if someone sets a pretty-print flag on a query, but asks for a byte range on a read?

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 13:50):

any change to the data would give you a new ETag.

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 13:57):

GET /Obs/obs1 Accept: application/json -> 200 ETag: "v123;json" Payload
GET /Obs/obs1 Accept: application/json, If-None-Match: "v123;json" -> 304
PUT /Obs/obs1 C-Type: text/turtle, If-Match: "v123;json" -> 204 ETag: "v124;ttl"
GET /Obs/obs1 Accept: application/json, If-None-Match: "v123;json" -> 200 ETag: "v124;json" Payload

view this post on Zulip Josh Mandel (Jun 02 2021 at 14:10):

In Lloyd's example he's pointing out that

GET /Patient/1?_pretty=true&_format=json
GET /Patient/1?_format=json

Needs to return different strong ETags. Which seems fine. All query parameters or headers that impact serialization need to be represented if you want strong ETags. (The details would be implementation specific and we wouldn't need to standardize them, as long as they worked functionally.)
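
One possible way to fold the serialization-affecting inputs into the ETag is sketched below. The function name, the segment order and the `summary=`/`elements=` spellings are all assumptions made for illustration. As noted above, the details would be implementation specific.

```python
from typing import Optional, Sequence


def build_etag(version_id: str, fmt: str, pretty: bool = False,
               summary: Optional[str] = None,
               elements: Sequence[str] = ()) -> str:
    """Compose a strong ETag from the version and everything that changes the bytes."""
    parts = [version_id, fmt]
    if pretty:
        parts.append("pretty")
    if summary:
        parts.append(f"summary={summary}")
    if elements:
        # Sort so that the same element set always yields the same tag.
        parts.append("elements=" + ",".join(sorted(elements)))
    return '"' + ";".join(parts) + '"'
```

Under these assumptions the two requests above would get `"v123;json;pretty"` and `"v123;json"` respectively: the same version segment, but distinct strong ETags.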

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 14:58):

good point, but i think that just informs what goes into the second part of our structured ETag

view this post on Zulip Eric Prud'hommeaux (Jun 02 2021 at 15:02):

basically, the 2nd part (not that the order matters, but i'm inheriting the assumption of the semantic segment being the first part from that ref'd RFC) can be exactly what we'd use for an ETag today; we just preface it with a "version" identifier (and throw ""s around it to not mislead the header parser)

view this post on Zulip Josh Mandel (Jun 02 2021 at 16:27):

Just so.

view this post on Zulip Lloyd McKenzie (Jun 02 2021 at 17:27):

@Grahame Grieve - is this something we can get away with this far into normative?

view this post on Zulip Josh Mandel (Jun 02 2021 at 18:04):

I think the main question is what it would buy us; it adds complexity to make clients understand different classes of ETags that a server might return, vs today saying "The Version Id is represented in the ETag header".

view this post on Zulip Lloyd McKenzie (Jun 02 2021 at 21:17):

I think the primary benefit would be that systems could make caching choices based on strong ETags. I also thought there was an issue that we weren't strictly conformant with our use of weak e-tags.

view this post on Zulip Grahame Grieve (Jun 03 2021 at 00:55):

I think that this is a case of having missed the boat. The real concern for me is the behavior of proxies and caches. I don't mind if we're diverging on the server (would prefer not to, but we're there now). But if we're breaking middleware, that's a bigger deal. I don't think we are...

view this post on Zulip Eric Prud'hommeaux (Jun 03 2021 at 16:49):

given that ETags are opaque, proxies and caches should never notice that you have useful structure in there. the only party that needs to look into the structure is the server.

view this post on Zulip Eric Prud'hommeaux (Jun 03 2021 at 16:52):

Even the client doesn't care. It gets an ETag and uses that ETag in an If-Match. It's just the server that knows that it can equate "v123;json;pretty" with "v123;ttl;fugly" without losing updates.

view this post on Zulip Eric Prud'hommeaux (Jun 03 2021 at 16:53):

even byte-ranges work if the representation of "v123;json;pretty" is consistent

view this post on Zulip Eric Prud'hommeaux (Jun 03 2021 at 16:55):

I also don't know how you can safely enable cross-format updates without some sort of semantic (component in the) ETag.

view this post on Zulip Lloyd McKenzie (Jun 03 2021 at 18:00):

Right now we say we use weak ETags. Are intermediaries going to cache or take any other action based on those?

view this post on Zulip Eric Prud'hommeaux (Jun 04 2021 at 08:31):

i think the question is what client/proxy behavior will change if we formalize the ETags (include version and representation flags) and remove the "W/" from them?

view this post on Zulip Eric Prud'hommeaux (Jun 04 2021 at 08:51):

the answer should lie in https://datatracker.ietf.org/doc/html/rfc7232#section-2.1

Strong validators are usable for all conditional requests, including
cache validation, partial content ranges, and "lost update"
avoidance. Weak validators are only usable when the client does not
require exact equality with previously obtained representation data,
such as when validating a cache entry or limiting a web traversal to
recent changes.

view this post on Zulip Eric Prud'hommeaux (Jun 04 2021 at 09:01):

so both are good for cache validation (I imagine the anthropomorphized cache saying "I hope you know what you're doing" when working with weak ETags). Strong validation adds byte range and If-Match.

view this post on Zulip Josh Mandel (Jun 04 2021 at 13:41):

We're "getting" lost update avoidance (If-Match) with weak ETags today, in a fashion that perhaps is just cheating but appears to be working ;-)

view this post on Zulip Josh Mandel (Jun 04 2021 at 13:45):

I think Eric is showing us what the straight and narrow path looks like.


Last updated: Apr 12 2022 at 19:14 UTC