FHIR Chat · Binary Hash and Size · implementers

Stream: implementers

Topic: Binary Hash and Size


Aaron (Jul 25 2018 at 18:15):

I’m working on a solution that needs to update/version/replace existing Binary resources. Initially, it seems like a PUT against Binary would be the obvious choice. However, we also want to perform a quick integrity check via hash and size. Unfortunately, hash and size are defined within the Attachment element.

Rather than build a solution where clients request a PATCH or PUT against the related DocumentReference, Communication, Patient, etc. resource, it seemed ideal to be able to version the Binary resource directly whenever applicable.

Since Binary does not extend DomainResource, it can’t have extensions added to it. However, I suppose it would be possible to extend Binary.meta so that a custom representation of hash and size are included there.

Similarly, it'd be nice to be able to include hash and size in the response headers when streaming a binary in response to a client's GET request. This would let clients validate the content without also having to request the related resource/attachment.

Has anybody else had a similar issue in dealing with attachments and the absence of hash and size metadata on the Binary resource? Has adding hash and size to Binary been raised as a change proposal to the specification before?

Lloyd McKenzie (Jul 25 2018 at 18:24):

It's not something that's been asked previously. Binary is unique in that it's commonly shared as a raw binary instance rather than a FHIR XML or JSON resource with base64-encoded data. As a result, we're pretty cautious about adding "extra" stuff to Binary. (Most of what's there now is expressible using standard headers.) An alternative possibility would be to define an Operation that returns the hash of the Binary. Then you could do GET [base]/Binary/[id]/$hash to return the hash for the current version of the binary. This might actually be a use case where invoking an operation on a historical version of a resource would also be useful, e.g. GET [base]/Binary/[id]/_history/[version]/$hash
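As a rough illustration of that suggestion, a server implementing such an operation might compute the digest over the raw Binary bytes and return it in a Parameters resource. The `$hash` operation name and the `hash`/`size` parameter names are hypothetical; FHIR defines no such operation. This sketch reuses the Attachment.hash convention (base64-encoded SHA-1) so a result could be compared directly with an existing Attachment.

```python
import base64
import hashlib
import json


def hash_operation_result(binary_bytes: bytes) -> dict:
    """Build a Parameters resource for a hypothetical Binary $hash operation.

    Assumption: the operation reports the SHA-1 digest (base64-encoded, as in
    Attachment.hash) and the byte count (as in Attachment.size) of the raw
    Binary content. Neither the operation nor these parameter names are
    defined by the FHIR specification.
    """
    digest = hashlib.sha1(binary_bytes).digest()
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "hash", "valueBase64Binary": base64.b64encode(digest).decode("ascii")},
            {"name": "size", "valueUnsignedInt": len(binary_bytes)},
        ],
    }


if __name__ == "__main__":
    # A server might return this body for GET [base]/Binary/123/$hash.
    print(json.dumps(hash_operation_result(b"hello world"), indent=2))
```

Running it against a historical version would only change which bytes are passed in; the response shape stays the same.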

Aaron (Jul 25 2018 at 18:52):

Hi Lloyd. We share the concern about a GET of the raw binary as well, since we intend to stream just the raw bytes rather than an XML or JSON resource with base64-encoded data.

That being said, Content-Length could be mapped to size, but I don't see an obvious standard HTTP header for hash. (There is the deprecated Content-MD5, which is close but not quite right.) FHIR could always add a non-standard, optional header to the relevant requests and responses, so we can continue to avoid adding anything that must be included in the body of a request.

We do have two use cases here:
1. Server wishes to validate bytes received during a direct create/update to binary so it would expect hash + size in a POST/PUT request.
2. Client wishes to validate the bytes returned during retrieval, so it would need hash + size in a response to a GET.
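A minimal sketch of both use cases with a custom header, assuming a hypothetical `X-Content-SHA1` header that carries the base64-encoded SHA-1 of the body (the same encoding as Attachment.hash) alongside the standard Content-Length. The header name and helper functions are illustrative assumptions, not anything FHIR or HTTP defines.

```python
import base64
import hashlib

# Hypothetical header name: neither FHIR nor HTTP defines it.
HASH_HEADER = "X-Content-SHA1"


def sha1_b64(body: bytes) -> str:
    """Base64-encoded SHA-1, the same encoding FHIR uses for Attachment.hash."""
    return base64.b64encode(hashlib.sha1(body).digest()).decode("ascii")


def integrity_headers(body: bytes) -> dict:
    """Headers a sender adds to a PUT/POST request body or a GET response body."""
    return {"Content-Length": str(len(body)), HASH_HEADER: sha1_b64(body)}


def verify_body(body: bytes, headers: dict) -> bool:
    """Check received bytes against the declared size and hash.

    Use case 1: the server calls this on a Binary create/update.
    Use case 2: the client calls this on a Binary GET response.
    Each check is skipped when its header is absent.
    Note: a plain dict is case-sensitive; real HTTP header lookups are not.
    """
    declared_size = headers.get("Content-Length")
    if declared_size is not None and int(declared_size) != len(body):
        return False
    declared_hash = headers.get(HASH_HEADER)
    return declared_hash is None or declared_hash == sha1_b64(body)


if __name__ == "__main__":
    payload = b"example binary payload"
    headers = integrity_headers(payload)
    print(verify_body(payload, headers))       # True
    print(verify_body(payload[:-1], headers))  # False: size mismatch
```

Because the hash travels in a header, the body can remain the raw bytes in both directions.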

Of course, your recommendation works, but it would add another round trip. That's fine as long as we can assume a low-latency environment.

Michele Mottini (Jul 25 2018 at 19:18):

It is a custom feature, so you can just define and use your own custom header to send/return the hash.

Lloyd McKenzie (Jul 25 2018 at 19:27):

The question is whether we should make it a standard feature - and if so, how we'd like it to work

Grahame Grieve (Jul 25 2018 at 20:14):

I don't understand what you are validating?

John Moehrke (Jul 25 2018 at 20:22):

The reason hash and size are in Attachment is that when the Binary (or other bit bucket) holds the actual bits, one MAY want to confirm that what they retrieved from the Binary is what the Attachment intended them to retrieve. It is not a security mechanism; it is a continuity-integrity mechanism (hence the simple SHA-1). Moving these into the Binary would defeat this purpose. If you want to be sure you got the bits that the Binary URL refers to, then use HTTPS, as HTTPS includes authenticity and integrity mechanisms (i.e. TLS). If TLS is happy, then you got what is there. If you are happy just knowing you got the bits from that location, then don't bother with hash and size in Attachment. They are optional to populate, and they are optional to use.
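A sketch of that continuity check, where a client compares bytes fetched from Attachment.url with the Attachment's optional size and hash. It assumes the Attachment JSON has been parsed into a Python dict; `int()` accepts both the R4 form of size (a JSON number) and the R5 form (integer64, serialized as a JSON string). The function name and URL value are illustrative only.

```python
import base64
import hashlib


def matches_attachment(content: bytes, attachment: dict) -> bool:
    """Compare bytes retrieved from Attachment.url with Attachment.size/hash.

    Both elements are optional; an absent element is simply not checked,
    so an Attachment with neither populated always passes.
    Attachment.hash is the base64-encoded SHA-1 digest of the content.
    """
    if "size" in attachment and int(attachment["size"]) != len(content):
        return False
    if "hash" in attachment:
        expected = base64.b64decode(attachment["hash"])
        if hashlib.sha1(content).digest() != expected:
            return False
    return True


if __name__ == "__main__":
    original = b"discharge summary v1"
    attachment = {
        "contentType": "text/plain",
        "url": "Binary/example",  # illustrative reference only
        "size": len(original),
        "hash": base64.b64encode(hashlib.sha1(original).digest()).decode("ascii"),
    }
    print(matches_attachment(original, attachment))                 # True
    # Same length, different bytes: the content changed after the Attachment was made.
    print(matches_attachment(b"discharge summary v2", attachment))  # False
```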

John Moehrke (Jul 25 2018 at 20:23):

(A similar but weaker argument can be made for using hash and size when the content is base64-encoded in Attachment.data; there they only confirm that you successfully decoded the base64.)
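For inline content, the same fields mainly confirm that base64 decoding produced the intended bytes. A sketch, assuming the Attachment is a parsed JSON dict with `data` present; `decode_and_check` is a hypothetical helper name.

```python
import base64
import binascii
import hashlib


def decode_and_check(attachment: dict) -> bytes:
    """Decode Attachment.data and confirm it matches Attachment.size/hash.

    Raises ValueError if the base64 is malformed or the decoded bytes
    disagree with the declared size or hash (both optional).
    """
    try:
        content = base64.b64decode(attachment["data"], validate=True)
    except binascii.Error as exc:
        raise ValueError(f"Attachment.data is not valid base64: {exc}") from exc
    if "size" in attachment and int(attachment["size"]) != len(content):
        raise ValueError("decoded size does not match Attachment.size")
    if "hash" in attachment:
        if hashlib.sha1(content).digest() != base64.b64decode(attachment["hash"]):
            raise ValueError("decoded content does not match Attachment.hash")
    return content
```

A truncated `data` value is caught either as invalid padding or, if it still decodes, as a size mismatch.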

Elliot Silver (Jul 25 2018 at 20:50):

Binary can return the content in different media types, so a size/hash embedded in the resource would have to be media-type specific.
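A small demonstration of that point: the same content served raw and as a FHIR JSON Binary produces different byte streams, so a single hash or size cannot describe both. The JSON serialization here is one arbitrary choice; any change in whitespace or element order would change that digest again.

```python
import base64
import hashlib
import json

raw = b"Example document body"

# The Binary served in its native media type: the digest covers the raw bytes.
raw_hash = hashlib.sha1(raw).hexdigest()

# The same Binary served as FHIR JSON: the content is base64-encoded inside
# a resource envelope, so the bytes on the wire are entirely different.
as_json = json.dumps({
    "resourceType": "Binary",
    "contentType": "text/plain",
    "data": base64.b64encode(raw).decode("ascii"),
}).encode("utf-8")
json_hash = hashlib.sha1(as_json).hexdigest()

# Hashing the decoded payload rather than the envelope recovers the raw digest.
decoded = base64.b64decode(json.loads(as_json)["data"])
decoded_hash = hashlib.sha1(decoded).hexdigest()

print(raw_hash == json_hash)     # False
print(raw_hash == decoded_hash)  # True
print(len(raw), len(as_json))    # the sizes differ too
```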

Aaron (Jul 27 2018 at 14:18):

@Grahame Grieve: We were looking to perform an integrity check. However, there seems to have been some confusion about the specification's intent behind hash/size. After reading this thread, it seems we are always simply trusting lower-layer protocols to transfer the object correctly between client and server. I had incorrectly assumed that hash/size existed to ensure that the bytes received were actually the bytes expected. Put differently, I thought hash/size served as something akin to an application-layer checksum.

Part of the reason I was under the impression that FHIR used hash as a checksum is that IHE-XDS uses hash explicitly for this reason.

4.2.3.2.10 DocumentEntry.hash
Description: The hash of the contents of the document.

The hash attribute can be used to identify accidental document corruption, mistaken duplicate IDs, etc. The SHA1 algorithm and hash attribute should not be used for identifying malicious alterations.

@John Moehrke Moving hash/size into Binary does not defeat any ability to perform an integrity check against the data. The Binary's content is the thing whose integrity we are testing when we include hash/size in the Attachment. Binary already has metadata (e.g. contentType), so why would additional metadata about the data be considered a breaking change? I'm afraid I'm not following your rationale here.

Lloyd McKenzie (Jul 27 2018 at 14:21):

The hash on attachment is used to verify the binary pointed to by the URL hasn't changed since the Attachment instance was created. That's about checking integrity of content that can evolve independently. We don't have anything to check that a transfer completed successfully - that's presumed to be managed at the HTTP layer.

Aaron (Jul 27 2018 at 14:35):

@Lloyd McKenzie thanks for the clarification. That is what I suspected. As an aside, shouldn't versioning be used to verify that the binary hasn't changed over time? Also, who's to say that the server didn't change the binary but fail to update the hash on the Attachment?

John Moehrke (Jul 27 2018 at 14:35):

@Aaron can you elaborate on what risk you are trying to prevent? There are other solutions in the FHIR spec, and in the infrastructure upon which we build FHIR. You could use a Bundle.signature. You could use a Provenance.signature. You could add an extension... There are many choices. Each of these has a specific purpose, and each has some drawback. It is best to align your need with the best tool. The hash/size in Attachment is inspired by the use case that drove it into XDS; viewed a different way, it is in Attachment because it has proven useful in XDS when the Attachment and the content at the URL are maintained in different services.

John Moehrke (Jul 27 2018 at 14:36):

Attachment does not require that the URL points to a Binary... it might or might not; it is just a URL.

John Moehrke (Jul 27 2018 at 14:36):

If the content changes without an update to Attachment, then that sounds like an integrity failure

John Moehrke (Jul 27 2018 at 14:36):

If that is expected, then I would leave hash/size empty as an indication that no checking can be done.

Aaron (Jul 27 2018 at 14:37):

@John Moehrke that would be an integrity failure not caught by the integrity checking system you have created.

Lloyd McKenzie (Jul 27 2018 at 14:39):

Versioning can work if you're pointing to a Binary instance that supports versioning. But that support isn't required for FHIR systems and there's no expectation that Attachment will always point to FHIR-hosted data.

Aaron (Jul 27 2018 at 14:39):

Just to be clear, we've solved our issue. We wanted a way to perform an application-layer check that the content received is exactly what we expected. This is a redundant check on top of what TCP and layer 2 (e.g. Ethernet) protocols provide, but it is one that was wanted nonetheless.

Essentially, I was confused by the use cases for hash and size. I think it would be excellent to add the clarity that was provided in this thread to the specification itself.

Lloyd McKenzie (Jul 27 2018 at 14:40):

If the server changes the Binary, then it's no longer what the user saw when they created the Attachment, and the hashes won't match. How that situation is handled will be up to the system or user who's looking at the data.

John Moehrke (Jul 27 2018 at 14:49):

Do you realize that when using HTTPS (TLS) there are cryptographically assured integrity and authenticity controls? How is that not sufficient? I am concerned you are adding application-level checks without a risk that they address.

Aaron (Jul 27 2018 at 15:01):

Excellent point. Thank you for bringing it up.

I think our customer wants to ensure that client applications can compute a hash and then export the data they just hashed without failure. For now, we've just added a server-side option in our product to say "if you receive a hash in an attachment, check that it matches the hash we computed for the bytes received, and reject when there is a mismatch". If somebody wants to spend the time performing that check, they are welcome to do so.

Of course, if we are relying on this check to actually catch a problem, then we have a significant issue somewhere in the process of sending content to our repository.
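A sketch of that optional server-side check, assuming the server already holds the received bytes and the accompanying Attachment as a parsed dict. `UploadPolicy`, `verify_attachment_hash`, and the status codes (422 for a mismatch, 400 for a malformed hash) are hypothetical choices for illustration, not part of any product or of the FHIR specification.

```python
import base64
import binascii
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass
class UploadPolicy:
    # Hypothetical product setting; off by default because the check is optional.
    verify_attachment_hash: bool = False


def accept_attachment(content: bytes, attachment: dict, policy: UploadPolicy) -> Tuple[int, str]:
    """Return an (HTTP status, message) pair for an incoming attachment."""
    supplied = attachment.get("hash")
    if not policy.verify_attachment_hash or supplied is None:
        # Either the option is off or the client sent no hash: nothing to check.
        return 200, "accepted"
    try:
        expected = base64.b64decode(supplied, validate=True)
    except binascii.Error:
        return 400, "Attachment.hash is not valid base64"
    # Recompute SHA-1 over the bytes the server actually received.
    if hashlib.sha1(content).digest() != expected:
        return 422, "Attachment.hash does not match the received content"
    return 200, "accepted"
```

Comparing decoded digests rather than base64 strings avoids false mismatches from harmless encoding differences.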

John Moehrke (Jul 27 2018 at 15:03):

okay, so you are using Attachment and hash/size as defined.

Aaron (Jul 27 2018 at 15:16):

Correct. We wanted to avoid the overhead of having to include the Attachment in a request to update an existing Binary. Again, this comes back to the requested usage of hash/size as a checksum. If we could drop that requirement, we'd just update the Binary directly without needing to include an attachment in the request.

John Moehrke (Jul 27 2018 at 15:19):

then use Attachment with size and hash empty


Last updated: Apr 12 2022 at 19:14 UTC