Stream: implementers
Topic: SHA1, collisions, and other hashing algorithms in the future
Jeffrey Willis (Oct 03 2017 at 12:29):
The Attachment data type specifies a hash attribute and that the hash of the data must be computed using SHA1. With the announcement this year of SHA1 collisions, has there been any consideration of allowing the hash to be computed via other algorithms (e.g. SHA256)?
The idea being that Attachment.hash would have a companion attribute, say Attachment.hashAlgorithm, that would allow one to specify which algorithm was used to compute the hash.
Lloyd McKenzie (Oct 03 2017 at 19:16):
That sounds like a reasonable thing to submit as a change request. Would you be willing to do that? (In fact, any place that we "fix" a specific algorithm, we probably want to allow for the possibility of future variation.)
Grahame Grieve (Oct 03 2017 at 20:12):
we've rejected this change before - we don't need crypto safe hashing here, so why would we need to ask systems to support more than one simple hashing algorithm?
John Moehrke (Oct 03 2017 at 20:38):
In addition to what Grahame said... just picking a different hash does not make the hash value any more useful for non-repudiation. Any case that requires non-repudiation needs to use a Digital Signature structure like what is provided in Provenance.signature.
John Moehrke (Oct 03 2017 at 20:42):
The hash element is there to detect technical failures (e.g., storage rot, transport glitch, incorrect version). All of these are detectable with SHA1, and don't require any more complex hashing algorithm.
John Moehrke (Oct 03 2017 at 20:44):
I would be far more comfortable recommending that hash become an extension than having us attempt to create a structure that supports many hashing algorithms. A pluggable hashing algorithm seems useful until you recognize the burden on any consuming system: it must implement all possible algorithms, as it never knows which algorithm might have been used. Removing the hash would recognize that current technology doesn't really need a hash except in specific situations (e.g. when metadata and data are managed on different systems under different SLA terms).
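The consumer-side burden described above can be sketched roughly as follows. This is only an illustration: the `algorithm` argument stands in for the hypothetical `Attachment.hashAlgorithm` attribute proposed earlier in the thread (it is not part of the specification), and the function and table names are made up for the example. The point is that a consumer can only verify hashes for algorithms it has chosen to implement.

```python
import base64
import hashlib

# Algorithms this hypothetical consumer has chosen to support.
# Every additional allowed algorithm means another entry every consumer must carry.
SUPPORTED_ALGORITHMS = {
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
}


def verify_with_algorithm(data: bytes, expected_hash: str, algorithm: str):
    """Check base64-encoded `expected_hash` against `data`.

    Returns True or False when the algorithm is supported, and None when the
    consumer does not implement it and therefore cannot verify anything.
    `algorithm` models the hypothetical Attachment.hashAlgorithm value.
    """
    hasher = SUPPORTED_ALGORITHMS.get(algorithm)
    if hasher is None:
        return None  # unknown algorithm: the integrity check is impossible
    actual = base64.b64encode(hasher(data).digest()).decode("ascii")
    return actual == expected_hash
```

With a single fixed algorithm the `None` branch never arises, which is the simplification the fixed-SHA1 rule buys.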
Grahame Grieve (Oct 03 2017 at 20:56):
the point of the hash is that if you have a reference, rather than data inline, you can check that the data is the same as what was initially referred to by checking the hash of the data you get when you actually access the data.
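A minimal sketch of that check, assuming the hash is stored as the base64 encoding of the SHA-1 digest of the raw bytes and that the caller has already fetched the referenced data (the function names here are illustrative, not from any FHIR library):

```python
import base64
import hashlib


def attachment_hash(data: bytes) -> str:
    """Return the base64-encoded SHA-1 digest of the raw attachment bytes."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def matches_attachment(data: bytes, expected_hash: str) -> bool:
    """True if the fetched bytes hash to the value recorded with the reference."""
    return attachment_hash(data) == expected_hash
```

For example, `matches_attachment(fetched_bytes, attachment["hash"])` would return False if the content at the referenced location changed after the hash was recorded. This detects accidental change only; it is not a substitute for a signature.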
Richard Townley-O'Neill (Oct 04 2017 at 02:40):
Should the stated requirements change from
"Included so that applications can verify that the contents of a location have not changed and so that a signature of the content can implicitly sign the content of an image without having to include the data in the instance or reference the url in the signature."
to leave out or qualify the text about signatures?
Grahame Grieve (Oct 04 2017 at 03:23):
yes that would be a good idea
Richard Townley-O'Neill (Oct 04 2017 at 03:34):
Issue created #13995
John Moehrke (Oct 04 2017 at 12:21):
Good catch. The hash should never have included anything about signature. It is simply an integrity check. A signature requires more than just a good hash algorithm, and is well handled by the Provenance.signature mechanism.
Jeffrey Willis (Oct 05 2017 at 18:07):
Thank you, everyone. You've answered my concerns and I believe continuing to use SHA1 for integrity checks on Attachment will work just fine.
Michael Lawley (May 14 2019 at 02:09):
Rather belatedly bringing this old thread back. Our issue is that we already have a hash (SHA256) associated with the data and we'd just like to re-use it/pass it through rather than computing an additional hash.
Is there a common extension for supporting alternate algorithms?
Last updated: Apr 12 2022 at 19:14 UTC