Stream: implementers
Topic: Thumbnails in Attachments
Morten Ernebjerg (Mar 10 2020 at 11:55):
We handle scanned medical documents using DocumentReference (STU3) and are looking into how to attach thumbnails. At least 3 solutions seem possible: (1) multiple entries in DocumentReference.content, (2) separate DocumentReference resources linked by relatesTo, or (3) a (custom) extension to Attachment. I found somewhat contradictory advice on this and was hoping for some "interpretive support" :slight_smile:
On the one hand, the ticket FHIR-14532 dealt with a similar case for Media and rejected adding an extra field in favour of solution (1), with the following reasoning:
The functionality of having a thumbnail (smaller rendering) can be supported through DocumentReference.content repetition; where one is a thumbnail and the other is full resolution. (note This could be through a relatesTo 'transforms' relationship, but that seems overkill for this usecase.) The distinction between the .content would be detectable through the DocumentReference.content.attachment metadata elements (e.g. size, height, width, etc).
On the other hand, it seems this might confuse some receiving systems into mistaking the thumbnail for the image itself. The reasoning would be the same that @Lloyd McKenzie offered in a Zulip thread about multiple pages in DocumentReference, arguing that putting separate document pages in separate content elements is problematic:
Using the existing repeating element would not be advisable because the expectation is that a receiving system would look for the first syntax they recognize or prefer and display that. Some might keep looking and would error out at seeing multiple repetitions with the same mime type. But others would happily just display the first page they encountered.
Furthermore, the spec seems to rule out having a thumbnail with the same mime-type as the original image in content, which would be an odd constraint if you want to add a thumbnail of, say, a JPEG.
Would be very grateful for any thoughts on this.
Lloyd McKenzie (Mar 10 2020 at 15:01):
Can you submit a change request on this? It seems like something there should be standard guidance on. My personal leaning is an extension.
Bapi Behera (Mar 10 2020 at 15:27):
Do we have a group to discuss in details about the #US Drug Formulary IG?
Lloyd McKenzie (Mar 10 2020 at 15:47):
Grahame Grieve (Mar 10 2020 at 19:11):
@John Moehrke @Eric Haas you moved this resolution
Grahame Grieve (Mar 10 2020 at 19:12):
I don't think that a different DocumentReference is an appropriate solution
Grahame Grieve (Mar 10 2020 at 19:13):
likely extension URL would be http://hl7.org/fhir/StructureDefinition/iso21090-ED-thumbnail (type = Attachment)
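A minimal sketch of what that could look like, written as a Python dict. The extension URL is the one proposed above, and the `valueAttachment` nesting follows the stated type (Attachment). Placing the extension on the attachment itself is an assumption taken from option (3) earlier in the thread; the file URLs, sizes, and the `make_content_with_thumbnail` helper are made up for illustration and are not a published definition.

```python
import json

# Extension URL as proposed in this thread; not confirmed here as published.
THUMBNAIL_EXT_URL = "http://hl7.org/fhir/StructureDefinition/iso21090-ED-thumbnail"


def make_content_with_thumbnail(full, thumbnail):
    """Return a DocumentReference.content entry whose attachment carries the
    thumbnail as an extension of type Attachment (hypothetical helper)."""
    attachment = dict(full)  # copy so the caller's dict is not mutated
    attachment["extension"] = [
        {"url": THUMBNAIL_EXT_URL, "valueAttachment": dict(thumbnail)}
    ]
    return {"attachment": attachment}


# Illustrative values only: the URLs and sizes are invented.
content = make_content_with_thumbnail(
    full={
        "contentType": "image/jpeg",
        "url": "http://example.org/scan.jpg",
        "size": 6000000,
    },
    thumbnail={
        "contentType": "image/jpeg",
        "url": "http://example.org/scan-thumb.jpg",
        "size": 200000,
    },
)
print(json.dumps(content, indent=2))
```

With this shape the full image stays the only entry in content, so a receiver that ignores unknown extensions never sees the thumbnail as a candidate rendering.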
Eric Haas (Mar 10 2020 at 22:25):
After reading this thread I think that having an extension for now and, depending on usage, creating a 'content.thumbnail' element in the future makes sense, since you could have multiple content elements, each with its own thumbnail, and I don't think the current resolution covers that.
Morten Ernebjerg (Mar 11 2020 at 08:19):
@Lloyd McKenzie Sure, just created FHIR-26556.
John Moehrke (Mar 11 2020 at 16:07):
One reason I had for not having a defined thumbnail element is that there are many different sizes of thumb. Using many .content elements is already available, with the .content.attachment size elements indicating the size of the thumb. If we add a thumbnail element, we will just have to totally duplicate the existing .content under .thumbnail to support this. I am not understanding why the existing method doesn't work.
Lloyd McKenzie (Mar 11 2020 at 16:30):
The key thing is whether a thumbnail is considered to be a presentation of the 'thing' or a distinct 'thing'. I lean towards the latter. If you send a Word or PDF version of a document, those are different representations of the same thing. Similarly if you send a JPEG or BMP version of an image, that's representing essentially the same thing. But if you send me the 'abstract' of the document or a low-rez thumbnail, that's not really an alternate rendering. It's not intended to be used for the same purpose.
Lloyd McKenzie (Mar 11 2020 at 16:31):
If you have 3 different thumbnails, then send 3 different extension reps.
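A rough sketch of how a consumer might read several thumbnail repetitions, assuming the extension URL proposed earlier in the thread and assuming each repetition carries a `valueAttachment` with a `size` in bytes. The helper names `thumbnails_of` and `largest_thumbnail_within`, the URLs, and the sizes are hypothetical.

```python
# Extension URL as proposed in this thread; not confirmed here as published.
THUMBNAIL_EXT_URL = "http://hl7.org/fhir/StructureDefinition/iso21090-ED-thumbnail"


def thumbnails_of(attachment):
    """Collect every thumbnail Attachment carried as an extension repetition."""
    return [
        ext["valueAttachment"]
        for ext in attachment.get("extension", [])
        if ext.get("url") == THUMBNAIL_EXT_URL and "valueAttachment" in ext
    ]


def largest_thumbnail_within(attachment, max_bytes):
    """Pick the largest thumbnail whose declared size fits in max_bytes.
    Thumbnails without a declared size are skipped; returns None if none fit."""
    candidates = [
        t for t in thumbnails_of(attachment)
        if "size" in t and t["size"] <= max_bytes
    ]
    return max(candidates, key=lambda t: t["size"], default=None)


def _thumb(url, size):
    # Build one extension repetition (illustrative values only).
    return {
        "url": THUMBNAIL_EXT_URL,
        "valueAttachment": {"contentType": "image/jpeg", "url": url, "size": size},
    }


attachment = {
    "contentType": "image/jpeg",
    "url": "http://example.org/scan.jpg",
    "extension": [
        _thumb("http://example.org/thumb-small.jpg", 20000),
        _thumb("http://example.org/thumb-medium.jpg", 80000),
        _thumb("http://example.org/thumb-large.jpg", 200000),
    ],
}

chosen = largest_thumbnail_within(attachment, max_bytes=100000)
print(chosen["url"] if chosen else "no thumbnail fits")
```

Here the choice among thumbnails is still size-based, but it only ever runs over entries already labelled as thumbnails, never over the full renderings.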
Gino Canessa (Mar 11 2020 at 16:34):
Not super impacted by this (today), but I don't particularly like having conflicting uses of multiple content records. Essentially, it means that a system needs to parse out which ones are previews vs different formats, etc. Specifically, content says:
The document and format referenced. There may be multiple content element repetitions, each with a different format.
To use Lloyd's example, a .doc and .pdf may be equivalent, but the PDF may be a preview with everything rasterized at 50 DPI. May even have both. That means the system will need to compare the sizes on the PDFs and correctly choose the right one for the right situation.
Feels like something that will cause confusion/issues.
Eric Haas (Mar 11 2020 at 17:20):
I am confused. I thought thumbnails are "reduced-size versions of pictures or videos" (thank you Wikipedia) and basically are a link to the full sized thingy....
e.g.
:rabbit:
(sadly I can't add a link to image in Zulip)
So if you have a single content element ... say a pdf treatise on Easter Bunnies, then a separate content element using the above thumbnail could work, but I agree with @Gino Canessa that it is klunky. BUT if you already have multiple content elements ... say a pdf treatise on Easter Bunnies and a doc on easter egg hunts, then I can't imagine how you can keep all the thumbnails straight without an extension or separate element.
Gino Canessa (Mar 11 2020 at 17:43):
Thumbnails are not links to full-sized representations. Easier than a document (to illustrate) is a photo - to start, let's consider an average 12 megapixel (MP) photo.
In normal use, you may have access to the RAW format of the photo (25 MB), a JPEG (with conservative settings) for 'normal' use (6 MB), and a TIFF version for specific workflows (15 MB). Each of these formats should be included as separate content elements in the same DocumentReference, since they are different versions of the same "thing".
Continuing from above, the 6 MB JPEG is too large for something like a navigation tile, so you also want a thumbnail. To make the thumbnail, you take the 6 MB image, downscale it to 2 megapixels and increase JPEG compression to get a nice 200 KB JPEG.
Now you have two JPEG files of the same thing, but they are distinct files with very different qualities.
If you tried to print the 200 KB file as a photo, the quality would suck. If you tried to use the 6 MB photo as a navigation tile, your performance would suck.
In the first case, a system consuming the DocumentReference
will see a RAW, a TIFF, and a JPEG. It would pick the format it wants the most, any one it understood to convert , or whatever. This works since they are all equivalent.
If you now want to add the thumbnail via another content
record, systems will need to figure out that the two JPEG records are a full-size and a thumbnail, based on the sizes of the files. These are NOT equivalent, and cannot be treated as such.
In your example @Eric Haas, the two different documents should be two different DocumentReference resources, since they are different things.
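To make the ambiguity concrete, here is a small sketch of a naive receiver that takes the first content entry whose contentType it supports. The sizes come from the photo example above; the `first_recognized` helper, the titles, and the ordering of entries are assumptions for illustration, and RAW is left out to keep it short.

```python
def first_recognized(contents, supported_types):
    """Naive receiver: return the first attachment whose contentType is supported."""
    for entry in contents:
        attachment = entry.get("attachment", {})
        if attachment.get("contentType") in supported_types:
            return attachment
    return None


MB = 1_000_000

# Sizes follow the example above; titles are invented labels.
tiff = {"attachment": {"contentType": "image/tiff", "title": "tiff", "size": 15 * MB}}
full_jpeg = {"attachment": {"contentType": "image/jpeg", "title": "full", "size": 6 * MB}}
thumb_jpeg = {"attachment": {"contentType": "image/jpeg", "title": "thumbnail", "size": 200_000}}

jpeg_only = {"image/jpeg"}

# Same three files, two possible orderings of DocumentReference.content.
picked_a = first_recognized([tiff, full_jpeg, thumb_jpeg], jpeg_only)
picked_b = first_recognized([tiff, thumb_jpeg, full_jpeg], jpeg_only)

print(picked_a["title"])  # the full-size JPEG comes first in this ordering
print(picked_b["title"])  # the thumbnail comes first in this ordering
```

Nothing in the two JPEG entries other than size tells the receiver which one is the thumbnail, so the result depends on ordering unless the receiver adds its own size heuristic.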
Eric Haas (Mar 11 2020 at 17:51):
Thanks for the clarification. But in the end @Gino Canessa you agree the thumbnail should be an extension on the thing it is thumbnailing, right?
Gino Canessa (Mar 11 2020 at 17:53):
I don't think it should be another record in content. Whether that means an extension (and where) or a new element, I don't know that I have an opinion.
John Moehrke (Mar 12 2020 at 12:21):
This is confusing. Gino, you explain quite eloquently why each of these different views (pdf vs word, 6MB vs 200 KB) are the same thing. As I read this, it seems you are arguing that these should be just different .content entries in the DocumentReference. Yet, I am also seeing you say you don't want them this way. Can you help me understand what the bright-line is in your logic? To me, these all should be repetitions at the .content level. It is not hard to spin thru the .content entries to find the one that fits your needs.
John Moehrke (Mar 12 2020 at 12:23):
I am not totally against an independent element for thumbnail, but every discussion comes away with no clear definition of what a thumbnail is.
Gino Canessa (Mar 12 2020 at 12:54):
My argument is around my reading/impressions of the docs. I feel that content entries should be equivalent. Thumbnails are not equivalent to full-scale renderings.
This means that we have two different uses of entries in content, without a clear differentiator. Relying on systems to dig into the attachment and make their own value judgments feels wrong when we can just label the content properly during creation.
John Moehrke (Mar 12 2020 at 13:10):
I think they are intended to be representations of the full-fidelity content. This concept is being used for medical imaging, where the full-fidelity version is DICOM, while a JPEG is provided for systems that can't understand DICOM. The JPEG can be full-fidelity in many cases, like a simple x-ray. Down-scaling is just a transform. It is much like having a C-CDA as the primary and also having a PDF rendering, a TEXT rendering, and other renderings. These all allow the consumer (app + user) to decide which works best.
John Moehrke (Mar 12 2020 at 13:12):
One can't even declare that ONE of the .content entries is the 'best' fidelity, as that is dependent on the viewer capability. But I would far prefer to have a flag that indicated which .content is considered prime by the publisher.
John Moehrke (Mar 12 2020 at 13:14):
Note that other uses of the .content repetition are not different renderings at all. There are other use-cases that want to use (and are doing so today) multiple .content for different pages within a document. This is mostly done when the document is historic paper that was scanned. (Note I don't like this, and in the XDS world this would be done with multiple linked DocumentReference.) -- so we are getting good trial-use. I expect more people should weigh in so that we get the most input on consensus.
Lloyd McKenzie (Mar 12 2020 at 14:38):
Some representations are intended to be interchangeable. Whether you display a word document, full PDF or text version of a document doesn't matter. On the other hand, could have a word, PDF or text file that's just a summary and that's not interchangeable with the full text. You need to separate the intended use of 'summary purposes only' vs. 'useable to convey full intended meaning'. Certainly you could have different resolutions of an image available (or even a PDF), but if it's not useable to convey the whole meaning, it shouldn't be sent alongside those that are.
John Moehrke (Mar 12 2020 at 14:41):
so it seems we do have many mutually-exclusive views of what multiple .content elements are useful for. Note that in IHE, multiple are simply forbidden; as the relationship element is more explicit and the metadata of each DocumentReference can explain these different perspectives.
Gino Canessa (Mar 12 2020 at 14:57):
Quick clarification, you are saying that people are using content entries as pages? As an implementer reading the spec, I would have no clue that was possible, so if I had built a system it would not handle those correctly.
Also, are the pages assumed to be in-order, reverse-order? That sounds terrible.
Gino Canessa (Mar 12 2020 at 14:59):
As of now, I felt the docs were clear:
The document and format referenced. There may be multiple content element repetitions, each with a different format.
This does not say that multiple elements can be different pages, or even two versions of the same format (e.g., a hi-res JPEG and a low-res JPEG). It says that if there's a repetition, the difference is the format.
Gino Canessa (Mar 12 2020 at 15:02):
If that's not the case, then I feel like the docs need to be updated. This would show the issue clearly, e.g.,:
There may be multiple content references, each with a different format. Or, multiple pages. Or, thumbnails. Or, ...
Reading that, it becomes obvious that there's a piece of context missing - a code or something describing the use of the element.
John Moehrke (Mar 12 2020 at 16:03):
yes, that is my understanding.
Morten Ernebjerg (Mar 12 2020 at 16:35):
My 5 implementer cents:
(1) In our use case, thumbnails are simply highly minified versions of an image/document meant to give the user a very rough visual idea of what it is in e.g. a list in an app. They are hence not interchangeable with the full documents, e.g. no text would be readable in a document thumbnail.
(2) Like @Gino Canessa, as an implementer I would really not want to try to write code logic that decides based on e.g. image size whether a given element in content is the "full version".
(3) Like @Lloyd McKenzie, I would favor a semantic for content that says that all elements can be used interchangeably for display/other typical usage without the risk of loss of information (e.g. each element would contain the full content for the relevant context). The notion of "typical usage/relevant context" would probably need some thought, though, since there are (cornerish) use cases, e.g. extraction of structured data from the document files, where it might be hard to say what counts as "interchangeable". Still, in this logic, a thumbnail would clearly not count as interchangeable with the full image (unless the only use case is display of visual summary info :wink: )
John Moehrke (Mar 12 2020 at 16:44):
That ignores the fact that a C-CDA and a PDF of that are also NOT interchangeable. So, I keep asking for a clear distinction: when is something a transform and when is it not? I think the distinction being made here is not something that is based in logic.
Gino Canessa (Mar 12 2020 at 17:01):
Yes, but anyone processing something that has a C-CDA and a PDF would presumably know that.
To use your DICOM example from earlier (which I'm more comfortable with), a JPEG rendering is typically NOT fully equivalent to a DICOM version. But, if you don't have the ability to parse/display a DICOM file, there is some representation which is 'good enough' for basic use (same resolution, window/leveled to a proper setting, etc.). While not truly equivalent, anyone processing DICOM would understand that.
If I'm processing a PDF, I have no way of inherently knowing that multiple records are different pages, low-res previews, or something else.
Lloyd McKenzie (Mar 12 2020 at 19:45):
A C-CDA that represents the full content and a PDF of the full content are both interchangeable in that they both represent the full content. Obviously the capabilities are going to be different (just as a BMP, JPEG and SVG representation have different capabilities). But a PDF of a summary is not interchangeable with a PDF of the full document. And the author who links the summary knows it's intended to be a summary and is not ever intended to stand in for the full content. Neither syntax nor metadata (unless you try to guesstimate based on raw size) will tell you that a PDF is summary vs. full document. You might have multiple versions of a document in different sizes, but there's still a difference between a 'thumbnail' vs. a "version for use on low-resolution/low-bandwidth devices"
John Moehrke (Mar 16 2022 at 17:42):
Grahame Grieve said:
likely extension URL would be http://hl7.org/fhir/StructureDefinition/iso21090-ED-thumbnail (type = Attachment)
@Grahame Grieve I attempted to apply this CR J#26556. I am not an expert in iso21090. Is the extension recommended in the resolution really aligned in some way with iso21090, and thus appropriate to be added to http://build.fhir.org/iso-21090?
John Moehrke (Mar 16 2022 at 17:44):
Or should it just be a simple boolean extension on DocumentReference.content? That is, not trying to state that it is aligned with 21090.
Grahame Grieve (Mar 16 2022 at 21:08):
I agree with the disposition there. You'd add it to /source/profiles/iso-21090-spreadsheet.xml
Last updated: Apr 12 2022 at 19:14 UTC