Stream: implementers
Topic: Character encoding of base64binary fields
Jorge de la Garza (Aug 18 2017 at 17:22):
Suppose my programming platform represents non-ASCII characters internally with their Unicode values, for example:
"た" (HIRAGANA LETTER TA) = 0x305F
So to put this in a field of type base64binary, I first have to base64-encode it, but before I can do that, I have to convert it somehow from a single 16-bit value to a series of 8-bit bytes. UTF-8 encoding comes to mind, in which case the value becomes:
0xE3 0x81 0x9F
This base64-encodes to:
44Gf
So that is what would go in the field value. So my question is this: if I then send this resource to another system, how is that system supposed to "know" that once the value is base64-decoded, it is still UTF-8 encoded? Should UTF-8 encoding of the decoded value always be assumed?
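For reference, the steps above can be reproduced with a minimal Python sketch. The helper names to_base64binary and from_base64binary are hypothetical, and the UTF-8 default is exactly the assumption being questioned here, not something the field itself records:

```python
import base64

def to_base64binary(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes with the given character encoding, then base64-encode."""
    raw = text.encode(encoding)
    return base64.b64encode(raw).decode("ascii")

def from_base64binary(value: str, encoding: str = "utf-8") -> str:
    """Reverse the process. The receiver has to already know `encoding`."""
    raw = base64.b64decode(value)
    return raw.decode(encoding)

ta = "\u305f"  # "た" (HIRAGANA LETTER TA)
print(hex(ord(ta)))               # 0x305f
print(ta.encode("utf-8").hex())   # e3819f
print(to_base64binary(ta))        # 44Gf
print(from_base64binary("44Gf") == ta)  # True
```

Nothing in "44Gf" says which character encoding produced the bytes, so a receiver that picks a different encoding when decoding will get different text or an error.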
Lloyd McKenzie (Aug 18 2017 at 17:38):
First, you'd look at the mime type to figure out what kind of file you're dealing with. If the content is a text file, you can use a BOM to indicate to the text processor that it's dealing with UTF. If your content is in some other encoding, you'd have to use whatever conventions (if any) are in place for that type of character encoding. And of course, if someone sends you UTF without a BOM, you have all the same fun that exists today when someone sends an email attachment with text that's in an unexpected encoding. FHIR doesn't fix that problem. (Which is why PDF or other syntaxes that don't have the encoding issue will often be easier to use.)
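A rough Python sketch of the BOM check described above, for the case where the decoded content is already known to be text. The function name decode_text and the UTF-8 fallback are assumptions for illustration; FHIR does not define this behaviour:

```python
import base64
import codecs

# UTF-32 BOMs are listed before UTF-16 because the UTF-32-LE BOM
# (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_text(value_b64: str, fallback: str = "utf-8"):
    """Return (text, encoding, bom_found) for base64-encoded text content."""
    raw = base64.b64decode(value_b64)
    for bom, name in _BOMS:
        if raw.startswith(bom):
            # Strip the BOM and decode with the encoding it announces.
            return raw[len(bom):].decode(name), name, True
    # No BOM: the encoding is a guess. This raises UnicodeDecodeError
    # if the bytes are not valid in the fallback encoding.
    return raw.decode(fallback), fallback, False

print(decode_text("44Gf"))
```

When bom_found is False the result is only as good as the fallback guess, which is the "unexpected encoding" problem mentioned above.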
Jorge de la Garza (Aug 18 2017 at 17:46):
First, you'd look at the mime type
The mime type of what? This is in the context of a resource, for example, AuditEvent.entity.query.
Lloyd McKenzie (Aug 18 2017 at 18:10):
I was assuming you were referring to Binary or the Attachment data type. I agree that AuditEvent.entity.query seems under-specified. It's not clear how this content should be parsed once decoded. Can you submit a change request for that to be clarified? (Or alternatively to change the data type to Attachment, which would allow conveying the mime type.)
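To illustrate why Attachment would help, here is a hedged Python sketch that carries the character encoding as a charset parameter on the Attachment's contentType next to the base64 data. The helper names are hypothetical, the charset parsing is deliberately simplistic, and whether a given receiver honours the parameter is an assumption:

```python
import base64

def make_text_attachment(text: str, charset: str = "utf-8") -> dict:
    """Build an Attachment-shaped dict with contentType and base64 data."""
    return {
        "contentType": f"text/plain; charset={charset}",
        "data": base64.b64encode(text.encode(charset)).decode("ascii"),
    }

def charset_of(content_type: str, default: str = "utf-8") -> str:
    """Pull the charset parameter out of a mime type string, if present."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset":
            return value.strip().strip('"')
    return default  # assumption: fall back to UTF-8 when nothing is declared

def read_text_attachment(attachment: dict) -> str:
    """Decode the data using the charset declared in contentType."""
    raw = base64.b64decode(attachment["data"])
    return raw.decode(charset_of(attachment.get("contentType", "")))

att = make_text_attachment("\u305f")  # "た"
print(att)
print(read_text_attachment(att) == "\u305f")  # True
```

A bare base64binary element such as AuditEvent.entity.query has no place for this metadata, which is the gap discussed above.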
Jorge de la Garza (Aug 21 2017 at 16:36):
Submitted: https://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemEdit&tracker_item_id=13759&start=0
Thanks Lloyd.
John Moehrke (Aug 23 2017 at 13:03):
@Jorge de la Garza The meaning of the AuditEvent.entity.query and AuditEvent.entity.detail encoding is specific to the AuditEvent.type, AuditEvent.subtype, AuditEvent.entity.type, and AuditEvent.entity.role. This is just like in DICOM. We will improve the element definitions using your GF#13759. The base64 is intended only to be a general-use and safe container for event-specific data blobs.
Last updated: Apr 12 2022 at 19:14 UTC