Stream: V2
Topic: hexadecimal data
Jake Aitchison (Mar 01 2021 at 19:51):
Depending on where one looks, escaping of 2 specific characters in hexadecimal is done differently. The 2 I am referring to are line feed (\n), written as \X0A\ or \X00A\ in hexadecimal, and carriage return (\r), written as \X0D\, \X00D\ or \X000d\.
What is the correct way to decode/encode these characters?
https://github.com/Efferent-Health/HL7-dotnetcore/blob/9ca6f4b907960e062f35163283708ff2b6ceb1a7/src/Encoding.cs#L94 uses \X00D\, but https://github.com/hapifhir/hapi-hl7v2/blob/efca803e58aea1705ba6b28390ad24c4c3b1d3cb/hapi-base/src/main/java/ca/uhn/hl7v2/parser/DefaultEscaping.java#L286 uses \X000d\, and InterSystems (who make the Ensemble integration engine) recommend \X0D\ (https://docs.intersystems.com/latest/csp/docbook/Doc.View.cls?KEY=EHL72_escape_sequences).
Craig Newman (Mar 01 2021 at 21:50):
Per @Anthony(Tony) Julian from the InM Work Group:
When exploring the escape sequences it is best to go to the source: Chapter 2.
2.7.6 Hexadecimal
When the hexadecimal escape sequence (\Xdddd...\) is used the X SHALL be followed by 1 or more pairs of hexadecimal digits (0, 1, . . . , 9, A, . . . , F). Consecutive pairs of the hexadecimal digits represent 8-bit binary values. The interpretation of the data is entirely left to an agreement between the sending and receiving applications that is beyond the scope of this Standard.
In \Xdddd\, the X shall be followed by 1 or more pairs of hexadecimal digits, which means \X0a\ is legal, as is \X000a\. The last sentence is there because the implementation changes based on the character set(s) used as well as the contract between the trading partners.
Vassil Peytchev (Mar 02 2021 at 04:55):
The quote from chapter 2 is
When the hexadecimal escape sequence (\Xdddd...\) is used the X SHALL be followed by 1 or more
pairs of hexadecimal digits (0, 1, . . . , 9, A, . . . , F). Consecutive pairs of the hexadecimal digits
represent 8-bit binary values. The interpretation of the data is entirely left to an agreement between the
sending and receiving applications that is beyond the scope of this Standard.
This means that the sequence \X000A\ is valid, but that represents a sequence of two bytes (aka octets). Unless there is a predefined agreement for the use of UTF-16 as the character encoding, line feed and carriage return are single byte values, and \X0A\ and \X0D\ are the hexadecimal representations of these characters.
Note that HL7 V2 has only described the use of UTF-8 Unicode encodings, so a theoretical use of UTF-16 or UTF-32 is probably non-compliant. See the definition of MSH-18 (section 2.14.9.18) and table 0211.
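To make the "pairs of hexadecimal digits" rule concrete, here is a minimal Python sketch. It assumes the default escape character \ and a hypothetical helper name (hex_escape_to_bytes); accepting lowercase digits is a leniency of this sketch, since the quoted text lists only 0-9 and A-F. It only turns the digits into raw bytes and says nothing about how those bytes should be interpreted.

```python
import re

# Matches a hexadecimal escape such as \X0A\ or \X000A\ (default escape character assumed).
# The payload must be one or more PAIRS of hex digits, per the Chapter 2 text quoted above.
HEX_ESCAPE = re.compile(r"\\X((?:[0-9A-Fa-f]{2})+)\\")

def hex_escape_to_bytes(escape: str) -> bytes:
    """Return the raw bytes carried by a single \\Xdd...\\ escape sequence."""
    match = HEX_ESCAPE.fullmatch(escape)
    if match is None:
        raise ValueError(f"not a well-formed hexadecimal escape: {escape!r}")
    return bytes.fromhex(match.group(1))

print(hex_escape_to_bytes("\\X0A\\"))    # one byte: line feed
print(hex_escape_to_bytes("\\X000A\\"))  # two bytes: 0x00 followed by 0x0A
```

Under this reading, \X00A\ and \X00D\ are rejected because three digits do not form whole pairs, while \X000A\ is well-formed but carries two bytes rather than one.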
Grahame Grieve (Mar 02 2021 at 05:19):
I'm not sure that this is correct. I think that defaulting to UTF-8 is reasonable, but there are lots of messages out there that don't use UTF-8 - they are using some kind of old OEM character set when using \X.
Frank Oemig (Mar 02 2021 at 06:07):
Originally we had 7-bit ASCII only, but a lot of implementations used 8-bit or ISO 8859 character sets.
We examined the use of Unicode, which is also OK, but only as UTF-8 because it is sufficient and compatible. I believe this is documented in the old wiki.
Jake Aitchison (Mar 02 2021 at 07:30):
Thanks so much guys for these replies. So would you recommend \X0D\ and \X0A\ respectively?
Vassil Peytchev (Mar 02 2021 at 12:50):
I was making the distinction between UTF-8 and other Unicode encodings. Other non-Unicode encodings are allowed, as per table 0211, but for all of them, as far as I can tell, \n and \r are single byte values, and their hexadecimal representations are \X0A\ and \X0D\
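As a rough Python sketch of the single-byte forms discussed here: the function names and the encoding parameter are hypothetical, and the default escape character \ is assumed. A real encoder would also have to escape the escape character itself and the other delimiters, which this sketch leaves out.

```python
import re

# One or more pairs of hex digits between \X and \ (default escape character assumed).
HEX_ESCAPE = re.compile(r"\\X((?:[0-9A-Fa-f]{2})+)\\")

def escape_line_breaks(text: str) -> str:
    """Encode carriage return and line feed as the single-byte escapes."""
    return text.replace("\r", "\\X0D\\").replace("\n", "\\X0A\\")

def unescape_hex(text: str, encoding: str = "utf-8") -> str:
    """Decode every hexadecimal escape, interpreting its bytes in `encoding`.

    The encoding stands in for whatever the trading partners agreed on
    (for example, the character set declared in MSH-18).
    """
    return HEX_ESCAPE.sub(
        lambda m: bytes.fromhex(m.group(1)).decode(encoding), text
    )

escaped = escape_line_breaks("line 1\r\nline 2")
print(escaped)
print(repr(unescape_hex(escaped)))
```

With a single-byte-compatible character set such as UTF-8, decoding \X000D\ this way yields a NUL character followed by a carriage return, which is why the two-digit forms are the safer choice.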
Elliot Silver (Mar 02 2021 at 19:21):
By the way, if the field is formatted text (FT, which I suspect it is, if you are encoding line breaks in it), you may be able to avoid any encoding of newline or return by using the \.br\ formatting command.
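A minimal Python sketch of that alternative, assuming the default escape character \ and a hypothetical helper name (line_breaks_to_br). It treats \r\n, a lone \r and a lone \n each as one line break; whether that normalization is appropriate depends on the sending system.

```python
BR = "\\.br\\"  # the \.br\ formatting command, default escape character assumed

def line_breaks_to_br(text: str) -> str:
    """Replace each line break in FT content with the \\.br\\ formatting command."""
    # Normalize CRLF and bare CR to LF first so a CRLF pair becomes a single break.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", BR)

print(line_breaks_to_br("first line\r\nsecond line\nthird line"))
```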
Frank Oemig (Mar 02 2021 at 20:40):
@Vassil Peytchev: AFAIR, UTF-16 and UTF-32 are not allowed, because you cannot parse the MSH segment. And since they are equivalent to UTF-8, it is not necessary to use them.
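One way to picture the MSH parsing problem is with a short Python sketch. It assumes a hypothetical byte-oriented receiver that looks for the segment ID "MSH" in the first three bytes before it knows the character set; the sample header and the helper name (looks_like_msh) are made up for illustration.

```python
header = "MSH|^~\\&|SENDER"  # illustrative message start, not from a real system

utf8_bytes = header.encode("utf-8")
utf16_bytes = header.encode("utf-16-le")  # explicit byte order, no BOM

def looks_like_msh(raw: bytes) -> bool:
    """Byte-oriented check a receiver might run before reading MSH-18."""
    return raw[:3] == b"MSH"

print(looks_like_msh(utf8_bytes))   # the UTF-8 bytes start with M, S, H
print(looks_like_msh(utf16_bytes))  # the UTF-16 bytes interleave zero bytes
print(utf16_bytes[:6])
```

In UTF-8 the ASCII characters of the header keep their single-byte values, so this kind of check works; in UTF-16 every character occupies two bytes, so the receiver would need to know the encoding before it could read the field that declares it.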
Last updated: Apr 12 2022 at 19:14 UTC