FHIR Chat · Wire format of XML · implementers

Stream: implementers

Topic: Wire format of XML


Elliott Lavy (Jun 17 2016 at 15:12):

I know that FHIR XML must be UTF-8 encoded and that the charset parameter of the MIME-type in the Content-Type header must explicitly be UTF-8. Does this mean that characters over 127 will be double-encoded on the wire, once when building the XML and again when building the HTTP?

James Agnew (Jun 17 2016 at 15:49):

Unless I'm misunderstanding, no there shouldn't be any double encoding.

I.e. if you have a FHIR Patient with a name containing a value of <given value="Ä"/>, the Ä gets encoded using its UTF-8 encoding just as any other character would.
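
A minimal sketch of the point above, assuming Python and its standard-library `xml.etree.ElementTree` (neither is mentioned in the thread): serializing the element as UTF-8 writes Ä as the two bytes C3 84, exactly once.

```python
import xml.etree.ElementTree as ET

# Build the element from the message above; "Ä" is code point U+00C4.
given = ET.Element("given", {"value": "Ä"})

# Serialize the character stream to UTF-8 bytes.
wire_bytes = ET.tostring(given, encoding="utf-8")

# Hex of the single encoded character, for comparison with the serialized bytes.
a_umlaut_hex = "Ä".encode("utf-8").hex(" ")

print(wire_bytes)
print(a_umlaut_hex)
```

The serialized bytes contain C3 84 for the Ä and no second layer of encoding.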

Elliott Lavy (Jun 17 2016 at 16:18):

Thanks, @James Agnew. But what you posted is not valid XML with encoding="UTF-8". If the value of "given" is "Ä" (0xC4), then the value in the XML would be C3 84. But then with charset="UTF-8" in the HTTP header, it seems that would get encoded again, as C3 83 C2 84.

James Agnew (Jun 17 2016 at 18:24):

hmm, is that behaviour documented as a requirement anywhere? i had always just assumed that the XML declaration just needs to match the HTTP one and that there is no double-encoding implied.... and now I'm questioning that, since I don't know why I think that :)

Grahame Grieve (Jun 17 2016 at 20:51):

the http spec tells you what the body is. the body is text encoded in UTF-8. There's no need for double encoding

Grahame Grieve (Jun 17 2016 at 20:51):

UTF-8 is not text

Elliott Lavy (Jun 18 2016 at 03:09):

So the idea is that the XML on the wire is identical to the XML being sent/received, that "charset" merely describes what characters are used to construct the payload without indicating/requiring any action on the part of the sender/receiver?

Grahame Grieve (Jun 18 2016 at 03:39):

yes
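
The conclusion of this exchange can be sketched as follows. This is an illustration only, assuming Python; the `decode_body` helper and the `application/fhir+xml` media type string are choices made for the example, not something taken from the thread. The sender encodes the XML text to bytes once, the `charset` parameter only labels those bytes, and the receiver uses the label to decode. Encoding a second time produces exactly the C3 83 C2 84 sequence feared earlier, and the receiver then sees garbled text.

```python
from email.message import Message

xml_text = '<given value="Ä"/>'

# Correct: encode the character stream to bytes exactly once.
body = xml_text.encode("utf-8")

# The feared mistake: treat the encoded bytes as Latin-1 characters
# and encode them to UTF-8 a second time.
double_encoded = body.decode("latin-1").encode("utf-8")


def decode_body(content_type: str, payload: bytes) -> str:
    """Decode a payload using the charset named in a Content-Type value.

    Hypothetical helper for this sketch; falls back to UTF-8 if no
    charset parameter is present.
    """
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset() or "utf-8"
    return payload.decode(charset)


content_type = "application/fhir+xml; charset=UTF-8"
received = decode_body(content_type, body)
garbled = decode_body(content_type, double_encoded)

print(body.hex(" "))
print(double_encoded.hex(" "))
print(received == xml_text, garbled == xml_text)
```

Only the singly encoded body round-trips to the original text, which is why no second encoding step belongs in the HTTP layer.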

Elliott Lavy (Jun 18 2016 at 05:24):

Thanks for the clarification, @Grahame Grieve

Lin Zhang (Dec 03 2016 at 11:33):

Could you please explain what the wire format is? It is hard to find a formal definition of the term, esp. for newbies. Thx.

Lin Zhang (Dec 03 2016 at 11:33):

Could you please explain what the wire format is? It is hard to find a formal definition of the term, esp. for newbies. Thanks.

Stefan Lang (Dec 03 2016 at 12:10):

"Wire format" is not a formal (FHIR) term but rather means "how the data goes over the wire", i.e. the byte stream when communicating over the network, for FHIR especially in the context of http(s).

Grahame Grieve (Dec 03 2016 at 20:09):

general HL7 term, but not that widely in use. Suggestions for something else?

Lin Zhang (Dec 03 2016 at 23:27):

Sorry for the duplicate posts; the status remained "Sending".

Marc de Graauw (Dec 06 2016 at 09:46):

@Grahame Grieve Isn't "serialization" the correct technical term for the looser "wire format"? (Or maybe "serialized object", but that's so verbose...)

Paul Knapp (Dec 06 2016 at 10:06):

@Marc de Graauw : Yes, but serialization is used as both a verb (process) and noun (result) which makes it confusing. For example, a change to the serialization (process) may or may not result in a change in the serialization (wire format) of a particular instance.

Marc de Graauw (Dec 06 2016 at 10:11):

True. So we could use "serialization process" and "serialized object" when the distinction matters, and the looser "Serialization" when it does not. It's still better than "wire format", see Google

Lloyd McKenzie (Dec 06 2016 at 14:21):

We could go with "serialization format" instead of "wire format"

Josh Mandel (Dec 06 2016 at 14:25):

Yes, or "serialized format". Something like this, I think, would be less jargony than "wire format"

Lin Zhang (Dec 06 2016 at 16:03):

More clear. Thanks.

John Moehrke (Dec 06 2016 at 16:38):

I like the conclusion... but recognize that serialization is a nested concept, repeated as necessary in different flavors... whereas wire format is the final format as it is passed to the networking stack. (Yes, "wire" is confusing, especially today when wireless is dominant.)

Grahame Grieve (Dec 06 2016 at 19:25):

I'd be happy to see a task to replace 'wire format' with 'serialization format'

Marc de Graauw (Dec 06 2016 at 20:41):

Added item GF12435

Grahame Grieve (Dec 06 2016 at 20:43):

thx

亚南 李 (Dec 07 2016 at 01:43):

why not use a more common concept -- "bit format": text format -> encode -> bit format -> decode -> text format

Grahame Grieve (Dec 07 2016 at 01:46):

doesn't seem more common to google

Lin Zhang (Dec 07 2016 at 02:01):

Great!

亚南 李 (Dec 07 2016 at 02:02):

It depends on what you want to express. If you want to express what is actually passed through the "wire", you can use "Ethernet frame"; if you want to express the physical representation of the information, you should use "bit": text (JSON, XML) -> UTF-8 encoding -> bit -> UTF-8 decoding -> text (JSON, XML)
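
The text -> encoding -> bit -> decoding -> text chain described above can be sketched like this, assuming Python; the `to_bits` and `from_bits` helper names are made up for the illustration.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 and render each byte as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Rebuild the bytes from their binary digits and decode them as UTF-8."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")


bits = to_bits("Ä")
round_tripped = from_bits(bits)

print(bits)
print(round_tripped)
```

For "Ä" the bit string is `11000011 10000100`, which is the bytes C3 84 written in binary.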

Marc de Graauw (Dec 07 2016 at 08:05):

I don't think FHIR should occupy itself with anything below XML or JSON or RDF. So here "serialization format" would mean just those three. If there is a need somewhere to discuss bytes, bits, UTF-8, electrical currents, one could use the appropriate terms there. No need to define those in FHIR.

Paul Knapp (Dec 08 2016 at 11:11):

The 'wire format' or 'byte stream' is the physical representation of what is exchanged: it is the logical character stream of XML, JSON, RDF, etc. encoded in 8859-1, UCS, UTF-8, UTF-16, etc. Given that FHIR is both an object representation and an exchange, it must at some points discuss the physical representation. Serialized format is a good replacement for wire format.
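
To make the distinction between the logical character stream and the byte stream concrete, here is a small sketch, assuming Python and three of the encodings named above (UTF-16 is pinned to big-endian without a byte order mark so the output is deterministic). The same character yields a different byte stream under each encoding.

```python
text = "Ä"  # one logical character, U+00C4

# Encode the same character stream under several encodings.
wire_forms = {
    name: text.encode(name).hex(" ")
    for name in ("iso-8859-1", "utf-8", "utf-16-be")
}

for name, hex_bytes in wire_forms.items():
    print(f"{name:>10}: {hex_bytes}")
```

The character is identical in all three cases; only the bytes exchanged differ, which is the part "wire format" was meant to name.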


Last updated: Apr 12 2022 at 19:14 UTC