Stream: implementers
Topic: Wire format of XML
Elliott Lavy (Jun 17 2016 at 15:12):
I know that FHIR XML must be UTF-8 encoded and that the charset parameter of the MIME-type in the Content-Type header must explicitly be UTF-8. Does this mean that characters over 127 will be double-encoded on the wire, once when building the XML and again when building the HTTP?
James Agnew (Jun 17 2016 at 15:49):
Unless I'm misunderstanding, no there shouldn't be any double encoding.
I.e. if you have a FHIR Patient with a name containing a value of <given value="Ä"/>, the Ä gets encoded using its UTF-8 encoding just as any other character would.
Elliott Lavy (Jun 17 2016 at 16:18):
Thanks, @James Agnew . But what you posted is not valid XML with encoding="UTF-8". If the value of "given" is "Ä" (0xC4), then the value in the XML would be C3 84. But then with charset="UTF-8" in the HTTP header, it seems that would get encoded again, as C3 83 C2 84.
James Agnew (Jun 17 2016 at 18:24):
hmm, is that behaviour documented as a requirement anywhere? i had always just assumed that the XML declaration just needs to match the HTTP one and that there is no double-encoding implied.... and now I'm questioning that, since I don't know why I think that :)
Grahame Grieve (Jun 17 2016 at 20:51):
the http spec tells you what the body is. the body is text encoded in UTF-8. There's no need for double encoding
Grahame Grieve (Jun 17 2016 at 20:51):
UTF-8 is not text
Elliott Lavy (Jun 18 2016 at 03:09):
So the idea is that the XML on the wire is identical to the XML being sent/received, that "charset" merely describes what characters are used to construct the payload without indicating/requiring any action on the part of the sender/receiver?
Grahame Grieve (Jun 18 2016 at 03:39):
yes
Elliott Lavy (Jun 18 2016 at 05:24):
Thanks for the clarification, @Grahame Grieve
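Note: the byte values discussed in this exchange can be checked with a short Python sketch. It is only an illustration, and the variable names are assumptions that do not come from FHIR or any library. It shows that "Ä" (U+00C4) becomes C3 84 when encoded once, and that C3 83 C2 84 appears only if those bytes are mistakenly read as Latin-1 characters and encoded a second time.

```python
# Illustrative sketch only; names here are hypothetical.
CHAR = "Ä"  # U+00C4

# Encoding once gives the bytes that go into the HTTP body.
single = CHAR.encode("utf-8")

# The double-encoding mistake: misread those bytes as Latin-1 characters,
# then encode the resulting string as UTF-8 a second time.
double = single.decode("latin-1").encode("utf-8")

print(single.hex(" "))  # c3 84
print(double.hex(" "))  # c3 83 c2 84

# A receiver that decodes the body once, using the charset named in
# Content-Type, recovers the original XML text.
xml_text = '<given value="Ä"/>'
body = xml_text.encode("utf-8")
assert body.decode("utf-8") == xml_text
```

In other words, the charset parameter tells the receiver how to decode the body; it does not ask the sender to encode anything again.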
Lin Zhang (Dec 03 2016 at 11:33):
Could you please explain what the wire format is? It is hard to find a formal definition of the term, esp. for newbies. Thx.
Lin Zhang (Dec 03 2016 at 11:33):
Could you please explain what the wire format is? It is hard to find a formal definition of the term, esp. for newbies. Thanks.
Stefan Lang (Dec 03 2016 at 12:10):
"Wire format" is not a formal (FHIR) term but rather means "how the data goes over the wire", i.e. the byte stream when communicating over the network, for FHIR especially in the context of http(s).
Grahame Grieve (Dec 03 2016 at 20:09):
general HL7 term, but not that widely in use. Suggestions for something else?
Lin Zhang (Dec 03 2016 at 23:27):
Sorry for the duplicate posts; the status remained "Sending".
Marc de Graauw (Dec 06 2016 at 09:46):
@Grahame Grieve Isn't "serialization" the correct technical term for the looser "wire format"? (Or maybe "serialized object", but that's so verbose...)
Paul Knapp (Dec 06 2016 at 10:06):
@Marc de Graauw : Yes, but serialization is used as both a verb (process) and noun (result) which makes it confusing. For example, a change to the serialization (process) may or may not result in a change in the serialization (wire format) of a particular instance.
Marc de Graauw (Dec 06 2016 at 10:11):
True. So we could use "serialization process" and "serialized object" when the distinction matters, and the looser "Serialization" when it does not. It's still better than "wire format", see Google
Lloyd McKenzie (Dec 06 2016 at 14:21):
We could go with "serialization format" instead of "wire format"
Josh Mandel (Dec 06 2016 at 14:25):
Yes, or "serialized format". Something like this, I think, would be less jargony than "wire format"
Lin Zhang (Dec 06 2016 at 16:03):
More clear. Thanks.
John Moehrke (Dec 06 2016 at 16:38):
I like the conclusion... but recognize that serialization is a nested concept, repeated as necessary in different flavors... whereas wire format is the final format as it is passed to the networking stack. (yes, wire is confusing, especially today where wireless is dominant).
Grahame Grieve (Dec 06 2016 at 19:25):
I'd be happy to see a task to replace 'wire format' with 'serialization format'
Marc de Graauw (Dec 06 2016 at 20:41):
Added item GF12435
Grahame Grieve (Dec 06 2016 at 20:43):
thx
亚南 李 (Dec 07 2016 at 01:43):
why not use a more common concept, "bit format"? text format -> encode -> bit format -> decode -> text format
Grahame Grieve (Dec 07 2016 at 01:46):
doesn't seem more common to google
Lin Zhang (Dec 07 2016 at 02:01):
Great!
亚南 李 (Dec 07 2016 at 02:02):
it will depend on what you want to express. if you want to express what is actually passed through the "wire", you can use "Ethernet frame"; if you want to express the physical representation of the information, you should use "bit". text (json, xml) -> utf8 encoding -> bit -> utf8 decoding -> text (json, xml)
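Note: the text -> encoding -> bits -> decoding -> text chain described here can be sketched in Python roughly as follows. This is a minimal illustration with assumed variable names, not something defined by FHIR.

```python
# Illustrative sketch only; names here are hypothetical.
text = '<given value="Ä"/>'

# text -> UTF-8 encoding -> bytes (the "wire" representation)
wire = text.encode("utf-8")

# The bit pattern of the one non-ASCII character, for illustration.
bits = " ".join(format(b, "08b") for b in "Ä".encode("utf-8"))
print(bits)  # 11000011 10000100

# bytes -> UTF-8 decoding -> text
received = wire.decode("utf-8")
assert received == text

# The character count and byte count differ because "Ä" takes two bytes.
print(len(text), len(wire))  # 18 19
```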
Marc de Graauw (Dec 07 2016 at 08:05):
I don't think FHIR should occupy itself with anything below XML or JSON or RDF. So here "serialization format" would mean just those three. If there is a need somewhere to discuss bytes, bits, UTF-8, electrical currents, one could use the appropriate terms there. No need to define those in FHIR.
Paul Knapp (Dec 08 2016 at 11:11):
The 'wire format' or 'byte stream' is the physical representation of what is exchanged: it is the logical character stream of XML, JSON, RDF etc. encoded in 8859-1, UCS, UTF-8, UTF-16 etc. Given that FHIR is both an object representation and an exchange, it must at some points discuss the physical representation. Serialized format is a good replacement for wire format.
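Note: to illustrate the distinction between the logical character stream and its byte stream, the Python sketch below encodes the same character with three different encodings. The variable names are assumptions, and the non-UTF-8 encodings appear for comparison only, since this thread opens with the statement that FHIR XML must be UTF-8 encoded.

```python
# Illustrative sketch only; names here are hypothetical.
CHAR = "Ä"  # one logical character, U+00C4

encodings = ["iso-8859-1", "utf-8", "utf-16-be"]
byte_streams = {name: CHAR.encode(name) for name in encodings}

# Same character stream, three different byte streams.
for name, data in byte_streams.items():
    print(f"{name:>10}: {data.hex(' ')}")

# Decoding each byte stream with its own encoding gives back the same text.
assert all(data.decode(name) == CHAR for name, data in byte_streams.items())
```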
Last updated: Apr 12 2022 at 19:14 UTC