Stream: dotnet
Topic: Serialization
Josh Lamb (Sep 08 2020 at 12:57):
[deleted]
Brian Postlethwaite (Apr 13 2021 at 10:55):
Hey all, some of us are working on several Serialization Implementations and considering what a sensible way of making this capability switchable.
This is one of my thoughts to have an interface, along with a default implementation.
public interface IFhirXmlSerialization : IFhirSerialization
{
}
public interface IFhirJsonSerialization : IFhirSerialization
{
}
public interface IFhirSerialization
{
Task Serialize(Stream stream, Base resource, SummaryType summary = SummaryType.False, string[] elements = null);
Task<T> Deserialize<T>(Stream stream, OperationOutcome parsingResults) where T : Base;
}
public class DefaultFhirXmlSerialization : IFhirXmlSerialization
{
public Task<T> Deserialize<T>(Stream stream, OperationOutcome parsingResults)
where T : Base
{
using (XmlReader reader = Utility.SerializationUtil.XmlReaderFromStream(stream))
{
return Task.FromResult<T>(new Hl7.Fhir.Serialization.FhirXmlParser().Parse<T>(reader));
}
}
public Task Serialize(Stream stream, Base resource, SummaryType summary = SummaryType.False, string[] elements = null)
{
var serializer = new Hl7.Fhir.Serialization.FhirXmlSerializer();
using (var writer = new XmlTextWriter(stream, System.Text.Encoding.UTF8))
{
serializer.Serialize(resource, writer, summary, elements: elements);
return Task.CompletedTask;
}
}
}
public class DefaultFhirJsonSerialization : IFhirJsonSerialization
{
public Task<T> Deserialize<T>(Stream stream, OperationOutcome parsingResults)
where T : Base
{
using (JsonReader reader = Utility.SerializationUtil.JsonReaderFromStream(stream))
{
return Task.FromResult<T>(new Hl7.Fhir.Serialization.FhirJsonParser().Parse<T>(reader));
}
}
public Task Serialize(Stream stream, Base resource, SummaryType summary = SummaryType.False, string[] elements = null)
{
var serializer = new Hl7.Fhir.Serialization.FhirJsonSerializer();
using (StreamWriter sw = new StreamWriter(stream))
{
using (JsonWriter writer = Utility.SerializationUtil.CreateJsonTextWriter(sw))
{
serializer.Serialize(resource, writer, summary, elements);
return Task.CompletedTask;
}
}
}
}
Brian Postlethwaite (Apr 13 2021 at 10:56):
Then should this be a singleton?
How can DI work around the place?
With the FhirClient?
Brian Postlethwaite (Apr 13 2021 at 10:56):
@Gino Canessa @Marco Visser
Brian Postlethwaite (Apr 13 2021 at 10:58):
And should we include the ability to have them run as async over the content?
Or have a sync/async version of each?
Kenneth Myhra (Apr 13 2021 at 19:32):
What is meant by switchable? A common interface shared between several implementations of the serializers?
If I am interpreting your post and code correctly a marker interface would not help me much in that direction
I think the people using your interface/implementation should decide if they want it to be scoped as a singleton or not.
If you want to support both sync/async I think the recommended approach is to have truly sync methods alongside the async methods, not sync methods calling your async methods. So you would have to implement them both separately. But only the part that touches IO needs to be async, so possibly most of your implementation would not need to be async?
Gino Canessa (Apr 13 2021 at 19:51):
Hi Brian, this had fallen a bit back on my list so I'm in the process of loading it all back into memory =)
Kenneth, what we are discussing is setting up an interface so that users can load different serializers and parsers without recompiling the core libraries. For example, if you have areas of the code where you know you have 'clean' data, you may want something with high performance and no validation. On the other hand, something parsing incoming data will likely need to have a lot of validation, but can tolerate the performance penalty of it. Additionally, developers have libraries of choice (e.g., System.Text.Json vs. Newtonsoft.Json), and it would be great if the FHIR libraries could use the same ones without adding additional dependencies.
As to sync/async - I think it's probably a good idea, but would want to see how much complexity it adds before committing to a side. Assuming the interfaces include IO and/or validation, those operations can take some time.
Brian Postlethwaite (Apr 13 2021 at 20:04):
Yup, what Gino said. And not sure on async/sync - but know it's an issue to discuss.
Brian Postlethwaite (Apr 13 2021 at 20:11):
I've done a separate Async Xml parser too - to see how that works and how they perform. Turns out the MS async XmlReader is slightly slower, and has at least 1 bug in it (non Async) that I've worked around.
Gino Canessa (Apr 13 2021 at 20:14):
Interesting. Also, I'm glad you're working on XML since I haven't at all =)
Brian Postlethwaite (Apr 13 2021 at 20:29):
And also why I'm glad you've been doing the Json one :wink:
I have whitespace issues with Narratives and Markdown to resolve and a final quality check before I'm happy it's done.
(It's using the raw XmlReader class - no Document based processing)
Gino Canessa (Apr 13 2021 at 20:36):
Amazing how that works out :smile:
I believe you may be ahead of me in getting it completed, but I ended up doing the same on my side (e.g., using the lowest level JSON processor available). That said, I'm jealous of your 'fixed order of elements in XML', since I need to handle cases like not knowing the resource type until the last element of an object. :-/
Brian Postlethwaite (Apr 13 2021 at 20:44):
Yes, I do need to review it one last time, as it's not fussy on ordering right now. (big switch statements)
And considering how to redo that part.
Kenneth Myhra (Apr 13 2021 at 20:55):
Interesting approach, I like it
Additionally, developers have libraries of choice (e.g., System.Text.Json vs. Newtonsoft.Json), and it would be great if the FHIR libraries could use the same ones without adding additional dependencies.
Would like to learn more about the approach to solving this part
Brian Postlethwaite (Apr 13 2021 at 21:09):
Would even like to consider a way like is done with the HttpClientFactory where you can have "named" serializer uses - so in libs you can switch in/out various ones as the user desires (not an all or nothing situation).
Gino Canessa (Apr 13 2021 at 21:41):
Unfortunately, I think the goals are still a bit ahead of the solutions =).
Given that it looks as though there could be multiple options for each supported format, I would guess that the best experience would be something like a default implementation (hopefully with few dependencies, but that's only one metric that needs to be considered) and then additional packages available for other use cases.
That said, I've been focused on a different area (seeing what's worthwhile / possible on the JSON side), so I haven't given much thought to what packaging would look like. I assume there are people who know more about the packaging process and have thoughts on it.
Brian Postlethwaite (Apr 14 2021 at 00:41):
And for now, that default implementation is the existing well worn serializers.
Kenneth Myhra (Apr 14 2021 at 08:42):
Read through your code again, Brian. And I see that you indeed have the interface IFhirSerialization with the necessary defined methods. So disregard my comment about marker interfaces
Brian Postlethwaite (Apr 14 2021 at 21:55):
<OperationOutcome xmlns="http://hl7.org/fhir">
<issue>
<severity value="error" />
<code value="structure" />
<details>
<text value="Unexpected element found chicken" />
</details>
<location value="Patient.contained[0].chicken" />
<location value="xml position: 12,8" />
</issue>
<issue>
<severity value="error" />
<code value="structure" />
<details>
<text value="Unexpected element found chicken" />
</details>
<diagnostics value="<chicken value="brrrk" xmlns="http://hl7.org/fhir"><turkey xmlns="http://somethingelse">slime</turkey></chicken>" />
<location value="Patient.extension[0].value.chicken" />
<location value="xml position: 25,8 to 27,9" />
</issue>
<issue>
<severity value="error" />
<code value="value" />
<details>
<text value="Invalid decimal value 'chicken'" />
</details>
<diagnostics value="Content cannot be converted to the type Decimal. Line 34, position 19." />
<location value="Patient.extension[2].value" />
<location value="xml position: 34,19" />
</issue>
<issue>
<severity value="error" />
<code value="structure" />
<details>
<text value="Unexpected element found chicken" />
</details>
<location value="Patient.telecom[1].chicken" />
<location value="xml position: 59,6" />
</issue>
<issue>
<severity value="error" />
<code value="value" />
<details>
<text value="Invalid positiveint value '5.0'" />
</details>
<diagnostics value="Content cannot be converted to the type Int. Line 61, position 11." />
<location value="Patient.telecom[1].rank" />
<location value="xml position: 61,11" />
</issue>
<issue>
<severity value="error" />
<code value="structure" />
<details>
<text value="Unexpected element found chicken" />
</details>
<location value="Patient.chicken" />
<location value="xml position: 64,4" />
</issue>
</OperationOutcome>
Kenneth Myhra (Apr 15 2021 at 08:19):
Nice!
Johannes Höhn (Apr 15 2021 at 09:37):
Isn't location deprecated?
Brian Postlethwaite (Apr 15 2021 at 22:17):
Yes, Ewout pointed that out yesterday...
Gino Canessa (May 25 2021 at 17:29):
Just to resurrect an old thread =).
Finally got time to come back to this - I still have an open question about proper serialization (in #conformance - about arrays and nulls), but otherwise just have some cleaning to do.
The benchmark below is for serializing and parsing two examples (Observation-2minute-apgar-score.json and Patient-example.json from the R4 example package), after they have been read and loaded into memory.
Tags for the people we've been discussing this with: @Brian Postlethwaite @Ewout Kramer @Marco Visser
Cheers!
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
Intel Xeon W-2255 CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET Core SDK=5.0.203
[Host] : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT
DefaultJob : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT
Method | Categories | Filename | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
FirelyParse | Parse | Obser(...).json [36] | 533.16 μs | 10.336 μs | 10.152 μs | 1.00 | 51.7578 | 13.6719 | - | 513.62 KB |
FirelyExtParse | Parse | Obser(...).json [36] | 37.96 μs | 0.698 μs | 0.618 μs | 0.07 | 2.8687 | 0.3052 | - | 28.4 KB |
FirelySerialize | Serialize | Obser(...).json [36] | 309.68 μs | 5.339 μs | 4.994 μs | 1.00 | 34.1797 | 6.8359 | - | 338.05 KB |
FirelyExtSerialize | Serialize | Obser(...).json [36] | 28.03 μs | 0.495 μs | 0.463 μs | 0.09 | 3.5706 | 0.3357 | - | 35.23 KB |
FirelyParse | Parse | Patient-example.json | 384.19 μs | 6.220 μs | 5.818 μs | 1.00 | 35.6445 | 7.3242 | - | 353 KB |
FirelyExtParse | Parse | Patient-example.json | 27.69 μs | 0.547 μs | 0.586 μs | 0.07 | 2.2278 | 0.2441 | - | 21.98 KB |
FirelySerialize | Serialize | Patient-example.json | 226.42 μs | 2.187 μs | 1.826 μs | 1.00 | 22.4609 | 3.4180 | - | 221.55 KB |
FirelyExtSerialize | Serialize | Patient-example.json | 16.25 μs | 0.259 μs | 0.243 μs | 0.07 | 1.2512 | 0.0305 | - | 12.55 KB |
Brian Postlethwaite (May 25 2021 at 17:36):
Keen to grab your test so I can compare apples to apples.
Brian Postlethwaite (May 25 2021 at 17:37):
What's the Error column?
Brian Postlethwaite (May 25 2021 at 17:40):
Wondering how it would go with the json version of this...
https://github.com/brianpos/fhir-net-web-api/blob/feature/r4-xml-performance/src/Test.WebApi.AspNetCore/TestPatientWithErrors.xml
I'll re-try mine with the xml versions of those 2 files.
Gino Canessa (May 25 2021 at 17:52):
It's the statistical analysis of the benchmark timings - I'll post a link to to the code once I commit this last change, I believe that will leave everything in the examples properly round-tripping.
Gino Canessa (May 25 2021 at 17:54):
The two I'm happiest about right now are the Ratio (new version is 90%+ faster) and allocated memory (10-15x less memory activity).
Gino Canessa (May 27 2021 at 16:28):
Ok, so I've posted everything and I'm happy with it, except for one remaining issue. The new serializer escapes a few Unicode characters that the Newtonsoft one doesn't. The most common case is replacing a no-break-space with \u00A0 (e.g., in narrative div content).
Theoritically those are the same - so I'm not sure it's worth the performance hit of undoing it or not. Any thoughts would be appreciated.
So, the output is a set of extensions that work with the stock Hl7.Fhir.R4 package. The serializer and parser are interoperable with the stock ones, e.g., you can parse with one and serialize with the other.
I still have some additions I want to make to the exception handling (if you are curious you can compare areas that throw JsonException without params vs. the ones that make a nice message for what I'm thinking). I want to take a look at Brian's XML ones at some point to see if I could generate a matching OperationOutcome like his code does, but I have so far been focused on performance and correctness.
So, the preview code is in the dev branch. Of interest:
- src/Microsoft.Health.Fhir.SpecManager/Language/CSharpFirely2JsonExt.cs
- Contains the language processor to export the extensions.
- I have some cleanup yet to do, but it isn't terrible =)
- generated/SystemTextJsonExt_R4
- Contains the exported R4 files.
- These are extensions on the standard Hl7.Fhir.R4 package.
- test/perfTestCS
- Benchmarking project
- Everything interesting is in Benchmark/SerializationBenchmarks.cs
- Can see snippets to Serialize and Parse in FirelyExtSerialize() and FirelyExtParse()
- test/JsonRoundtrip
- Note this is rough because I wasn't even generally planning on checking it in =)
- Contains the working code I use to check round-trips of the parser/serializer.
- Really, it's painful to read.
Cheers!
Brian Postlethwaite (May 28 2021 at 02:41):
Awesome, I have the same issues with xml and whitespace/newlines.
I'll see if I can get a look into it soon.
Brian Postlethwaite (Jun 09 2021 at 18:09):
This is one of the functions that does the OperationOutcome for parsing issues
Brian Postlethwaite (Jun 25 2021 at 06:17):
@Gino Canessa this is the workaround for the bug in the Microsoft XML Async Parser core code...
Don't know if that's something you can pass onto relevant folks for a future version...
https://github.com/brianpos/fhir-net-web-api/blob/e324bb3a6f9a2cc98b63be6c187deb4d868e097d/src/Hl7.Fhir.Custom.Serializers/FhirCustomXmlReader.core.cs#L318
Last updated: Apr 12 2022 at 19:14 UTC