FHIR Chat · CS only packages/IGs · IG creation

Stream: IG creation

Topic: CS only packages/IGs


view this post on Zulip Patrick Werner (Sep 15 2021 at 12:52):

Grahame advised moving this topic from implementers to this channel:
I am currently trying to create FHIR packages for (big) CodeSystems like ICD-10, NCIT, etc.
These are converted directly from ClaML -> FHIR; some are authored in FSH. While SUSHI is quite fast at creating the CS JSON files, the IG Publisher struggles and takes almost 3 hours on an 8-core, 16 GB GitLab runner.
Most of the time is consumed by validation and narrative generation.

My preferred solution would be to disable narrative generation and validation for the IG creation in the IG Publisher; as far as I saw, this is not possible at the moment.
The second option would be to just use SUSHI and package manually in a CI step. That doesn't produce an IG HTML page, which is where guidance for these CodeSystems should go.
Is there an easy solution that I am missing?

view this post on Zulip Patrick Werner (Sep 15 2021 at 12:53):

The current idea is to first invoke SUSHI and then use the HAPI CLI to create the package.
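The manual packaging step mentioned here can be sketched without any FHIR tooling, since a FHIR package is a gzipped tarball with a `package/` folder holding a `package.json` manifest and the resource files. The following is only a minimal illustration, not what the HAPI CLI does internally. The package name, version, and description are hypothetical, the input folder is assumed to be SUSHI's output directory, and no `.index.json` is written, which some consumers may expect.

```python
import io
import json
import tarfile
from pathlib import Path


def _add_bytes(tar, arcname, data):
    """Add an in-memory byte string to the tarball under the given name."""
    info = tarfile.TarInfo(name=arcname)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def build_fhir_package(resource_dir, out_path, name, version, fhir_version="4.0.1"):
    """Bundle every JSON file in resource_dir into a FHIR NPM package (.tgz).

    Returns the list of resource file names that were packaged.
    """
    # Hypothetical manifest values; adjust name, version and dependencies.
    manifest = {
        "name": name,
        "version": version,
        "description": "Terminology-only package (CodeSystems)",
        "fhirVersions": [fhir_version],
        "dependencies": {"hl7.fhir.r4.core": fhir_version},
    }
    files = sorted(Path(resource_dir).glob("*.json"))
    with tarfile.open(out_path, "w:gz") as tar:
        manifest_bytes = json.dumps(manifest, indent=2).encode("utf-8")
        _add_bytes(tar, "package/package.json", manifest_bytes)
        for path in files:
            # Every resource sits directly inside the package/ folder.
            tar.add(path, arcname=f"package/{path.name}")
    return [p.name for p in files]
```

In a CI step this could be called as `build_fhir_package("fsh-generated/resources", "package.tgz", "example.terminology", "0.1.0")` after SUSHI has run. No validation or narrative generation happens here, which is the point of the shortcut and also its risk.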

view this post on Zulip Patrick Werner (Sep 15 2021 at 12:57):

But I could also see the IG Publisher supporting this use case. This could be done by allowing the two time-consuming steps (narrative generation and validation) to be deactivated in ig.ini.

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 13:21):

I think an IG parameter for disabling validation would make sense here

view this post on Zulip Patrick Werner (Sep 15 2021 at 13:43):

this would go into ig.ini, right?

view this post on Zulip Lloyd McKenzie (Sep 15 2021 at 13:56):

I don't know about a parameter. This isn't something that should always be true for the IG. When it comes time to publish, you'd need full validation and narrative generation. You'd just like to be able to iterate more quickly some of the time. So it seems to me that a launch parameter would make more sense?

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 14:07):

I meant it as a repeatable parameter with the canonical URLs.

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 14:10):

something like "skipValidationUrl"

view this post on Zulip Patrick Werner (Sep 15 2021 at 14:10):

@Lloyd McKenzie We have one package containing ICD, NCIT, ATC ... It takes approx. 3 hours to validate.
This is something we are doing manually from time to time. These FHIR Resources are FSH generated, so SUSHI is our CI testing tool.

view this post on Zulip Patrick Werner (Sep 15 2021 at 14:10):

So for our use-case ig.ini parameters:
skipNarrativeGeneration
skipValidation

would make sense. As this is a CS-only package, we don't need to have this bound to specific canonicals as @Jose Costa Teixeira proposed.
But I can also see his implied use case.

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 14:11):

Here I assume that within one IG, there would be resources that should be validated and others that shouldn't.

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 14:12):

There should be other cases for this skipValidation. For example, if a TestScript uses a non-conformant resource on purpose, we could skip validation for that one.

view this post on Zulip Patrick Werner (Sep 15 2021 at 14:13):

So having both options would be nice then

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 14:16):

Alternatively / additionally, we could have a "validate all or skip all" parameter - there it would be
a) a command line, if sometimes we want to validate and sometimes we want not to validate
b) a fixed parameter (in the IG? I'd prefer not to mess with the ig.ini)

view this post on Zulip Patrick Werner (Sep 15 2021 at 14:41):

Regarding b): this could be one or more extensions on the IG resource itself, or a SUSHI YAML setting.

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 14:46):

do you need option b, Patrick?

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 14:46):

or a)

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 14:47):

i.e. when you skip validation, do you always skip validation for the entire IG in every build?

view this post on Zulip Lloyd McKenzie (Sep 15 2021 at 16:17):

The key thing for me is that these processes can't be skipped for content that we actually publish (and perhaps even not for content posted to the CI-build - though an extra warning in the CI-build might be enough)

view this post on Zulip Jose Costa Teixeira (Sep 15 2021 at 17:30):

can't we put "invalid" content in the CI build? e.g test resources?

view this post on Zulip Grahame Grieve (Sep 15 2021 at 17:46):

is this a public package? Why isn't this stuff going into terminology.hl7.org?

view this post on Zulip Elliot Silver (Sep 15 2021 at 18:44):

@Patrick Werner This is exactly the issue I was facing a couple of weeks ago. See https://chat.fhir.org/#narrow/stream/179252-IG-creation/topic/Multipart.20IGs. I'm not sure if the issue is to disable validation, or to look at whether the publisher can be restructured to not require everything be in memory at once.

view this post on Zulip Grahame Grieve (Sep 15 2021 at 19:41):

I think that @Patrick Werner has bigger problems than running out of memory. I certainly never imagined that someone would try to publish those big code systems through a CodeSystem resource, let alone through the IG machinery

view this post on Zulip Rob Hausam (Sep 15 2021 at 22:20):

Agree with Grahame - this doesn't seem like the right (or at least the best) approach to take for this. Unless I'm missing something, I think this is something that should be supported directly by the terminology server. ICD-10 and ICD-10-GM are already supported on tx.fhir.org, and ICD-10-GM can be as well. That is done via a CodeSystem resource instance in the packages repo that includes the full code system contents (so it's pretty big), but I think it avoids most of the overhead that is being experienced with trying to publish it through the IG mechanism. We also can and want to include the complete contents for ATC - but we haven't yet because I haven't found a full source for it other than a flat file, and I had expected and hoped to include the hierarchy. So if @Patrick Werner would be able to provide a source(?), that would be great. And, as far as I know, we can include support for NCIT, as well.

view this post on Zulip Michael Lawley (Sep 16 2021 at 02:06):

I'm interested in how big these code systems are. We have one for the UK's DM+D which is, in JSON form, > 300MB (without narrative)

view this post on Zulip Grahame Grieve (Sep 16 2021 at 03:05):

Almost always, these code systems have their own distribution formats. CodeSystem was never intended to scale up to compete with those

view this post on Zulip Patrick Werner (Sep 16 2021 at 07:55):

Yes, these CSs have their own format most of the time. We are loading huge CSs in HAPI, which works fine.
I agree this is an edge case, and validation in the CI autobuild should be enabled.

view this post on Zulip Patrick Werner (Sep 16 2021 at 07:55):

I think the easiest would be to have an optional command-line parameter on the genonce.sh/publisher.jar

view this post on Zulip Patrick Werner (Sep 16 2021 at 07:56):

Grahame Grieve said:

is this a public package? Why isn't this stuff going into terminology.hl7.org?

private only. Used to distribute CodeSystems in a project to the partners.

view this post on Zulip Patrick Werner (Sep 16 2021 at 07:58):

Rob Hausam said:

Agree with Grahame - this doesn't seem like the right (or at least the best) approach to take for this. Unless I'm missing something,

My use case is: create a package of Terminology Resources to distribute the content to servers. These servers are using these resources for validation.
So adding more CSs to tx.fhir.org is always nice, but it wouldn't solve the issue.

view this post on Zulip Patrick Werner (Sep 16 2021 at 08:07):

Rob Hausam said:

So if Patrick Werner would be able to provide a source(?), that would be great. And, as far as I know, we can include support for NCIT, as well.

We created the CS from the WHO Excel sheet (with German display values), available here: https://www.wido.de/publikationen-produkte/arzneimittel-klassifikation/

view this post on Zulip Patrick Werner (Sep 16 2021 at 08:09):

NCIT created from: https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/

view this post on Zulip Rob Hausam (Sep 16 2021 at 13:52):

@Patrick Werner Do you know (or would you be able to find out) what the source is of the ATC content that WIdO uses for their adaptation of it for Germany? I've received from an EU source a copy of the ATC content as a flat list, but I need a way to continue to obtain updated versions as additions and revisions are made by the WHO Collaborating Centre. This seems to be the elusive item. And then I need to write (which shouldn't be too hard) or steal a script to extract the embedded hierarchy from the codes and transform the data to the FHIR CodeSystem resource.

view this post on Zulip Lloyd McKenzie (Sep 16 2021 at 16:58):

I'd prefer a flag that says that "don't validate X, Y and Z" rather than "don't validate at all". Or perhaps even "Use Tx server to validate X".

You can avoid narrative generation simply by defining your own narrative.
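A hedged sketch of what "defining your own narrative" might look like as a pre-processing step: it adds a short `text` element to any CodeSystem JSON file that has none. The file pattern, the choice of `generated` as the narrative status, and the use of the title as the narrative body are all assumptions for illustration. Whether the IG Publisher then leaves such a narrative untouched rests on the comment above and is not verified here.

```python
import json
from html import escape
from pathlib import Path

XHTML_NS = "http://www.w3.org/1999/xhtml"


def add_minimal_narrative(resource):
    """Give the resource a one-line narrative unless it already has one."""
    if "text" in resource:
        return resource
    # Assumption: title, name or id is good enough as a placeholder narrative.
    label = resource.get("title") or resource.get("name") or resource.get("id", "CodeSystem")
    resource["text"] = {
        "status": "generated",
        "div": f'<div xmlns="{XHTML_NS}"><p>{escape(label)}</p></div>',
    }
    return resource


def add_narratives(resource_dir):
    """Rewrite CodeSystem-*.json files in place; return the names that changed."""
    changed = []
    for path in sorted(Path(resource_dir).glob("CodeSystem-*.json")):
        resource = json.loads(path.read_text(encoding="utf-8"))
        if "text" not in resource:
            updated = add_minimal_narrative(resource)
            path.write_text(json.dumps(updated, indent=2), encoding="utf-8")
            changed.append(path.name)
    return changed
```

Existing narratives are never overwritten, so the step can be run repeatedly in CI.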

view this post on Zulip John Moehrke (Sep 16 2021 at 18:57):

I too would like to tag artifacts as "don't validate".. so that I can put in offending resources so as to speak to them in a test plan.

view this post on Zulip Patrick Werner (Sep 20 2021 at 12:43):

Rob Hausam said:

Patrick Werner Do you know (or would you be able to find out) what the source is of the ATC content that WIdO uses for their adaptation of it for Germany? I've received from an EU source a copy of the ATC content as a flat list, but I need a way to continue to obtain updated versions as additions and revisions are made by the WHO Collaborating Centre. This seems to be the elusive item. And then I need to write (which shouldn't be too hard) or steal a script to extract the embedded hierarchy from the codes and transform the data to the FHIR CodeSystem resource.

@Rob Hausam Will investigate and update you. What do you mean by source?
The technical source they are using? Or what the base for the German publication is?

view this post on Zulip Rob Hausam (Sep 20 2021 at 23:03):

@Patrick Werner I think I meant the technical source - if I understand what you mean by that. How do they get the latest content releases from WHO CC, that they will then incorporate into the German adaptation?

view this post on Zulip Patrick Werner (Sep 27 2021 at 09:04):

@Rob Hausam OK, I just got the opportunity to ask the people involved:
WHO Oslo distributes ATC in Excel and PDF

view this post on Zulip Rob Hausam (Sep 27 2021 at 12:56):

Thanks, @Patrick Werner. Neither of those formats would be my first choice, of course, but we can definitely work with Excel (probably not PDF). I (or someone) will need to write a transform from the Excel XML to the CodeSystem resource (either JSON or XML). One aspect is that it is necessary (or at least very desirable) to parse the codes to extract the embedded hierarchy and properly represent that in the CodeSystem resource. Doing that won't be particularly difficult, but does require a bit of extra effort.
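The hierarchy extraction described here can be sketched as follows. This is not the script that was eventually used for tx.fhir.org. It relies on the ATC code layout, where the five levels have code lengths 1, 3, 4, 5 and 7 (for example A, A10, A10B, A10BA, A10BA02), so a code's parent is its prefix at the previous level. Reading the Excel file is left out, and the input is assumed to be plain (code, display) pairs.

```python
# Code lengths of the five ATC levels, e.g. A / A10 / A10B / A10BA / A10BA02.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_parent(code):
    """Return the parent ATC code, or None for a top-level (one-letter) code."""
    if len(code) not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"not a valid ATC code length: {code!r}")
    level = ATC_LEVEL_LENGTHS.index(len(code))
    if level == 0:
        return None
    return code[:ATC_LEVEL_LENGTHS[level - 1]]


def build_concepts(rows):
    """Turn flat (code, display) pairs into a nested CodeSystem.concept list."""
    nodes = {code: {"code": code, "display": display} for code, display in rows}
    roots = []
    for code in sorted(nodes):
        parent = atc_parent(code)
        if parent in nodes:
            # Nest the concept under its parent's child list.
            nodes[parent].setdefault("concept", []).append(nodes[code])
        else:
            # Top-level codes, and codes whose parent is missing from the
            # input, end up at the root of the concept list.
            roots.append(nodes[code])
    return roots
```

The returned list can be assigned to the `concept` element of a CodeSystem resource. Nesting concepts is only one way to express the hierarchy; a `parent` property on each concept would be an alternative.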

view this post on Zulip Patrick Werner (Sep 27 2021 at 12:56):

I agree.

view this post on Zulip Grahame Grieve (Sep 28 2021 at 03:03):

parsing XML and inferring hierarchy sounds like an hour or two work for me. I'd do it

view this post on Zulip Rob Hausam (Oct 15 2021 at 11:16):

It took a while to get to it, but the work for parsing the file is now done, and the full version of ATC (from May 2020, which is the source that I had) is now available on tx.fhir.org. As soon as we get an updated Excel source file the update on the server will be much easier and quicker.
@Patrick Werner

view this post on Zulip Patrick Werner (Oct 16 2021 at 15:11):

Thanks @Rob Hausam


Last updated: Apr 12 2022 at 19:14 UTC