FHIR Chat · Combining national extensions to International SNOMED releas · snomed

Stream: snomed

Topic: Combining national extensions to International SNOMED releas


view this post on Zulip Jose Costa Teixeira (Dec 05 2019 at 21:21):

Hi. I'm trying to use the Belgian extension for SNOMED, but this comes as a 10MB file that doesn't seem to include the international bit, only the Belgian stuff. It says on the download page:
This is the PRODUCTION Release of the September 2019 SNOMED CT Belgium Extension.
This package is dependent upon the SNOMED CT July 2019 International Edition, and so should be consumed and analysed accordingly.

view this post on Zulip Jose Costa Teixeira (Dec 05 2019 at 21:22):

Can anyone advise how to compile this onto a full (International + Belgium) set?

view this post on Zulip Grahame Grieve (Dec 05 2019 at 21:24):

that is, in RF2 format

view this post on Zulip Jose Costa Teixeira (Dec 05 2019 at 21:25):

Yes, I forgot to mention the RF2 format

view this post on Zulip Jim Steel (Dec 06 2019 at 00:46):

Depends on the tool you're using to import it, but certainly for our tool, we usually just concatenate the zip files (add all the contents of the relevant international release, in this case July 2019, into the extension zip)
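
A minimal sketch of the "put both releases in one zip" approach described above, assuming the importer accepts a single archive holding both sets of RF2 files. The function name and the paths are hypothetical, and this version writes a new archive rather than modifying the extension zip in place. It only merges archive entries; it does not reconcile rows between the two releases.

```python
import zipfile


def merge_rf2_zips(international_zip, extension_zip, combined_zip):
    """Copy every file entry from both archives into one new archive.

    Returns the sorted list of entry names written to combined_zip.
    """
    seen = set()
    with zipfile.ZipFile(combined_zip, "w", zipfile.ZIP_DEFLATED) as out:
        # Extension first, so its entries win if a name occurs in both.
        for source in (extension_zip, international_zip):
            with zipfile.ZipFile(source) as src:
                for info in src.infolist():
                    if info.is_dir() or info.filename in seen:
                        continue  # skip folders and names already copied
                    seen.add(info.filename)
                    out.writestr(info, src.read(info.filename))
    return sorted(seen)
```

Whether the combined archive loads cleanly depends on the importer: a tool that expects one already-merged snapshot may still report duplicate component ids.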

view this post on Zulip Grahame Grieve (Dec 06 2019 at 04:55):

so... he's using my importer which only works with the combined RF2 data files. Simply combining them resulted in duplicate relationships?

view this post on Zulip Michael Lawley (Dec 06 2019 at 05:49):

We combine the files using the information in the Module Dependency Reference Set.
It is the only valid way to create a SNAPSHOT release (because an extension may invalidate some relationships in the International SNAPSHOT)

view this post on Zulip Michael Lawley (Dec 06 2019 at 05:51):

That is, if the Belgian extension does more than just add Reference sets and new "leaf" concepts.

view this post on Zulip Jose Costa Teixeira (Dec 06 2019 at 05:57):

Is this Module Dependency Reference Set (yes, I googled it) something I can use to combine my BE Extension with the corresponding Intl release? are there instructions or tools for that?

view this post on Zulip Grahame Grieve (Dec 06 2019 at 06:01):

The BE release is the first I've found that doesn't release the snapshot

view this post on Zulip Jose Costa Teixeira (Dec 06 2019 at 06:09):

OK so the best is to ask the BE people to get the snapshot... I will ask

view this post on Zulip Jose Costa Teixeira (Dec 06 2019 at 06:10):

@Grahame Grieve will it hurt (i.e. will it cause suffering) to simply bypass duplicates in the code? Just an early thought - I mean, if these are identified as duplicate, why throw an exception and not simply ignore it?

view this post on Zulip Michael Lawley (Dec 06 2019 at 06:10):

If it's a "conformant" release then it (the MDRS) should be included. But you still need a loader that understands what to do when combining the files; it's not just a matter of concatenating the file contents.

view this post on Zulip Grahame Grieve (Dec 06 2019 at 06:15):

what to do when combining the files

that's why I was using the snapshot- ducking getting involved with this

view this post on Zulip Michael Lawley (Dec 06 2019 at 06:24):

The basic model is to determine the moduleId for the release you want (call this BE) and the release date (call this RD).
Look in the MDRS for all active rows in module BE with effectiveTime RD. This will give you a list of target moduleIds and associated target effectiveTimes; call these Mi and Ti.
Then, for every input file, exclude all rows where the module is one of the Mi and the effectiveTime is greater than the corresponding Ti.
Finally, using the 1st column as the "key", select for each key the row with the latest effectiveTime.

For SNAPSHOT releases + simple extensions you're only going to exclude a few rows. For non-simple extensions they should never be released in this partial form because they need to be run through a DL classifier to compute appropriate inferred relationships.
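
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code of any loader mentioned in this thread: the function names are made up, rows are plain dicts keyed by the RF2 column headers, and the MDRS row in the example is hypothetical (it encodes a BE 20180915 release depending on the International module at 20180731). In the MDRS, `referencedComponentId` holds the target module and `targetEffectiveTime` its version. The eight-digit effectiveTime strings compare correctly as text.

```python
import csv


def read_rf2(path):
    """Yield rows of a tab-delimited RF2 file as dicts keyed by column header."""
    with open(path, encoding="utf-8", newline="") as f:
        yield from csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)


def dependency_cutoffs(mdrs_rows, module_id, release_date):
    """Map each target module (Mi) to its target effectiveTime (Ti)."""
    return {
        row["referencedComponentId"]: row["targetEffectiveTime"]
        for row in mdrs_rows
        if row["active"] == "1"
        and row["moduleId"] == module_id
        and row["effectiveTime"] == release_date
    }


def combine(rows, cutoffs):
    """Drop rows newer than the module's cutoff, then keep the latest row per id."""
    latest = {}
    for row in rows:
        cutoff = cutoffs.get(row["moduleId"])
        if cutoff is not None and row["effectiveTime"] > cutoff:
            continue  # published after the version the extension depends on
        kept = latest.get(row["id"])
        if kept is None or row["effectiveTime"] > kept["effectiveTime"]:
            latest[row["id"]] = row
    return latest


# Hypothetical MDRS row: BE module at 20180915 depends on International at 20180731.
mdrs = [{
    "effectiveTime": "20180915", "active": "1", "moduleId": "11000172109",
    "referencedComponentId": "900000000000207008",
    "sourceEffectiveTime": "20180915", "targetEffectiveTime": "20180731",
}]
# Two rows sharing one id, reduced to the columns the sketch reads.
rows = [
    {"id": "107532022", "effectiveTime": "20180915", "active": "0",
     "moduleId": "11000172109"},
    {"id": "107532022", "effectiveTime": "20190131", "active": "0",
     "moduleId": "900000000000207008"},
]
cutoffs = dependency_cutoffs(mdrs, "11000172109", "20180915")
combined = combine(rows, cutoffs)
print(combined["107532022"]["moduleId"])  # 11000172109
```

In this example the International row is dropped because its effectiveTime is later than the version the extension declares a dependency on, so the extension's row is the one kept. The sketch does not cover classification of non-simple extensions.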

view this post on Zulip Rory Davidson (Dec 06 2019 at 15:24):

To jump in here, the BE release is a valid RF2 extension release with FULL, DELTA and SNAPSHOT containing mostly translated descriptions. It can be downloaded from the Belgian NRC here http://mlds.ihtsdotools.org/be . With regards to using, as Michael points out, it depends on the tools you have. We loaded it into our terminology server used by the SNOMED CT Browser as an extension which is a branch off of the international edition.

view this post on Zulip Grahame Grieve (Dec 06 2019 at 18:13):

hm. I applied for permission to access BE snomed so I can look

view this post on Zulip Jose Costa Teixeira (Dec 06 2019 at 19:53):

It is not improbable that I did something wrong. But I only saw a 10 MB file, and the International Edition was separate.

view this post on Zulip Michael Lawley (Dec 06 2019 at 23:07):

I've also applied for permission to check :-)

view this post on Zulip Grahame Grieve (Dec 06 2019 at 23:22):

I feel like I needed a note from my parents

view this post on Zulip David Hay (Dec 08 2019 at 06:54):

have you been good?

view this post on Zulip Grahame Grieve (Dec 08 2019 at 07:50):

hmm let's not actually ask my parents

view this post on Zulip David Hay (Dec 08 2019 at 08:05):

best not...

view this post on Zulip Jose Costa Teixeira (Jan 03 2020 at 10:20):

I just confirmed that in BE we do not have a single release file.

view this post on Zulip Jose Costa Teixeira (Jan 03 2020 at 10:21):

Does anyone know of practices/tools to do that?

view this post on Zulip Adam Flinton (Jan 03 2020 at 11:18):

Ask the snomed people. I am assuming your terminology center is working with them. I would give you names/emails but it's probably best to go through the BE-TC.

view this post on Zulip Adam Flinton (Jan 03 2020 at 11:19):

& in answer to your question yes there are tools etc.

view this post on Zulip Adam Flinton (Jan 03 2020 at 11:21):

By single release file I am assuming a zip containing both the BE extension and the relevant intl release it is based off.

view this post on Zulip Jose Costa Teixeira (Jan 03 2020 at 11:25):

OK I will get the BE terminology center to check with SNOMED

view this post on Zulip Grahame Grieve (Jan 04 2020 at 09:27):

@Rory Davidson and @Michael Lawley and I have been discussing it. In general, there is no plan to produce such a release. It's a tooling problem. What I haven't figured out is why the BE snapshot duplicates some things from the international release, and whether it's redefining them or just repeating them. It's on my todo list

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 09:30):

I mentioned to the SCT BE team that we need a single file release. This is when they asked if there are tools because otherwise they must create a DB and run the process (I'm guessing deduplicating?)

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 09:31):

Is there anything we can do to help? Perhaps I can send you and the BE people an email so that my ignorance is not a bottleneck

view this post on Zulip Grahame Grieve (Jan 04 2020 at 09:41):

well, Snomed Intl have a lambda that will do it but it's not reusable outside a particular context. Really, it's our code that's supposed to manage this. Only I haven't had time to figure it out. You could look - for all the concepts, descriptions, and relationships duplicated, are their definitions the same? or different?

view this post on Zulip Grahame Grieve (Jan 04 2020 at 09:41):

there are about 1000 duplicates

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 09:49):

I'll work on getting the duplicates and analysing this

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 09:52):

I can for example look at where the servertools imports and detects a duplicate and log those for analysis.

view this post on Zulip Grahame Grieve (Jan 04 2020 at 09:54):

y. you need to change to not blow up and instead log

view this post on Zulip Rory Davidson (Jan 04 2020 at 11:17):

Normally, there shouldn't be any duplicates in extensions, especially in the BE extension which is primarily only translated descriptions. Do you have any examples and I can dig further as well?

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 23:05):

I ran the import tool logging the exceptions, and

  1. In the first phase, the same set of issues came up many times:
Unable to find caps type 900000000000003001
Unable to find caps type 900000000000013009
Unable to find caps type 900000000000550004
Unable to find desc module 11000172109
Unable to find desc module 900000000000012004
Unable to find desc module 900000000000207008
Unable to find desc type 900000000000003001
Unable to find desc type 900000000000013009
  2. After this, I still get a "Duplicates not allowed" but I do not know where this comes from, I cannot find that message in the code.

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 23:05):

It's time for me to RTFM and see what needs to be done to these files

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 23:23):

What I do find as "duplicates" is that the same concepts are being defined in different languages

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 23:24):

Some examples, not knowing if insightful:
from sct2_Description_Snapshot-en_INT_20190731:

id  effectiveTime   active  moduleId    conceptId   languageCode    typeId  term    caseSignificanceId
...
215929010   20170731    1   900000000000207008  134187008   en  900000000000013009  Child protection procedure  900000000000448009
...

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 23:24):

from sct2_Description_Snapshot-nl_BE1000172_20190915:

81000172116 20180315    1   11000172109 11000172109 nl  900000000000013009  Belgische module    900000000000017005
341000172119    20180315    1   11000172109 134187008   nl  900000000000013009  procedure voor kinderbescherming    900000000000448009
...

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 23:24):

from sct2_Description_Snapshot-fr_BE1000172_20190915:

71000172119 20180315    1   11000172109 11000172109 fr  900000000000013009  Module Belge    900000000000017005
331000172114    20180315    1   11000172109 134187008   fr  900000000000013009  procédure de protection de l'enfant 900000000000448009
...

view this post on Zulip Jose Costa Teixeira (Jan 04 2020 at 23:24):

the entire sct2_Description_Snapshot-en_BE1000172_20190915 is:

11000172113 20180315    1   11000172109 11000172109 en  900000000000003001  Belgian module (core metadata concept)  900000000000017005
21000172115 20180315    1   11000172109 11000172109 en  900000000000013009  Belgian module  900000000000017005
31000172117 20180315    1   11000172109 21000172104 en  900000000000003001  Belgian French language reference set (foundation metadata concept) 900000000000017005
41000172112 20180315    1   11000172109 21000172104 en  900000000000013009  Belgian French language reference set   900000000000017005
51000172114 20180315    1   11000172109 31000172101 en  900000000000003001  Belgian Dutch language reference set (foundation metadata concept)  900000000000017005
61000172111 20180315    1   11000172109 31000172101 en  900000000000013009  Belgian Dutch language reference set    900000000000017005
80481000172113  20180315    1   11000172109 71000172103 en  900000000000003001  Dichorionic triamniotic triplet pregnancy (disorder)    900000000000448009
80491000172111  20180315    1   11000172109 71000172103 en  900000000000013009  Dichorionic triamniotic triplet pregnancy   900000000000448009
80571000172114  20180315    1   11000172109 111000172100    en  900000000000003001  Monochorionic triamniotic triplet pregnancy (disorder)  900000000000448009
80581000172112  20180315    1   11000172109 111000172100    en  900000000000013009  Monochorionic triamniotic triplet pregnancy 90000000000044800

view this post on Zulip Rory Davidson (Jan 05 2020 at 06:43):

With regards to the duplicates, the BE descriptions are expected. As you say, these are new descriptions in different languages for the same concept, which is an aspect of the SNOMED CT model enabling the unique concept id to be used irrespective of which language is requested.

view this post on Zulip Grahame Grieve (Jan 05 2020 at 12:35):

what are you loading from? snapshot?

view this post on Zulip Jose Costa Teixeira (Jan 05 2020 at 12:45):

yes

view this post on Zulip Michael Lawley (Jan 06 2020 at 05:34):

So, these are not "duplicates". A SNOMED concept may have many descriptions (akin to a FHIR CodeSystem concept having many designations).
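
To illustrate the point, here is a small sketch using three of the description rows quoted earlier in this thread, reduced to five columns. The variable names are made up for the example. The conceptId repeats, but every description has its own id, so none of these rows collide.

```python
from collections import defaultdict

# (id, moduleId, conceptId, languageCode, term), taken from the rows quoted above.
descriptions = [
    ("215929010", "900000000000207008", "134187008", "en",
     "Child protection procedure"),
    ("341000172119", "11000172109", "134187008", "nl",
     "procedure voor kinderbescherming"),
    ("331000172114", "11000172109", "134187008", "fr",
     "procédure de protection de l'enfant"),
]

# Group the terms by the concept they describe.
by_concept = defaultdict(list)
for desc_id, module_id, concept_id, lang, term in descriptions:
    by_concept[concept_id].append((lang, term))

# Description ids are the row keys, and they are all distinct.
description_ids = [row[0] for row in descriptions]
ids_are_unique = len(description_ids) == len(set(description_ids))

for lang, term in by_concept["134187008"]:
    print(lang, term)
```

A loader that keys description rows on the first column (the description id) rather than on conceptId would not see these as duplicates.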
What is the import tool that you're using?

view this post on Zulip Jose Costa Teixeira (Jan 06 2020 at 05:45):

Grahame's server utils

view this post on Zulip Rob Hausam (Jan 08 2020 at 03:24):

any further progress on this?

view this post on Zulip Jose Costa Teixeira (Jan 08 2020 at 10:11):

Hi Rob. We're working on this. We're looking for some materials to demonstrate a terminology server for first timers.

view this post on Zulip Rory Davidson (Jan 08 2020 at 12:27):

If this helps, we have an open-source terminology server for SNOMED CT which supports the FHIR Terminology Services and manages both editions and extensions and is pretty quick to set up for demonstrations - https://github.com/IHTSDO/snowstorm

view this post on Zulip Jose Costa Teixeira (Jan 08 2020 at 12:56):

This is interesting, thanks. Perhaps what we have in Belgium is an extension and can be handled as such...? We started some discussion to investigate.
This server seems helpful

view this post on Zulip Rory Davidson (Jan 08 2020 at 13:45):

Yes, it is an extension in Belgium and should be handled as such. We use that server which hosts the online SNOMED CT browser for Belgium as well so it should work.

view this post on Zulip Jose Costa Teixeira (Jan 08 2020 at 14:06):

ok we will try.

view this post on Zulip Jose Costa Teixeira (Jan 08 2020 at 14:07):

Also the snapshot edition of the BE edition (or tools to generate it) would be very useful.

view this post on Zulip Grahame Grieve (Jan 09 2020 at 23:30):

the issue I have is duplicate relationships, not additional descriptions. here's some examples:

view this post on Zulip Grahame Grieve (Jan 09 2020 at 23:45):

in the snapshot for the current BE release:

view this post on Zulip Grahame Grieve (Jan 09 2020 at 23:45):

107532022   20180915    0   11000172109 116577007   116576003   0   116680003   900000000000011006  900000000000451002

view this post on Zulip Grahame Grieve (Jan 09 2020 at 23:46):

In the snapshot for international, v 20190731

view this post on Zulip Grahame Grieve (Jan 09 2020 at 23:47):

107532022   20190131    0   900000000000207008  116577007   116576003   0   116680003   900000000000011006  900000000000451002

view this post on Zulip Grahame Grieve (Jan 09 2020 at 23:47):

I find 2753 examples of this

view this post on Zulip Grahame Grieve (Jan 09 2020 at 23:49):

All the entries I have checked just have a different module Id. But that's only a small sampling...
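
One way to check more than a small sample is to compare the duplicated rows column by column. The sketch below uses the two rows quoted above and the standard RF2 relationship column names; the helper name is made up, and splitting on whitespace only works here because no field contains a space (real RF2 files are tab-delimited).

```python
RELATIONSHIP_COLUMNS = [
    "id", "effectiveTime", "active", "moduleId", "sourceId", "destinationId",
    "relationshipGroup", "typeId", "characteristicTypeId", "modifierId",
]


def differing_columns(row_a, row_b):
    """Return the names of the columns whose values differ between two rows."""
    return [
        name
        for name, a, b in zip(RELATIONSHIP_COLUMNS, row_a, row_b)
        if a != b
    ]


be_row = ("107532022 20180915 0 11000172109 116577007 116576003 0 "
          "116680003 900000000000011006 900000000000451002").split()
intl_row = ("107532022 20190131 0 900000000000207008 116577007 116576003 0 "
            "116680003 900000000000011006 900000000000451002").split()

print(differing_columns(be_row, intl_row))  # ['effectiveTime', 'moduleId']
```

For this pair the effectiveTime differs as well as the moduleId, and both rows are inactive (active = 0). Running the same comparison over every shared id would show whether any pair differs in the defining columns.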

view this post on Zulip Jose Costa Teixeira (Jan 09 2020 at 23:54):

I could not find the place in the import tool where duplicates are intercepted to log them. I just got a message "Duplicates not allowed"

view this post on Zulip Grahame Grieve (Jan 09 2020 at 23:54):

I haven't committed that yet.

view this post on Zulip Jose Costa Teixeira (Jan 09 2020 at 23:54):

ah ok

view this post on Zulip Jose Costa Teixeira (Jan 09 2020 at 23:55):

in a few hours we will have a phone call with Rory and the BE crew, I think this can wait until then

view this post on Zulip Grahame Grieve (Jan 09 2020 at 23:56):

my import fails later with some error about refsets and field types. I'm expecting that this is my issue but I haven't had time to investigate yet

view this post on Zulip Rob Hausam (Jan 10 2020 at 03:28):

Are you getting this from the Full release (not the Snapshot)?

view this post on Zulip Grahame Grieve (Jan 10 2020 at 04:16):

full release of intl and snapshot of BE

view this post on Zulip Rob Hausam (Jan 10 2020 at 05:42):

Makes sense. Are you following this algorithm for selecting the component rows for a particular snapshot view? In the full release, for the rows with a particular component (concept, description, relationship, etc.) id, only the one with the most recent effectiveTime that is equal to or less than the specified snapshot time (usually, but not always, the current time) is selected for the snapshot view. With an extension, as in the examples that you gave, if the extension version is based on the international release version that you are using (which it should be - presumably the current BE release is based on the international 20190731 release?), then for the current snapshot view you would still select only the row with the greatest effectiveTime, which in this case would be the row for 107532022 from the international release (with effectiveTime 20190131). The row from the BE release (with effectiveTime 20180915) would be excluded. That may resolve the duplicate? @Rory Davidson can correct me if I said anything wrong!

view this post on Zulip Rory Davidson (Jan 10 2020 at 10:07):

Yes, @Rob Hausam, that's correct. The record with the most recent effective date trumps any previous records. This can span between modules as well. The snapshot only shows that most recent effective dated record, whereas the full contains them all.
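
A rough sketch of the selection rule just described: for each component id, keep the row with the greatest effectiveTime that is not later than the chosen snapshot time. The function name is made up, rows are reduced to the columns needed, and this version deliberately ignores module dependencies, so it is only the plain "latest row wins" rule.

```python
def snapshot_view(full_rows, snapshot_time):
    """Keep, per id, the latest row whose effectiveTime is <= snapshot_time."""
    latest = {}
    for row in full_rows:
        if row["effectiveTime"] > snapshot_time:
            continue  # not yet published at the requested snapshot time
        kept = latest.get(row["id"])
        if kept is None or row["effectiveTime"] > kept["effectiveTime"]:
            latest[row["id"]] = row
    return latest


# The two rows for relationship 107532022 quoted earlier in the thread.
rows = [
    {"id": "107532022", "effectiveTime": "20180915", "moduleId": "11000172109"},
    {"id": "107532022", "effectiveTime": "20190131",
     "moduleId": "900000000000207008"},
]

current = snapshot_view(rows, "20190915")
earlier = snapshot_view(rows, "20181231")
print(current["107532022"]["effectiveTime"])  # 20190131
print(earlier["107532022"]["effectiveTime"])  # 20180915
```

With a snapshot time of 20190915 the International row is selected; with 20181231 only the BE row qualifies.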

view this post on Zulip Grahame Grieve (Jan 10 2020 at 10:49):

ok thanks. I had not seen that link (and still can't, courtesy of Australia's 3rd world internet...). but I'll resolve it that way

view this post on Zulip Michael Lawley (Jan 29 2020 at 04:23):

But be careful when crossing modules - If the BE extension for 20180915 depends on the International Edition for 20180731, then that row above (Int module, but later date of 20190131) does NOT trump the BE row.


Last updated: Apr 12 2022 at 19:14 UTC