Stream: snomed
Topic: Combining national extensions to International SNOMED releas
Jose Costa Teixeira (Dec 05 2019 at 21:21):
Hi. I'm trying to use the Belgian extension for SNOMED, but this comes as a 10MB file that doesn't seem to include the international bit, only the belgian stuff. It says in the download page:
This is the PRODUCTION Release of the September 2019 SNOMED CT Belgium Extension.
This package is dependent upon the SNOMED CT July 2019 International Edition, and so should be consumed and analysed accordingly.
Jose Costa Teixeira (Dec 05 2019 at 21:22):
Can anyone advise how to compile this onto a full (International + Belgium) set?
Grahame Grieve (Dec 05 2019 at 21:24):
that is, in Rf2 format
Jose Costa Teixeira (Dec 05 2019 at 21:25):
Yes, I forgot to mention the RF2 format
Jim Steel (Dec 06 2019 at 00:46):
Depends on the tool you're using to import it, but certainly for our tool, we usually just concatenate the zip files (add all the contents of the relevant international release - in this case the July 2019 - into the extension zip)
Grahame Grieve (Dec 06 2019 at 04:55):
so... he's using my importer which only works with the combined RF2 data files. Simply combining them resulted in duplicate relationships?
Michael Lawley (Dec 06 2019 at 05:49):
We combine the files using the information in the Module Dependency Reference Set.
It is the only valid way to create a SNAPSHOT release (because an extension may invalidate some relationships in the International SNAPSHOT)
Michael Lawley (Dec 06 2019 at 05:51):
That is, if the Belgian extension does more than just add Reference sets and new "leaf" concepts.
Jose Costa Teixeira (Dec 06 2019 at 05:57):
Is this Module Dependency Reference Set (yes, I googled it) something I can use to combine my BE Extension with the corresponding Intl release? are there instructions or tools for that?
Grahame Grieve (Dec 06 2019 at 06:01):
The BE release is the first I've found that doesn't release the snapshot
Jose Costa Teixeira (Dec 06 2019 at 06:09):
OK so the best is to ask the BE people to get the snapshot... I will ask
Jose Costa Teixeira (Dec 06 2019 at 06:10):
@Grahame Grieve will it hurt (i.e. will it cause suffering) to simply bypass duplicates in the code? Just an early thought - I mean, if these are identified as duplicate, why throw an exception and not simply ignore it?
Michael Lawley (Dec 06 2019 at 06:10):
If it's a "conformant" release then it (the MDRS) should be included. But you still need a loader that understands what to do when combining the files; it's not just a matter of concatenating the file contents.
Grahame Grieve (Dec 06 2019 at 06:15):
what to do when combining the files
that's why I was using the snapshot- ducking getting involved with this
Michael Lawley (Dec 06 2019 at 06:24):
The basic model is to determine the moduleId for the release you want (call this BE) and the release date (call this RD)
Look in the MDRS for all active rows in module BE with effectiveTime RD. This will give you a list of target moduleIds and associated target effectiveTimes - call these Mi and Ti
Then, for every input file, exclude all rows where the moduleId is one of the Mi and the effectiveTime is greater than the corresponding Ti
Finally, taking the 1st column as the "key", select for each key the row with the latest effectiveTime.
For SNAPSHOT releases + simple extensions you're only going to exclude a few rows. For non-simple extensions they should never be released in this partial form because they need to be run through a DL classifier to compute appropriate inferred relationships.
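The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the loader used by any tool mentioned in this thread. It assumes rows are already parsed into dicts keyed by RF2 column name, that the MDRS carries the target module in referencedComponentId and the target version in targetEffectiveTime, and that effectiveTime values are YYYYMMDD strings (so plain string comparison orders them correctly). The function names are made up for the example.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]  # one RF2 row, keyed by column name (assumed pre-parsed)


def dependency_cutoffs(mdrs_rows: Iterable[Row], module_id: str,
                       release_date: str) -> Dict[str, str]:
    """Steps 1-2: map each target moduleId (Mi) to its targetEffectiveTime (Ti)."""
    cutoffs: Dict[str, str] = {}
    for row in mdrs_rows:
        if (row["active"] == "1"
                and row["moduleId"] == module_id
                and row["effectiveTime"] == release_date):
            cutoffs[row["referencedComponentId"]] = row["targetEffectiveTime"]
    return cutoffs


def combine(rows: Iterable[Row], cutoffs: Dict[str, str]) -> List[Row]:
    """Steps 3-4: drop rows that are too new, then keep the latest row per id."""
    latest: Dict[str, Row] = {}
    for row in rows:
        cutoff = cutoffs.get(row["moduleId"])
        # Step 3: a row in a dependency module that is newer than the version
        # the extension depends on is excluded.
        if cutoff is not None and row["effectiveTime"] > cutoff:
            continue
        # Step 4: the first column (id) is the key; the latest effectiveTime wins.
        kept = latest.get(row["id"])
        if kept is None or row["effectiveTime"] > kept["effectiveTime"]:
            latest[row["id"]] = row
    return list(latest.values())
```

In use, `rows` would be the concatenation of the international and extension rows for one file type (concepts, descriptions, relationships, and so on), and `combine` would be run once per file type with the same `cutoffs`. As noted above, this only covers simple extensions; it does no classification.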
Rory Davidson (Dec 06 2019 at 15:24):
To jump in here, the BE release is a valid RF2 extension release with FULL, DELTA and SNAPSHOT containing mostly translated descriptions. It can be downloaded from the Belgian NRC here http://mlds.ihtsdotools.org/be . With regards to using, as Michael points out, it depends on the tools you have. We loaded it into our terminology server used by the SNOMED CT Browser as an extension which is a branch off of the international edition.
Grahame Grieve (Dec 06 2019 at 18:13):
hm. I applied for permission to access BE snomed so I can look
Jose Costa Teixeira (Dec 06 2019 at 19:53):
It is not improbable that I did something wrong. But I only saw a 10 MB file, and the International Edition was separate.
Michael Lawley (Dec 06 2019 at 23:07):
I've also applied for permission to check :-)
Grahame Grieve (Dec 06 2019 at 23:22):
I feel like I needed a note from my parents
David Hay (Dec 08 2019 at 06:54):
have you been good?
Grahame Grieve (Dec 08 2019 at 07:50):
hmm let's not actually ask my parents
David Hay (Dec 08 2019 at 08:05):
best not...
Jose Costa Teixeira (Jan 03 2020 at 10:20):
I just confirmed that in BE we do not have a single release file.
Jose Costa Teixeira (Jan 03 2020 at 10:21):
Does anyone know of practices/tools to do that?
Adam Flinton (Jan 03 2020 at 11:18):
Ask the snomed people. I am assuming your terminology center is working with them. I would give you names/emails but it's probably best to go through the BE-TC.
Adam Flinton (Jan 03 2020 at 11:19):
& in answer to your question yes there are tools etc.
Adam Flinton (Jan 03 2020 at 11:21):
By single release file I am assuming a zip containing both the be extension and the relevant intl release it is based off.
Jose Costa Teixeira (Jan 03 2020 at 11:25):
OK I will get the BE terminology center to check with SNOMED
Grahame Grieve (Jan 04 2020 at 09:27):
@Rory Davidson and @Michael Lawley and I have been discussing it. In general, there is no plan to produce such a release. It's a tooling problem. What I haven't figured out is why the BE snapshot duplicates some things from the international release, and whether it's redefining them or just repeating them. It's on my todo list
Jose Costa Teixeira (Jan 04 2020 at 09:30):
I mentioned to the SCT BE team that we need a single file release. This is when they asked if there are tools because otherwise they must create a DB and run the process (I'm guessing deduplicating?)
Jose Costa Teixeira (Jan 04 2020 at 09:31):
Is there anything we can do to help? Perhaps I can send you and the BE people an email so that my ignorance is not a bottleneck
Grahame Grieve (Jan 04 2020 at 09:41):
well, Snomed Intl have a lambda that will do it but it's not reusable outside a particular context. Really, it's our code that's supposed to manage this. Only I haven't had time to figure it out. You could look - for all the concepts, descriptions, and relationships duplicated, are their definitions the same? or different?
Grahame Grieve (Jan 04 2020 at 09:41):
there are about 1000 duplicates
Jose Costa Teixeira (Jan 04 2020 at 09:49):
I'll work on getting the duplicates and analysing this
Jose Costa Teixeira (Jan 04 2020 at 09:52):
I can for example look at where the servertools imports and detects a duplicate and log those for analysis.
Grahame Grieve (Jan 04 2020 at 09:54):
y. you need to change to not blow up and instead log
Rory Davidson (Jan 04 2020 at 11:17):
Normally, there shouldn't be any duplicates in extensions, especially in the BE extension which is primarily only translated descriptions. Do you have any examples and I can dig further as well?
Jose Costa Teixeira (Jan 04 2020 at 23:05):
I ran the import tool logging the exceptions, and
- In the first phase, the same set of issues came up many times:
Unable to find caps type 900000000000003001
Unable to find caps type 900000000000013009
Unable to find caps type 900000000000550004
Unable to find desc module 11000172109
Unable to find desc module 900000000000012004
Unable to find desc module 900000000000207008
Unable to find desc type 900000000000003001
Unable to find desc type 900000000000013009
- After this, I still get a "Duplicates not allowed" but I do not know where this comes from, I cannot find that message in the code.
Jose Costa Teixeira (Jan 04 2020 at 23:05):
It's time for me to RTFM and see what needs to be done to these files
Jose Costa Teixeira (Jan 04 2020 at 23:23):
What I do find as "duplicates" is that the same concepts are being defined in different languages
Jose Costa Teixeira (Jan 04 2020 at 23:24):
Some examples, not knowing if insightful:
from sct2_Description_Snapshot-en_INT_20190731:
id effectiveTime active moduleId conceptId languageCode typeId term caseSignificanceId
...
215929010 20170731 1 900000000000207008 134187008 en 900000000000013009 Child protection procedure 900000000000448009
...
Jose Costa Teixeira (Jan 04 2020 at 23:24):
from sct2_Description_Snapshot-nl_BE1000172_20190915:
81000172116 20180315 1 11000172109 11000172109 nl 900000000000013009 Belgische module 900000000000017005
341000172119 20180315 1 11000172109 134187008 nl 900000000000013009 procedure voor kinderbescherming 900000000000448009
...
Jose Costa Teixeira (Jan 04 2020 at 23:24):
from sct2_Description_Snapshot-fr_BE1000172_20190915:
71000172119 20180315 1 11000172109 11000172109 fr 900000000000013009 Module Belge 900000000000017005
331000172114 20180315 1 11000172109 134187008 fr 900000000000013009 procédure de protection de l'enfant 900000000000448009
...
Jose Costa Teixeira (Jan 04 2020 at 23:24):
the entire sct2_Description_Snapshot-en_BE1000172_20190915 is:
11000172113 20180315 1 11000172109 11000172109 en 900000000000003001 Belgian module (core metadata concept) 900000000000017005
21000172115 20180315 1 11000172109 11000172109 en 900000000000013009 Belgian module 900000000000017005
31000172117 20180315 1 11000172109 21000172104 en 900000000000003001 Belgian French language reference set (foundation metadata concept) 900000000000017005
41000172112 20180315 1 11000172109 21000172104 en 900000000000013009 Belgian French language reference set 900000000000017005
51000172114 20180315 1 11000172109 31000172101 en 900000000000003001 Belgian Dutch language reference set (foundation metadata concept) 900000000000017005
61000172111 20180315 1 11000172109 31000172101 en 900000000000013009 Belgian Dutch language reference set 900000000000017005
80481000172113 20180315 1 11000172109 71000172103 en 900000000000003001 Dichorionic triamniotic triplet pregnancy (disorder) 900000000000448009
80491000172111 20180315 1 11000172109 71000172103 en 900000000000013009 Dichorionic triamniotic triplet pregnancy 900000000000448009
80571000172114 20180315 1 11000172109 111000172100 en 900000000000003001 Monochorionic triamniotic triplet pregnancy (disorder) 900000000000448009
80581000172112 20180315 1 11000172109 111000172100 en 900000000000013009 Monochorionic triamniotic triplet pregnancy 90000000000044800
Rory Davidson (Jan 05 2020 at 06:43):
With regard to the duplicates, the BE descriptions are expected. As you say, these are new descriptions in different languages for the same concept, which is an aspect of the SNOMED CT model that lets the unique concept id be used irrespective of which language is requested.
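To make that concrete, here is a small Python sketch using the three description rows for concept 134187008 quoted earlier in this thread. It is only an illustration (the helper names are made up): each row has its own description id in the first column, so nothing is repeated at the key level, while the conceptId in the fifth column is shared.

```python
from collections import defaultdict

# Description rows quoted in this thread. Columns: id, effectiveTime, active,
# moduleId, conceptId, languageCode, typeId, term, caseSignificanceId.
descriptions = [
    ("215929010", "20170731", "1", "900000000000207008", "134187008", "en",
     "900000000000013009", "Child protection procedure", "900000000000448009"),
    ("341000172119", "20180315", "1", "11000172109", "134187008", "nl",
     "900000000000013009", "procedure voor kinderbescherming", "900000000000448009"),
    ("331000172114", "20180315", "1", "11000172109", "134187008", "fr",
     "900000000000013009", "procédure de protection de l'enfant", "900000000000448009"),
]


def repeated_ids(rows):
    """Return the description ids (column 0) that occur more than once."""
    seen, repeated = set(), set()
    for row in rows:
        if row[0] in seen:
            repeated.add(row[0])
        seen.add(row[0])
    return repeated


def terms_by_concept(rows):
    """Group (languageCode, term) pairs under each conceptId (column 4)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[4]].append((row[5], row[7]))
    return dict(grouped)


print(repeated_ids(descriptions))      # prints set()
print(terms_by_concept(descriptions))  # one concept with en, nl and fr terms
```

A check for duplicates would therefore need to key on the description id, not on the conceptId.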
Grahame Grieve (Jan 05 2020 at 12:35):
what are you loading from? snapshot?
Jose Costa Teixeira (Jan 05 2020 at 12:45):
yes
Michael Lawley (Jan 06 2020 at 05:34):
So, these are not "duplicates". A SNOMED concept may have many descriptions (akin to a FHIR CodeSystem concept having many designations).
What is the import tool that you're using?
Jose Costa Teixeira (Jan 06 2020 at 05:45):
Grahame's server utils
Rob Hausam (Jan 08 2020 at 03:24):
any further progress on this?
Jose Costa Teixeira (Jan 08 2020 at 10:11):
Hi Rob. We're working on this. We're looking for some materials to demonstrate a terminology server for first timers.
Rory Davidson (Jan 08 2020 at 12:27):
If this helps, we have an open-source terminology server for SNOMED CT which supports the FHIR Terminology Services and manages both editions and extensions and is pretty quick to set up for demonstrations - https://github.com/IHTSDO/snowstorm
Jose Costa Teixeira (Jan 08 2020 at 12:56):
This is interesting, thanks. Perhaps what we have in Belgium is an extension and can be handled as such...? We started some discussion to investigate.
This server seems helpful
Rory Davidson (Jan 08 2020 at 13:45):
Yes, it is an extension in Belgium and should be handled as such. We use that server which hosts the online SNOMED CT browser for Belgium as well so it should work.
Jose Costa Teixeira (Jan 08 2020 at 14:06):
ok we will try.
Jose Costa Teixeira (Jan 08 2020 at 14:07):
Also the snapshot edition of the BE edition (or tools to generate it) would be very useful.
Grahame Grieve (Jan 09 2020 at 23:30):
the issue I have is duplicate relationships, not additional descriptions. here's some examples:
Grahame Grieve (Jan 09 2020 at 23:45):
in the snapshot for the current BE release:
Grahame Grieve (Jan 09 2020 at 23:45):
107532022 20180915 0 11000172109 116577007 116576003 0 116680003 900000000000011006 900000000000451002
Grahame Grieve (Jan 09 2020 at 23:46):
In the snapshot for international, v 20190731
Grahame Grieve (Jan 09 2020 at 23:47):
107532022 20190131 0 900000000000207008 116577007 116576003 0 116680003 900000000000011006 900000000000451002
Grahame Grieve (Jan 09 2020 at 23:47):
I find 2753 examples of this
Grahame Grieve (Jan 09 2020 at 23:49):
All the entries I have checked just have a different module Id. But that's only a small sampling...
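One way to go beyond a small sampling is to compare every shared id column by column. The Python sketch below is only an illustration with made-up helper names. It assumes the header line has already been dropped, and it relies on relationship rows containing no whitespace inside a field, so `split()` works whether the separator is a tab (as in the real files) or a space (as pasted here).

```python
# RF2 relationship file columns, in file order.
RELATIONSHIP_COLUMNS = [
    "id", "effectiveTime", "active", "moduleId", "sourceId", "destinationId",
    "relationshipGroup", "typeId", "characteristicTypeId", "modifierId",
]


def parse_rows(lines):
    """Split relationship lines into lists of fields, skipping blank lines."""
    return [line.split() for line in lines if line.strip()]


def differences_for_shared_ids(rows_a, rows_b, columns=RELATIONSHIP_COLUMNS):
    """For every id present in both inputs, list the columns whose values differ."""
    by_id = {row[0]: row for row in rows_a}
    report = {}
    for row in rows_b:
        other = by_id.get(row[0])
        if other is not None:
            report[row[0]] = [
                name for name, a, b in zip(columns, other, row) if a != b
            ]
    return report


# The two rows quoted above.
be = parse_rows([
    "107532022 20180915 0 11000172109 116577007 116576003 0 116680003 "
    "900000000000011006 900000000000451002",
])
intl = parse_rows([
    "107532022 20190131 0 900000000000207008 116577007 116576003 0 116680003 "
    "900000000000011006 900000000000451002",
])
print(differences_for_shared_ids(be, intl))
# prints {'107532022': ['effectiveTime', 'moduleId']}
```

For the example row, the effectiveTime differs as well as the moduleId, while the relationship itself (source, destination, type, group) is identical.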
Jose Costa Teixeira (Jan 09 2020 at 23:54):
I could not find the place in the import tool where duplicates are intercepted to log them. I just got a message "Duplicates not allowed"
Grahame Grieve (Jan 09 2020 at 23:54):
I haven't committed that yet.
Jose Costa Teixeira (Jan 09 2020 at 23:54):
ah ok
Jose Costa Teixeira (Jan 09 2020 at 23:55):
in a few hours we will have a phone call with Rory and the BE crew, I think this can wait until then
Grahame Grieve (Jan 09 2020 at 23:56):
my import fails later with some error about refsets and field types. I'm expecting that this is my issue but I haven't had time to investigate yet
Rob Hausam (Jan 10 2020 at 03:28):
Are you getting this from the Full release (not the Snapshot)?
Grahame Grieve (Jan 10 2020 at 04:16):
full release of intl and snapshot of BE
Rob Hausam (Jan 10 2020 at 05:42):
Makes sense. Are you following this algorithm for selecting the component rows for a particular snapshot view? In the full release, for the rows with a particular component (concept, description, relationship, etc.) id, only the one with the most recent effectiveTime that is equal to or less than the specified snapshot time (usually, but not always, the current time) is selected for the snapshot view. With an extension, as in the examples that you gave, if the extension version is based on the international release version that you are using (which it should be - presumably the current BE release is based on the international 20190731 release?), then for the current snapshot view you would still select only the row with the greatest effectiveTime, which in this case would be the row for 107532022 from the international release (with effectiveTime 20190131). The row from the BE release (with effectiveTime 20180915) would be excluded. That may resolve the duplicate? @Rory Davidson can correct me if I said anything wrong!
Rory Davidson (Jan 10 2020 at 10:07):
Yes, @Rob Hausam, that's correct. The record with the most recent effective date trumps any previous records. This can span between modules as well. The snapshot only shows that most recent effective dated record, whereas the full contains them all.
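A minimal Python sketch of that selection rule, applied to the two rows for relationship 107532022 quoted earlier. The function name is made up, rows are assumed to be split into fields with the id in column 0 and effectiveTime (a YYYYMMDD string) in column 1, and the sketch deliberately does not look at which version of a module an extension depends on.

```python
def snapshot_view(full_rows, snapshot_time):
    """Pick, per component id, the row with the greatest effectiveTime
    that is less than or equal to snapshot_time."""
    selected = {}
    for row in full_rows:
        component_id, effective_time = row[0], row[1]
        if effective_time > snapshot_time:
            continue  # not yet in effect at the requested time
        current = selected.get(component_id)
        if current is None or effective_time > current[1]:
            selected[component_id] = row
    return selected


rows = [line.split() for line in (
    # BE snapshot row
    "107532022 20180915 0 11000172109 116577007 116576003 0 116680003 "
    "900000000000011006 900000000000451002",
    # International row
    "107532022 20190131 0 900000000000207008 116577007 116576003 0 116680003 "
    "900000000000011006 900000000000451002",
)]

current = snapshot_view(rows, "20190915")
print(current["107532022"][3])   # prints 900000000000207008

earlier = snapshot_view(rows, "20181231")
print(earlier["107532022"][3])   # prints 11000172109
```

For a snapshot time of 20190915 the international row (effectiveTime 20190131) is selected and the BE row is dropped; for a snapshot time before 20190131 the BE row is the one that remains.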
Grahame Grieve (Jan 10 2020 at 10:49):
ok thanks. I had not seen that link (and still can't, courtesy of Australia's 3rd world internet...). but I'll resolve it that way
Michael Lawley (Jan 29 2020 at 04:23):
But be careful when crossing modules - If the BE extension for 20180915 depends on the International Edition for 20180731, then that row above (Int module, but later date of 20190131) does NOT trump the BE row.
Last updated: Apr 12 2022 at 19:14 UTC