FHIR Chat · OMOP Vocab Concept Persistence

Stream: terminology

Topic: OMOP Vocab Concept Persistence

Davera Gabriel (Mar 21 2022 at 14:30):

Hello @Michael Lawley, @Grahame Grieve indicated in a meeting last week that you have discovered instances where the OMOP Vocabulary content did not seem to adhere to concept persistence. It is intended to. Could you please share an example or 2 (which hopefully also indicates how you discovered this)? There is an OHDSI Vocabulary WG meeting tomorrow 3/22 noon EASTERN & this may be a good topic for that agenda, if it's not too full. @Christian Reich Thanks in advance for your help!

Michael Lawley (Mar 22 2022 at 03:38):

Hi @Davera Gabriel I think this might be a reference to my reading of the code indicating that the assignment of OMOP concept ids to codes is done in a sequential manner as the codes are imported. This means that 1. this assignment is dependent on the order that the vocabulary sets are imported, and 2. even if the order is preserved, if one set changes in size (eg a new version), then all subsequent ids will change.
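
A toy sketch of the two effects described above, assuming ids are handed out from a single counter in import order. The function `assign_ids` and the vocabulary contents are hypothetical and are not taken from the Athena code base.

```python
from itertools import count


def assign_ids(vocabularies, start=1):
    """Assign sequential ids to (system, code) pairs in import order."""
    next_id = count(start)
    return {
        (system, code): next(next_id)
        for system, codes in vocabularies
        for code in codes
    }


# Hypothetical vocabulary sets, used only for illustration.
snomed_v1 = ("SNOMED", ["A", "B"])
snomed_v2 = ("SNOMED", ["A", "B", "C"])  # a new version adds one code
icd10 = ("ICD10", ["X", "Y"])

run_1 = assign_ids([snomed_v1, icd10])
run_2 = assign_ids([icd10, snomed_v1])  # 1. different import order
run_3 = assign_ids([snomed_v2, icd10])  # 2. same order, first set grew

print(run_1[("ICD10", "X")])  # 3
print(run_2[("ICD10", "X")])  # 1
print(run_3[("ICD10", "X")])  # 4
```

The same (system, code) pair ends up with three different ids, even though nothing about the ICD10 set itself changed.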

Now, if there's a single instance of Athena being used (such as https://athena.ohdsi.org), and it's never reset, but only ever added to, then the ids will be stable for all users of that instance, but if anyone is running their own version then they're likely to get different ids.

Section 2.3 of this report https://aehrc.csiro.au/wp-content/uploads/2021/11/CSIRO-FHIR-OMOP-Terminology-Report.pdf indicates several options for rectifying this issue.

Davera Gabriel (Mar 27 2022 at 20:06):

Michael Lawley said:

Hi Davera Gabriel I think this might be a reference to my reading of the code indicating that the assignment of OMOP concept ids to codes is done in a sequential manner as the codes are imported. This means that 1. this assignment is dependent on the order that the vocabulary sets are imported, and 2. even if the order is preserved, if one set changes in size (eg a new version), then all subsequent ids will change.

Now, if there's a single instance of Athena being used (such as https://athena.ohdsi.org), and it's never reset, but only ever added to, then the ids will be stable for all users of that instance, but if anyone is running their own version then they're likely to get different ids.

Section 2.3 of this report https://aehrc.csiro.au/wp-content/uploads/2021/11/CSIRO-FHIR-OMOP-Terminology-Report.pdf indicates several options for rectifying this issue.

Hello @Michael Lawley The Athena content is refreshed and not just added to: so you are correct about those mechanics, but they do adhere to concept persistence as a principle. We've been working with the OHDSI vocab group to obtain older versions to do some mapping validation. Re: your observation - thank you for the explanation. The OHDSI community implementations are quite diverse and I am doubtful they synchronize their vocab updates universally, so if the concept IDs do change then they would not be able to exchange data: a primary driver for the growth of the network. I will follow up with the OHDSI CDM Vocab WG: meeting 2nd & 4th Tuesdays at noon EASTERN here
Thanks again!
D

Craig McClendon (Mar 28 2022 at 15:34):

Semi-related. We had a case where we were adding other vocabularies into the OMOP/Athena schema.
To deal with the changing sequences we used a negative integer sequence for data we inserted to logically separate it from OMOP-"curated" data. This gave us a path to refresh the Athena data at a later point while preserving our data. It's a tad hacky but it worked.
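
A minimal sketch of the negative-sequence idea, assuming concepts live in a plain dict keyed by concept id. `LocalConceptAllocator` and `refresh_curated` are hypothetical names for illustration and do not describe the schema or tooling actually used.

```python
from itertools import count


class LocalConceptAllocator:
    """Hands out -1, -2, -3, ... for locally added (system, code) pairs."""

    def __init__(self):
        self._next_id = count(-1, -1)
        self._ids = {}

    def id_for(self, system, code):
        key = (system, code)
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]


def refresh_curated(concepts, new_curated):
    """Swap in a fresh curated set while keeping every negative-id local row."""
    local_rows = {cid: row for cid, row in concepts.items() if cid < 0}
    return {**new_curated, **local_rows}


allocator = LocalConceptAllocator()
concepts = {
    1: ("CURATED", "A"),
    2: ("CURATED", "B"),
    allocator.id_for("LOCAL", "X"): ("LOCAL", "X"),
}

# A refresh renumbers the curated rows; the local row keeps its id.
refreshed = refresh_curated(
    concepts,
    {1: ("CURATED", "A"), 2: ("CURATED", "A2"), 3: ("CURATED", "B")},
)
```

Because the sign alone separates the two populations, the refresh step never needs to know which vocabularies were added locally.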

Michael Lawley (Mar 28 2022 at 21:26):

A hash of system + code (and + version for those terminologies that don't support concept permanence; ICD I'm looking at you) would be a simple mechanism to fix this (short of actually using the native code-system and code pairs instead of a proxy id).
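
One way the hash idea could look, as a sketch only: the function name `concept_key`, the choice of SHA-256, and the length-prefixed encoding are assumptions made for illustration, not something proposed in the thread.

```python
import hashlib


def concept_key(system, code, version=None):
    """Derive a stable identifier from system + code (+ optional version)."""
    parts = [system, code] if version is None else [system, code, version]
    # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
    payload = "".join(f"{len(part)}:{part}" for part in parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# No version needed where codes are never reused.
snomed_key = concept_key("http://snomed.info/sct", "73211009")

# Version included where a code's meaning may differ between releases.
icd_2016 = concept_key("http://hl7.org/fhir/sid/icd-10", "E11", "2016")
icd_2019 = concept_key("http://hl7.org/fhir/sid/icd-10", "E11", "2019")
```

The identifier depends only on its inputs, so two independently loaded instances would compute the same value regardless of import order. A hex digest does not fit an integer id column, so a real design would still have to decide how to store or truncate it.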

Michael Kallfelz (Apr 06 2022 at 06:34):

The OMOP standardized vocabularies intentionally reserved a number range of 2 billion for local concept IDs. It is the convention to create your own concepts for local vocabularies in this number range. There can only be one Athena (and one vocabulary development server generating the IDs at this time) so concept IDs are stable. There have been early instances of reuse of concept IDs after the original ones were completely deleted as they were created in error, but other than that, concepts are not deleted but become invalid and stay in the vocabulary keeping their concept ID.
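
A small sketch of the local-range convention, assuming the reserved range is the ids above 2,000,000,000. The exact boundary, the constant `LOCAL_ID_FLOOR`, and both helper functions are assumptions for illustration.

```python
LOCAL_ID_FLOOR = 2_000_000_000  # assumption: local ids are strictly above this


def is_local(concept_id):
    """True when the id falls in the range reserved for local concepts."""
    return concept_id > LOCAL_ID_FLOOR


def next_local_id(existing_ids):
    """Pick the next free local id without touching the curated range."""
    local_ids = [cid for cid in existing_ids if is_local(cid)]
    return max(local_ids, default=LOCAL_ID_FLOOR) + 1


first_local = next_local_id([1, 2, 3])
following_local = next_local_id([1, 2, 3, first_local])
```

Ids allocated this way cannot collide with ids issued centrally, so a later vocabulary refresh leaves them alone.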

Michael Lawley (Apr 07 2022 at 08:32):

@Michael Kallfelz what do you mean by "there can only be one Athena"?

Michael Kallfelz (Apr 07 2022 at 11:19):

This means that, while there is a GitHub repository for Athena (or even two: backend & frontend), it does not really make a lot of sense to stand up your own instance locally. There aren't a lot of use cases I could think of for doing this... But please, let me know about use cases that you would see here. Athena at athena.ohdsi.org is the one-stop place for your OHDSI standardized vocabulary needs. Regarding the concept IDs (and their persistence), it is maybe even more important to understand that the currently supported setup is also one vocabulary server that feeds the refreshed vocabularies to Athena. Yes, there is the argument of a single point of failure. But there is also the argument of controlled and curated maintenance of the vocabularies, whereas a federated approach with multiple servers would require considerable overhead for negotiating updates... Does that make sense?

Davera Gabriel (Apr 07 2022 at 14:28):

@Michael Lawley are you possibly seeing this phenomenon on a separate instance of Athena?

Josh Mandel (Apr 07 2022 at 14:34):

I think Michael was pointing out a general property of the current Athena design rather than any specific observed phenomenon. I don't think there's any debate about whether the design works this way; rather, there's discussion about whether this is the optimal point in trade-off space between centralized control and the ability for external subcommunities to augment their data sets in consistent ways.

Davera Gabriel (Apr 07 2022 at 14:45):

Josh Mandel said:

I think Michael was pointing out a general property of the current Athena design rather than any specific observed phenomenon. I don't think there's any debate about whether the design works this way; rather, there's discussion about whether this is the optimal point in trade-off space between centralized control and the ability for external subcommunities to augment their data sets in consistent ways.

FYI @Michael Kallfelz aka: "Mik" leads the OHDSI CDM Vocab WG and is a principal PoC (in addition to @Christian Reich ) for all things OMOP Vocab. I have found these fellows to be great collaborators and quite attentive to The Community's needs - which now includes all of you (HL7 membership at-large) :)

Josh Mandel (Apr 07 2022 at 14:49):

The Community's needs - which now includes all of you (HL7 membership at-large) :)

Just so! I wasn't trying to describe the HL7FHIR community at large as an "external subcommunity" here -- rather I'm talking about inherent design properties that allow anyone off the street to pick up a technology and start extending in their own ways.

Davera Gabriel (Apr 07 2022 at 17:03):

Josh Mandel said:

The Community's needs - which now includes all of you (HL7 membership at-large) :)

Just so! I wasn't trying to describe the HL7FHIR community at large as an "external subcommunity" here -- rather I'm talking about inherent design properties that allow anyone off the street to pick up a technology and start extending in their own ways.

Of course @Josh Mandel! That comment was aimed to provide context & endorsement of the individuals mentioned more than any technology approaches. The OHDSI community embraces open-source and open-science principles. I think they will be quite receptive to suggestions as to how they might further these commitments. Please do carry on with these kinds of observations :)

Michael Lawley (Apr 07 2022 at 20:18):

@Josh Mandel is correct - I was talking about properties of the current design. I believe it's very likely that others "off the street" will have, as Josh says, picked it up and deployed their own.
As to why @Michael Kallfelz , I could see wanting to use SNOMED CT-AU which contains AMT as one reason. Another might be needing to work with proprietary codes that can't be exposed in a public server (I am guessing here, please correct me if I'm off-base and there are solutions to these issues).

Last updated: Apr 12 2022 at 19:14 UTC
