FHIR Chat · identifier/coding systems and codes · genomics

Stream: genomics

Topic: identifier/coding systems and codes


Larry Babb (Nov 20 2019 at 18:56):

Is there a method to how the system and codes are defined for the various codings in the genomics profile components?

I noticed that in the HGNC Value Set section the following information is provided...

This value set includes codes from the following code systems:

There are some standards evolving for working with URLs and URIs for coding systems. The Identifiers.org Central Registry is a service that seems to be getting quite popular and has a considerable number of the most popular coding and identifier authorities registered. It provides the ability to uniformly represent the precise code systems needed by many of the terms used in HL7. This may be a really nice place to at least verify that the choices made by the HL7 value set and terminology folks are similar in how they break down namespaces and reference the actual resource's URL pattern, as well as providing an Identifiers.org Compact URL. From these two URLs a "system" could be consistently derived by separating out the "code" or "identifier" at the end of the URL.

For example, if you search the registry by typing "HGNC" in the search box, you will notice that four results are returned:

  • hgnc
  • hgnc.symbol
  • hgnc.family
  • hgnc.genefamily

These four separate and distinct HGNC code systems each have their own compact URL (or namespace). I believe hgnc.genefamily is equivalent to the "genegroup" in the value set description above and "hgnc" is the one that would align with the geneId.

If you drill into the "hgnc" namespace you can see the details of that registered namespace. At the bottom of the screen is the primary resource as registered by EBI (in this case).

Here's the section from the bottom of the HGNC registered identifier system

Name HUGO Genome Nomenclature Committee
Description HUGO Genome Nomenclature Committee
URL Pattern https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/{$id}
Home URL https://www.genenames.org
Location United Kingdom
Sample ID (LUI) 2674

If you need to parse out the "system" from the "code", I would take the URL Pattern above, remove the "{$id}" placeholder, and presume that the placeholder stands for the code value.
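A minimal sketch of that split, assuming the registry's URL pattern always ends with the `{$id}` placeholder. The function name `split_system_and_code` is hypothetical and is not part of Identifiers.org or FHIR; the pattern and sample ID are the ones quoted above.

```python
def split_system_and_code(url_pattern: str, local_id: str,
                          placeholder: str = "{$id}") -> tuple[str, str]:
    """Derive a (system, code) pair from a registry URL pattern.

    Assumption: the placeholder is the final part of the pattern, so
    everything before it can be treated as the "system".
    """
    if not url_pattern.endswith(placeholder):
        raise ValueError("URL pattern does not end with the placeholder")
    # Drop the trailing placeholder; what remains is the candidate system.
    system = url_pattern[: -len(placeholder)]
    return system, local_id


# Values taken from the HGNC registry entry quoted in this post.
pattern = "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/{$id}"
system, code = split_system_and_code(pattern, "2674")
print(system)
print(code)
```

Whether the resulting string is a suitable FHIR "system" is a separate question; the sketch only shows the mechanical split.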

So you would end up with

If you were to follow through with the other HGNC namespaces for completeness you'd end up with

This central and emerging standardization for capturing coding/identifier systems seems like a great place to gain insights on developing more consistent and potentially reusable systems for many Codings.

BTW www.genenames.org is the home URL for the HGNC website. I'm not sure who identified or manufactured the www.genenames.org/geneId and www.genenames.org/genegroup systems, but I would say that they are not super clean, in that they may get confused with the actual breakdown in concepts provided by the authorizing agency.

The eMERGE project intends to use the systems derived from the identifiers.org registry when available. Please let us know if this is inappropriate or invalid. If so, also let us know why, so we can determine how best to standardize the values we use for "system"s and "code"s.

Jamie Jones (Nov 20 2019 at 19:13):

This is a very valuable resource if we can integrate it. Tagging @Patrick Werner as he created the placeholder URLs.

Kevin Power (Nov 20 2019 at 20:04):

WOW, very nice indeed. Thanks @Larry Babb

Patrick Werner (Nov 21 2019 at 10:53):

I also noticed the Identifiers.org project some time ago, but I'm not sure it is useful here. I think the compact identifier is useful for non-FHIR, less structured approaches, e.g. to have a notation like hgnc.family:PADI for v2 or other less structured standards. In FHIR we don't need such a compact identifier. We have Codings, which take care of coded values with canonical URLs, ValueSets, etc.

What concerns me about Identifiers.org is that they are duplicating parts of other terminologies and aren't always in sync. E.g. HGNC gene families have been called groups for some time now, but Identifiers.org still refers to them as families.
Identifiers.org also uses only the symbol of gene groups: in hgnc.family:PADI, PADI is the approved symbol of the gene group Peptidyl arginine deiminases (PADI), which has the HGNC group ID 677 (https://www.genenames.org/data/genegroup/#!/group/677).
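To make the compact-identifier notation concrete, here is a small sketch that splits a string such as hgnc.family:PADI into its namespace prefix and local ID. The helper name `parse_compact_identifier` is hypothetical. It assumes the prefix never contains a colon and splits on the first colon only, so a local ID that itself contains a colon stays intact.

```python
def parse_compact_identifier(compact_id: str) -> tuple[str, str]:
    """Split a 'prefix:local_id' compact identifier into its two parts."""
    # partition() splits on the first colon only.
    prefix, sep, local_id = compact_id.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return prefix, local_id


prefix, local_id = parse_compact_identifier("hgnc.family:PADI")
print(prefix)    # namespace as registered at Identifiers.org
print(local_id)  # here the group symbol, not the numeric group ID 677
```

Note that the local ID recovered this way is the symbol PADI, which illustrates the concern above: it is not the HGNC group ID.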

Patrick Werner (Nov 21 2019 at 10:57):

I also don't like them splitting up HGNC Symbol and Gene ID, as they are attributes of the same CodeSystem, which contains the ID, name, approved symbol and many more properties (like synonyms).

Patrick Werner (Nov 21 2019 at 10:59):

I created http://www.genenames.org/geneId and http://www.genenames.org/genegroup as canonical URLs to be used in FHIR. The namespace of a canonical URL should point to the CodeSystem creator; beyond that, these are just canonical URLs. It would be very unusual to have https://www.genenames.org/cgi-bin/gene_symbol_report?match= as a system URI.

Patrick Werner (Nov 21 2019 at 11:05):

What we could do (though it would also be unusual) is to have invariants on the code of required Codings that check the code against these regexes. But in FHIR this should be handled by a terminology server.
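Outside of FHIR invariants, the same kind of check can be sketched in a few lines. The patterns below are illustrative assumptions, not the regexes registered at Identifiers.org or defined by HGNC. `CODE_PATTERNS` and `code_matches_pattern` are hypothetical names.

```python
import re

# Assumed, illustrative patterns keyed by system URI (not authoritative).
CODE_PATTERNS = {
    "http://www.genenames.org/geneId": re.compile(r"(HGNC:)?\d+"),
    "http://www.genenames.org/genegroup": re.compile(r"\d+"),
}


def code_matches_pattern(coding: dict) -> bool:
    """Return True if the code fits the pattern for its system.

    Systems without a registered pattern are not checked and pass.
    """
    pattern = CODE_PATTERNS.get(coding.get("system"))
    if pattern is None:
        return True
    # fullmatch() requires the whole code to match, not just a prefix.
    return pattern.fullmatch(coding.get("code", "")) is not None


ok = code_matches_pattern(
    {"system": "http://www.genenames.org/genegroup", "code": "677"})
bad = code_matches_pattern(
    {"system": "http://www.genenames.org/genegroup", "code": "PADI"})
print(ok, bad)
```

A syntactic check like this only confirms that the code has the right shape; confirming that the code actually exists in the code system is the job of a terminology server, as noted above.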

Kevin Power (Nov 21 2019 at 14:46):

Well, very good analysis @Patrick Werner -- thanks for digging in. I know that @Bob Milius had facilitated some conversations on this topic; perhaps he might have a comment as well?


Last updated: Apr 12 2022 at 19:14 UTC