Stream: genomics
Topic: identifier/coding systems and codes
Larry Babb (Nov 20 2019 at 18:56):
Is there a method to how the system and codes are defined for the various codings in the genomics profile components?
I noticed that in the HGNC Value Set section the following information is provided...
This value set includes codes from the following code systems:
- Include all codes defined in http://www.genenames.org/geneId
- Include all codes defined in http://www.genenames.org/genegroup
There are some standards evolving for working with URLs and URIs for coding systems. The Identifiers.org Central Registry is a service that seems to be getting quite popular, and it has a considerable number of the most popular coding and identifier authorities registered. It provides the ability to uniformly represent the precise code systems needed by many of the terms used in HL7. This may be a really nice place to at least verify that the choices made by HL7 value set and terminology folks are similar in how they break down namespaces and reference the actual resource's URL pattern, as well as providing an Identifiers.org compact URL. From these two URLs a "system" could be consistently derived by separating out the "code" or "identifier" at the end of the URL.
For example, if you search the registry by typing "HGNC" in the search box you will notice that four results are returned:
- hgnc
- hgnc.symbol
- hgnc.family
- hgnc.genefamily
These four separate and distinct HGNC code systems each have their own compact URL (or namespace). I believe hgnc.genefamily is equivalent to the "genegroup" in the value set description above, and "hgnc" is the one that would align with the geneId.
If you drill into the "hgnc" namespace you can see the details of that registered namespace. At the bottom of the screen is the primary resource as registered by EBI (in this case).
Here's the section from the bottom of the HGNC registered identifier system:
Name HUGO Genome Nomenclature Committee
Description HUGO Genome Nomenclature Committee
URL Pattern https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/{$id}
Home URL https://www.genenames.org
Location United Kingdom
Sample ID (LUI) 2674
If you need to parse out the "system" from the "code", I would take the URL Pattern above, remove the "{$id}" placeholder, and presume that whatever fills it is the code value.
So you would end up with
- HGNC gene id (hgnc)
- system = https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/
- code (ex.) = 2674 (you could even reference the regex used to validate content here, ^\d{1,5}$; might be nice for implementers)
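The split described above can be sketched in Python. This is only an illustration, assuming the registry's URL pattern always ends with the `{$id}` placeholder; `split_url_pattern` is a hypothetical helper name, not part of any Identifiers.org or FHIR tooling.

```python
ID_PLACEHOLDER = "{$id}"  # placeholder used in the registry's URL Pattern field


def split_url_pattern(url_pattern: str, local_id: str) -> dict:
    """Derive a system/code pair from a URL pattern and a local identifier.

    Assumes the placeholder is the final component of the pattern, as in the
    hgnc example above; raises ValueError otherwise.
    """
    if not url_pattern.endswith(ID_PLACEHOLDER):
        raise ValueError("pattern does not end with the {$id} placeholder")
    # Everything before the placeholder is treated as the "system".
    system = url_pattern[: -len(ID_PLACEHOLDER)]
    return {"system": system, "code": local_id}


hgnc = split_url_pattern(
    "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/{$id}",
    "2674",
)
print(hgnc["system"])
print(hgnc["code"])
```

Patterns that place the identifier somewhere other than the end would need a different rule, so the helper rejects them rather than guessing.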
If you were to follow through with the other HGNC namespaces for completeness you'd end up with
- HGNC gene symbol (hgnc.symbol)
- system = https://www.genenames.org/cgi-bin/gene_symbol_report?match=
- code (ex.) = DAPK1 . (validation regex ^[A-Za-z-0-9_]+(\@)?$)
- HGNC family (hgnc.family)
- system = https://www.genenames.org/genefamilies/
- code (ex.) = PADI . (validation regex ^[A-Z0-9-]+(#[A-Z0-9-]+)?$)
- HGNC gene family (hgnc.genefamily)
- system = https://www.genenames.org/cgi-bin/genefamilies/set/
- code (ex.) = 141 . (validation regex ^\d+$)
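As a rough sketch of how an implementer might use these values, the four systems and their validation regexes can be kept in one lookup table. The systems and regexes are copied from the list above; the `hgnc.genefamily` key for the last entry is inferred from the search results mentioned earlier, and `HGNC_NAMESPACES` and `is_valid_code` are hypothetical names used only for illustration.

```python
import re

# Namespace -> (system, validation regex), copied from the values listed above.
HGNC_NAMESPACES = {
    "hgnc": (
        "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/",
        r"^\d{1,5}$",
    ),
    "hgnc.symbol": (
        "https://www.genenames.org/cgi-bin/gene_symbol_report?match=",
        r"^[A-Za-z-0-9_]+(\@)?$",
    ),
    "hgnc.family": (
        "https://www.genenames.org/genefamilies/",
        r"^[A-Z0-9-]+(#[A-Z0-9-]+)?$",
    ),
    # Key inferred from the registry search results; an assumption.
    "hgnc.genefamily": (
        "https://www.genenames.org/cgi-bin/genefamilies/set/",
        r"^\d+$",
    ),
}


def is_valid_code(namespace: str, code: str) -> bool:
    """Return True if `code` matches the regex listed for `namespace`."""
    _system, pattern = HGNC_NAMESPACES[namespace]
    return re.match(pattern, code) is not None


for namespace, example in [
    ("hgnc", "2674"),
    ("hgnc.symbol", "DAPK1"),
    ("hgnc.family", "PADI"),
    ("hgnc.genefamily", "141"),
]:
    print(namespace, example, is_valid_code(namespace, example))
```

A check like this only confirms that a code has the right shape; it says nothing about whether the code actually exists in HGNC.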
This central and emerging standardization for capturing coding/identifier systems seems like a great place to get insights on developing more consistent and potentially reusable systems for many Codings.
BTW, www.genenames.org is the home URL for the HGNC website. I'm not sure who identified or manufactured the www.genenames.org/geneId and www.genenames.org/genegroup systems, but I would say that they are not super clean, in that they may get confused with the actual breakdown of concepts provided by the authorizing agency.
The eMERGE project intends to use the systems derived from the Identifiers.org registry when available. Please let us know if this is inappropriate or invalid. If so, also let us know why, so we can determine how best to standardize the values we use for "system"s and "code"s.
Jamie Jones (Nov 20 2019 at 19:13):
This is a very valuable resource if we can integrate it. Tagging @Patrick Werner as he created the placeholder urls
Kevin Power (Nov 20 2019 at 20:04):
WOW, very nice indeed. Thanks @Larry Babb
Patrick Werner (Nov 21 2019 at 10:53):
I also noticed the Identifiers.org project some time ago, but I'm not sure if it is useful for us. I think the compact identifier is useful for some non-FHIR, less structured approaches, to have a notation like hgnc.family:PADI for v2 or other less structured standards. In FHIR we don't need such a compact identifier. We have Codings, which take care of coded values with canonical URLs, ValueSets, etc.
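To make the contrast concrete, here is a minimal Python sketch: a compact identifier packs prefix and local id into one string, while a FHIR Coding carries `system` and `code` as separate elements. `parse_compact_identifier`, `to_coding`, and the prefix-to-system mapping are hypothetical; the mapping follows the guess made earlier in this thread that "hgnc" aligns with the geneId URL, and the exact code format that system expects is not established here.

```python
def parse_compact_identifier(compact: str) -> tuple:
    """Split a 'prefix:local_id' string at the first colon."""
    prefix, sep, local_id = compact.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a compact identifier: {compact!r}")
    return prefix, local_id


# Hypothetical mapping from registry prefix to FHIR canonical URL (assumption).
SYSTEM_BY_PREFIX = {
    "hgnc": "http://www.genenames.org/geneId",
}


def to_coding(compact: str) -> dict:
    """Build a Coding-shaped dict (system + code) from a compact identifier."""
    prefix, local_id = parse_compact_identifier(compact)
    return {"system": SYSTEM_BY_PREFIX[prefix], "code": local_id}


print(parse_compact_identifier("hgnc.family:PADI"))
print(to_coding("hgnc:2674"))
```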
What concerns me about Identifiers.org is that they are duplicating parts of other terminologies and aren't always in sync. E.g., HGNC gene families have been called groups for some time now, but Identifiers.org still refers to them as families.
Identifiers.org also uses only the symbol of gene groups: hgnc.family:PADI, where PADI is the approved symbol of the gene group Peptidyl arginine deiminases (PADI), which has the HGNC group id 677 (https://www.genenames.org/data/genegroup/#!/group/677).
Patrick Werner (Nov 21 2019 at 10:57):
I also don't like them splitting up HGNC Symbol and Gene ID, as they are attributes of the same CodeSystem containing Id, Name, approved Symbol, and many more properties (like synonyms).
Patrick Werner (Nov 21 2019 at 10:59):
I created http://www.genenames.org/geneId and http://www.genenames.org/genegroup as canonical URLs to be used in FHIR. The namespace of a canonical URL should point to the CodeSystem creator; these are just canonical URLs. It would be very unusual to have https://www.genenames.org/cgi-bin/gene_symbol_report?match= as a system URI.
Patrick Werner (Nov 21 2019 at 11:05):
What we could do (which would also be unusual) is to have invariants on the code of required Codings to check the code against these regexes. But with FHIR this should be handled by a terminology server.
Kevin Power (Nov 21 2019 at 14:46):
Well, very good analysis @Patrick Werner -- thanks for digging in. I know that @Bob Milius had facilitated some conversations on this topic; perhaps he has a comment as well?
Last updated: Apr 12 2022 at 19:14 UTC