Stream: genomics
Topic: gene coding without the id
Andrew Patterson (Apr 12 2018 at 01:21):
We have a source genomics reporting system that we are normalising into v2 and FHIR reports. In an ideal world we would be reporting genes as 21497^ACAD9^HGNC-Symb or equivalent FHIR coding with http://genenames.org, 21497, "ACAD9".
Only problem - the source system has treated the gene symbol 'ACAD9' as the code. So the gene id is long gone. Does anyone have any thoughts as to the best way to work around this?
Is the reverse lookup symbol->id a safe one (I guess I can do that locally)?
Or is there a LOINC observation of gene that is not gene studied[id] - but gene studied[symbol]?
Or should I just represent the FHIR coding as a coding with no code - just a display name?
It seems like this might be a common problem - as even systems doing the right(ish) thing might have treated the gene symbol as the code for HGNC and not stored the id
Lloyd McKenzie (Apr 12 2018 at 03:36):
Conveying just the display with no code is legitimate, though obviously not ideal. I don't know if a reverse lookup is safe with that system or not.
Andrew Patterson (Apr 12 2018 at 03:38):
I really want to put the 'system' AND the 'display' - but with no code - because I really want to say that all the display values come from the HGNC code system display values. But I am guessing that that is illegal
Lloyd McKenzie (Apr 12 2018 at 03:43):
That's legal in FHIR - though it might not be accepted by all systems. And it can't be treated as "computable"
Grahame Grieve (Apr 12 2018 at 04:57):
right, it's not illegal, but the implications of having a system + a display but no code are not documented and might cause confusion. So proceed with caution
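The three shapes discussed so far (full coding, display only, system + display with no code) can be sketched as plain Python dicts that mirror the FHIR Coding element. This is only an illustration: `hgnc_coding` and its parameters are hypothetical names, and the system URI is the one written in this thread rather than a verified canonical value.

```python
from typing import Optional

# System URI as written in this thread; an assumption, not a verified canonical URI.
HGNC_SYSTEM = "http://genenames.org"


def hgnc_coding(symbol: str,
                hgnc_id: Optional[str] = None,
                system_without_code: bool = False) -> dict:
    """Build a dict shaped like a FHIR Coding for a gene symbol (hypothetical helper)."""
    coding = {"display": symbol}
    if hgnc_id is not None:
        # Fully coded: system + code + display, the computable form.
        coding["system"] = HGNC_SYSTEM
        coding["code"] = hgnc_id
    elif system_without_code:
        # Legal but not computable: system + display, no code.
        coding["system"] = HGNC_SYSTEM
    return coding


full = hgnc_coding("ACAD9", "21497")
display_only = hgnc_coding("ACAD9")
system_and_display = hgnc_coding("ACAD9", system_without_code=True)
```

Only the first form should be treated as computable by a receiver; the other two carry the symbol purely as text.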
Grahame Grieve (Apr 12 2018 at 04:58):
as for reverse lookup - is there any display associated with more than one code? There should not be
Andrew Patterson (Apr 12 2018 at 05:01):
I'll investigate some more whether I can retrieve the id myself given the symbol. I believe the only issue with the reverse lookup will be historical - at any point-in-time release, each HGNC id has exactly one display symbol - but when they retire/rename symbols I'm not sure whether they expire the id or not. I can't see how display names could be associated with more than one current code
Grahame Grieve (Apr 12 2018 at 05:03):
y. so someone might have built a history table... but at some point, it's more work than justified
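A minimal sketch of the reverse lookup being discussed, assuming a local extract of HGNC data has already been loaded into two tables: one for current symbols and one "history table" for retired or renamed symbols. The table contents, the placeholder names `OLDSYM1` and `OLDSYM2`, and the function name are all hypothetical; the ids should be checked against a real HGNC release.

```python
from typing import Dict, List, Optional

# Hypothetical local extract: current approved symbol -> HGNC id.
CURRENT_SYMBOLS: Dict[str, str] = {
    "ACAD9": "21497",
    "BRCA1": "1100",
}

# Hypothetical history table: retired/renamed symbol -> ids it has referred to.
# OLDSYM1 and OLDSYM2 are made-up placeholders, not real HGNC symbols.
PREVIOUS_SYMBOLS: Dict[str, List[str]] = {
    "OLDSYM1": ["21497"],
    "OLDSYM2": ["1100", "21497"],
}


def symbol_to_hgnc_id(symbol: str) -> Optional[str]:
    """Return an HGNC id only when the symbol resolves unambiguously."""
    if symbol in CURRENT_SYMBOLS:
        return CURRENT_SYMBOLS[symbol]
    candidates = PREVIOUS_SYMBOLS.get(symbol, [])
    if len(candidates) == 1:
        return candidates[0]
    # Unknown or ambiguous: leave the coding uncoded rather than guess.
    return None
```

Returning `None` for an ambiguous historical symbol keeps the decision visible: the caller can fall back to a display-only coding instead of emitting a code that may be wrong.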
Andrew Patterson (Apr 12 2018 at 05:05):
If I do put in just the symbol (as the display) - the downstream users will treat it as computable (no matter what I tell them)
Andrew Patterson (Apr 12 2018 at 05:07):
I can make a strong argument about not matching on a display string like 'breast cancer gene 1'.. it is harder for me to make the same argument about something that is actually a code itself like 'BRCA1'. I will try though.
Kevin Power (Apr 12 2018 at 13:57):
Unfortunately, not all our members with HGNC experience follow Zulip. You will likely get additional responses on questions like the safety in doing symbol->gene ID lookups if you ask this question on our list serv email (clingenomics@lists.hl7.org).
Robert McClure (Apr 12 2018 at 13:59):
@Kevin Power I see two issues here - what you can do with your unusual situation, and what your unusual situation might imply for others regarding what "FHIR allows and encourages." You've already heard that a display and system is not illegal but is strongly discouraged. Granted many others will not ask and simply do this and there is nothing we can do to stop that, but since you've asked, I'd strongly suggest you fix the data before you put it into an exchange format (i.e., FHIR). You've already indicated it should/might be possible - I'd certainly hope and assume it would be. If it really is a code system, then there should be a source that would support getting a code for each display, right? I for one would strongly suggest you fix anything you have. I don't think that is improper or asserting knowledge - if not you, it will be someone else less capable!
Andrew Patterson (Apr 12 2018 at 23:03):
I think HGNC is a bit unusual - the display value is itself also a valid code - it follows all the rules for codes in code systems (unique, unchanging, concept permanence etc).
If a programmer was storing just the display text of 'diabetes mellitus' in their database rather than the SNOMED id - I would feel justified at yelling at them.
If someone is just storing HGNC 'symbols' as codes, I'm not sure how good my argument is that they are wrong. So I will look to get the ids from a reverse mapping - but just flagging that I think this is a code system that might cause issues in practice.
Kevin Power (Apr 12 2018 at 23:25):
FWIW - I think it is pretty common to use the symbol as the 'code'. So it's not just your system :)
Last updated: Apr 12 2022 at 19:14 UTC