Stream: implementers
Topic: Variant nomenclature validation
Max Masnick (Aug 06 2020 at 18:18):
Hi, I'm hitting an issue with variant nomenclature not validating. Using an example from mCODE, which I understand used to validate, I get this error:
None of the codes provided are in the value set http://hl7.org/fhir/us/mcode/ValueSet/mcode-hgvs-vs (http://hl7.org/fhir/us/mcode/ValueSet/mcode-hgvs-vs), and a code from this value set is required) (codes = http://varnomen.hgvs.org#NC_000019.8:g.1171707G>A)
The full output from the validator .jar, including version numbers of everything is here: https://gist.github.com/masnick/30bf1eb03cc13ce68e6d4f06c2a52e3d. Any help would be appreciated. Thanks!
Mark Kramer (Aug 10 2020 at 12:50):
@Max Masnick I think this is simply because the terminology server used by the validator doesn't know, or have access to, the code system http://varnomen.hgvs.org. To make matters even less clear, HGVS is more a naming algorithm than a finite set of codes. To even begin to solve this, we would need an algorithm to validate HGVS names, and a way to embed that algorithm within the terminology server, which currently only stores lists of codes.
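To make the lookup-versus-algorithm distinction concrete, here is a minimal Python sketch of the two kinds of membership test. The class names are hypothetical and not part of any FHIR terminology server API, and the pattern is a deliberately simplified stand-in: it only recognizes genomic substitutions on an `NC_` accession, which is a tiny fragment of the HGVS grammar.

```python
import re
from typing import Callable, Set


class EnumeratedCodeSystem:
    """Membership by lookup: all a server that stores lists of codes can do."""

    def __init__(self, url: str, codes: Set[str]) -> None:
        self.url = url
        self.codes = codes

    def contains(self, code: str) -> bool:
        return code in self.codes


class RuleBasedCodeSystem:
    """Membership by algorithm: any string the rule accepts counts as a code."""

    def __init__(self, url: str, rule: Callable[[str], bool]) -> None:
        self.url = url
        self.rule = rule

    def contains(self, code: str) -> bool:
        return self.rule(code)


# Hypothetical, heavily simplified rule: genomic substitutions on an NC_ accession only.
_SIMPLE_GENOMIC_SUB = re.compile(r"NC_\d+\.\d+:g\.\d+[ACGT]>[ACGT]")

hgvs = RuleBasedCodeSystem(
    "http://varnomen.hgvs.org",
    lambda code: _SIMPLE_GENOMIC_SUB.fullmatch(code) is not None,
)

# An enumerated copy can only ever hold the names someone listed in advance.
hgvs_as_list = EnumeratedCodeSystem("http://varnomen.hgvs.org", {"NC_000019.8:g.1171707G>A"})

print(hgvs.contains("NC_000016.10:g.23603471G>T"))          # True
print(hgvs_as_list.contains("NC_000016.10:g.23603471G>T"))  # False
```

The rule-based version accepts names nobody listed beforehand, which an enumerated store cannot do; note that it says nothing about whether the variant actually exists.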
May Terry (Aug 10 2020 at 13:18):
Seems like there are a couple of different issues: 1) modeling and 2) FHIR validator tooling. 1) For modeling, there are a number of use cases where a CodeableConcept references not a fully defined terminology of codes and terms, but rather a naming system with a binding that's extensible, for example HGVS as found in the Genomics Reporting IG and the mCODE IG. 2) If the validator points to a terminology server that doesn't handle naming systems, and the validator expects a matching code, how should the validator handle this? Should it remain an explainable error, or could we downgrade this message to a warning?
Earlier versions of the FHIR validator didn't flag an error on this, and I'm concerned it will pose a challenge if we publish a technical errata release and it gets rejected because of this discrepancy. cc: @Patrick Werner since I think this affects both mCODE and the Genomics Reporting IG.
Patrick Werner (Aug 10 2020 at 13:22):
Good explanation. Yes, HGVS is a grammar, like UCUM. It can still be expressed as a CodeSystem (CS).
Patrick Werner (Aug 10 2020 at 13:27):
If the terminology backend doesn't know the CS there are three choices:
- give an info/warning that this couldn't be validated.
- throw an error as it couldn't be validated.
- ignore the validation for this CS
This is up to the server implementer.
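The three choices can be written as a configurable policy. This is a hypothetical Python sketch, not the FHIR validator's actual code or settings: `UnknownSystemPolicy` and `check_code` are illustrative names, and `known_systems` stands in for whatever code systems the terminology backend can actually evaluate. The default here is a warning, which is simply one of the options listed above.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class UnknownSystemPolicy(Enum):
    WARN = "warning"   # report that the code could not be validated
    ERROR = "error"    # treat "could not validate" as a failure
    IGNORE = "ignore"  # skip validation for this code system


Issue = Tuple[str, str]  # (severity, message)


def check_code(
    system: str,
    code: str,
    known_systems: Dict[str, Callable[[str], bool]],
    policy: UnknownSystemPolicy = UnknownSystemPolicy.WARN,
) -> Optional[Issue]:
    """Return a (severity, message) issue, or None if there is nothing to report."""
    validator = known_systems.get(system)
    if validator is None:
        # The backend cannot evaluate this system: apply the configured policy.
        if policy is UnknownSystemPolicy.IGNORE:
            return None
        return (policy.value, f"Code system {system} is unknown; {system}#{code} could not be validated")
    # The backend knows the system: an invalid code is always an error.
    if not validator(code):
        return ("error", f"{system}#{code} is not a valid code")
    return None


print(check_code("http://varnomen.hgvs.org", "NC_000016.10:g.23603471G>T", known_systems={}))
```

Keeping "unknown system" separate from "known system, bad code" means a missing terminology never masquerades as an invalid instance.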
May Terry (Aug 10 2020 at 13:29):
Agreed. I guess I'm trying to ensure that we don't get dinged on technical errata releases because the examples are now throwing an error and failing validation.
Patrick Werner (Aug 10 2020 at 13:32):
I think the validator shouldn't throw errors on a missing CS, or this should at least be configurable. There are plenty of use cases where no complete CS or terminology server is available.
Grahame Grieve (Aug 10 2020 at 19:21):
Hmm. I'll investigate. On the whole I agree for this case, but I'm not sure about some other cases that look the same to the validator.
Grahame Grieve (Aug 10 2020 at 20:53):
what's the source being validated above?
Max Masnick (Aug 12 2020 at 10:58):
@Grahame Grieve the file I was trying to validate is http://hl7.org/fhir/us/mcode/Observation-mCODECancerGeneticVariantExample01.xml. Thanks for looking into this!
Grahame Grieve (Aug 12 2020 at 11:07):
if you were a non-fhir user, and you wanted to know whether an hgvs code was valid, what would you do?
Grahame Grieve (Aug 12 2020 at 11:09):
https://variantvalidator.org/service/validate/ thinks it's not valid...
Max Masnick (Aug 12 2020 at 11:18):
One way to do it is to look at the variant on ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/variation/619728/) based on the ClinVar code in the instance, and I would assume that anything listed under HGVS would be valid.
In this case NC_000019.8:g.1171707G>A is not listed on ClinVar so maybe that's a bad example (and should be changed in the example instance).
However, I did hit the same error with this file here, which is using a HGVS value from ClinVar that validates with variantvalidator.org: https://gist.githubusercontent.com/masnick/30bf1eb03cc13ce68e6d4f06c2a52e3d/raw/1fcd9e653e6b5f20a615cc19068b93c40bf9de5d/another_example.json
Max Masnick (Aug 12 2020 at 11:20):
@Grahame Grieve here's the ClinVar link for reference: https://www.ncbi.nlm.nih.gov/clinvar/variation/128144/
The variant is NC_000016.10:g.23603471G>T
Max Masnick (Aug 12 2020 at 11:29):
This is the error I get with another_example.json:
Error @ Observation.component[2].value.ofType(CodeableConcept) (line 107, col28) : None of the codes provided are in the value set http://hl7.org/fhir/us/mcode/ValueSet/mcode-hgvs-vs (http://hl7.org/fhir/us/mcode/ValueSet/mcode-hgvs-vs), and a code from this value set is required) (codes = http://varnomen.hgvs.org#NC_000016.10:g.23603471G>T)
Kevin Power (Aug 12 2020 at 14:51):
@Max Masnick - We see this sort of error in the Genomics Reporting IG quite often for HGVS and other code systems that are not yet supported. For now, we have ignored the errors, but would love to do something to get these code systems supported. I know @Grahame Grieve is supportive of that, but no one has had the time to make that happen just yet. As @Patrick Werner mentioned above, perhaps the validator should not be treating something like this as an error, or perhaps it could be suppressed by the IG?
This might be more than you really want, but the CG group (#genomics) started cataloging a variety of tools that perform this sort of HGVS validation, and you can see that list here.
There are a few details in that document, but not nearly enough. Suffice it to say that each of these tools has limitations. We can broadly split validation into two parts:
1 - Syntax, which can be done here and is mostly straightforward.
2 - Semantics (do the parts make sense together: is the RefSeq accession + version valid, is the indicated reference allele accurate, etc.), which is difficult to do well, and is where differences emerge between the various tools.
If we were to start somewhere, updating the validator to perform syntax validation for HGVS would be a nice start. Much beyond that will be hard.
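To illustrate the two levels, here is a minimal Python sketch limited to genomic single-nucleotide substitutions such as `NC_000016.10:g.23603471G>T`. The pattern covers only a small fragment of the HGVS grammar, and the semantic check relies on a hypothetical `fetch_base` lookup standing in for a real reference-sequence source; neither reflects how any of the catalogued tools work.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Simplified syntax: RefSeq accession.version, "g." coordinates, position, ref>alt.
# Real HGVS also covers c./p. coordinates, deletions, insertions, ranges, and more.
_GENOMIC_SUB = re.compile(
    r"(?P<accession>N[CG]_\d+)\.(?P<version>\d+)"
    r":g\.(?P<position>[1-9]\d*)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


@dataclass
class Substitution:
    accession: str
    version: int
    position: int
    ref: str
    alt: str


def parse_substitution(name: str) -> Optional[Substitution]:
    """Syntax check: return the parsed parts, or None if the name is malformed."""
    m = _GENOMIC_SUB.fullmatch(name)
    if m is None:
        return None
    sub = Substitution(
        accession=m["accession"],
        version=int(m["version"]),
        position=int(m["position"]),
        ref=m["ref"],
        alt=m["alt"],
    )
    # A substitution must actually change the base (e.g. "G>G" is rejected).
    return sub if sub.ref != sub.alt else None


def reference_allele_matches(
    sub: Substitution,
    fetch_base: Callable[[str, int, int], Optional[str]],
) -> bool:
    """One semantic check: does the stated reference base match the sequence?

    fetch_base(accession, version, position) is hypothetical: it should return the
    base at that 1-based position, or None if the accession/version is unknown.
    """
    base = fetch_base(sub.accession, sub.version, sub.position)
    return base is not None and base == sub.ref


print(parse_substitution("NC_000016.10:g.23603471G>T"))
# Substitution(accession='NC_000016', version=10, position=23603471, ref='G', alt='T')
print(parse_substitution("NC_000016.10:g.23603471G>G"))  # None
```

The syntax step is self-contained, while the semantic step is only as good as the reference data behind `fetch_base`, which is where the real difficulty lies.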
Grahame Grieve (Aug 13 2020 at 00:02):
OK, so from the next release of the IG Publisher, the error will change to a warning, but you won't notice, because HGVS codes will also be validated by the multilyzer, though of course it only checks syntax, not semantics. If anyone finds a trustworthy semantic checker, let me know.
Max Masnick (Aug 13 2020 at 00:07):
Thanks all! @Grahame Grieve, does the change to the IG Publisher also apply to the FHIR Validator? Or is that a separate codebase?
Grahame Grieve (Aug 13 2020 at 00:08):
Same code base. In fact, all the changes were in the validator.
Last updated: Apr 12 2022 at 19:14 UTC