Stream: genomics
Topic: hgvs component(s)
Larry Babb (Oct 15 2019 at 23:50):
HGVS is undoubtedly important to share in these messages in regards to variant observations. However, it is not crystal clear how the CG IG is scoping what is considered a valid or proper hgvs expression. It is my reading of the HGVS specification that all HGVS expressions should (or actually MUST?) include the reference sequence in the form <ref seq>:<position+change expression>. Certainly, the more interesting p., c., g., n., m. portion of an HGVS expression is fairly useless without the reference sequence on which it is based, at least in terms of sharing structured and computationally useful variants.
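The `<ref seq>:<position+change expression>` form described above can be checked mechanically. Here is a minimal Python sketch, assuming the reference sequence is whatever precedes the first colon. The names `split_hgvs` and `has_reference` and the accession `NM_000000.0` are made up for illustration, and this is not a validator for the HGVS nomenclature itself.

```python
from typing import NamedTuple, Optional


class SplitHgvs(NamedTuple):
    reference: Optional[str]  # e.g. "NM_000000.0"; None when the sender omitted it
    description: str          # e.g. "c.100A>G"


def split_hgvs(expression: str) -> SplitHgvs:
    """Split on the first colon; the text before it is taken as the reference."""
    reference, sep, description = expression.strip().partition(":")
    if not sep:
        # No colon at all: only the position+change part was sent.
        return SplitHgvs(None, reference)
    # An empty string before the colon also counts as a missing reference.
    return SplitHgvs(reference or None, description)


def has_reference(expression: str) -> bool:
    """True when the expression carries some reference sequence text."""
    return split_hgvs(expression).reference is not None


if __name__ == "__main__":
    # Placeholder values, not real variants.
    print(split_hgvs("NM_000000.0:c.100A>G"))
    print(has_reference("c.100A>G"))
```

A check like this only says whether a reference part is present. It says nothing about whether the accession exists or whether the change is valid against it.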
It seems like the CG IG is essentially parsing out the HGVS components like ref seq/accessions and, in the case of genomic data, the start, end, ref and alt. Doing so makes it easy to ignore situations where someone does not pass a complete and proper HGVS expression in that dedicated component, but this is an interesting and confusing aspect of the guide IMO. If the guide is trying to be flexible in allowing users to share whatever form of HGVS producers have, then it should be clarified that it is purely informational and not a standard for representing the variation that it presumably represents. Not clearly stating this can lead implementers to presume that HGVS in "any" form is potentially a legitimate means for sharing variation using the CG IG. Folks believe this now, and leaving it unstated will continue to foster this dangerous practice.
Additionally, if the CG IG spec does want to parse out and pass the structured bits of the HGVS expression, as is done for genomic DNA and mostly for transcript DNA, then it should also do so for amino acid (I assume). Currently, there is no amino-acid ref seq component, even though there is one each for transcript and genomic ref seqs. Also, if one were to use the amino acid component there should be a separate indicator to help the consuming system know whether the passed AA change is a one-letter or three-letter expression. While this seems obvious, it is not necessarily something that can be easily determined by simply inspecting the data (even though it seems so).
Also, since the CG IG does need to pass hgvs representations, then you may want to consider providing a method for adding multiple transcript and amino acid representations (and their corresponding parsed elements, if that is the approach that is chosen in general). Many genomic variants will have multiple transcript forms, many of which may derive different amino acid changes. Being able to only express one line of genomic-dna->trx-dna->aa changes is flexible in the sense that it allows you to express several forms of a variant, but it is also constrained in that it does not allow you to express several forms of the transcript that the genomic dna may map to (and the corresponding aa changes). It seems like an arbitrary constraint or flexibility depending on your perspective.
It would be helpful to understand if there's a method to how the CG IG is ultimately planning to handle the sharing of HGVS standard forms and whether it will allow multiple and complete representations if desired. Or will it (should it) only provide one of the forms in a message at a time, so as to avoid all the complexity, and thus require implementers to pick the form that best represents what was actually measured? That would be my vote.
Larry Babb (Oct 16 2019 at 00:16):
a couple more questions...
- Is it permissible to send the hgvs values with the accession/ref-seq portion?
- Since there is only a gHGVS, cHGVS and pHGVS component, how would someone send m., n., or r. HGVS expressions?
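To make the m./n./r. question concrete: the coordinate type of an expression can be read from its one-letter prefix, whether or not a dedicated component exists for it. Here is a small Python sketch, assuming the prefix sits either at the start of the string or right after a colon. The function name `hgvs_kind`, the informal labels and the placeholder accessions are illustrative only and are not taken from the IG.

```python
import re
from typing import Optional

# One-letter coordinate prefixes mentioned in this thread. The labels are
# informal descriptions, not codes defined by the IG.
PREFIX_LABELS = {
    "g": "genomic",
    "c": "coding DNA",
    "n": "non-coding DNA",
    "m": "mitochondrial",
    "r": "RNA",
    "p": "protein",
}

# A lowercase letter followed by a dot, at the start or just after a colon.
_PREFIX = re.compile(r"(?:^|:)([a-z])\.")


def hgvs_kind(expression: str) -> Optional[str]:
    """Return an informal label for the expression's prefix, or None."""
    match = _PREFIX.search(expression.strip())
    if match is None:
        return None
    return PREFIX_LABELS.get(match.group(1))


if __name__ == "__main__":
    # Placeholder expressions, not real variants.
    for text in ("NC_000000.0:m.100A>G", "NR_000000.0:n.100A>G", "r.100a>g"):
        print(text, "->", hgvs_kind(text))
```

With only gHGVS, cHGVS and pHGVS components available, the three example strings above have no obvious home, which is the gap the question points at.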
Larry Babb (Oct 16 2019 at 01:31):
also, are the following components linked to genomic variant representation or can they be applied to cDNA, aa representations?
- coordinate system
- exact-start-end
- inner/outer-start-end
- ref-allele
- alt-allele
If any or all of these can be used for any type of reference sequence basis then how does one convey that in the data being passed to the consumer? If these are all fixed to genomic data then is it presumed that the genomic ref seq and/or the genomic-build + chromosome are required in order to understand which reference sequence these components are dependent on?
Larry Babb (Oct 16 2019 at 11:54):
also, related...
- the component "dna-change-type" seems to align - namewise - with the "dna-change" component. The dna-change component is constrained to the cHGVS expression.
So, if you don't have a cHGVS expression can you still have a "dna-change-type" component and would it work equally to describe the dna-change type regardless if it was tied to a genomic-dna-change or a coding dna-change?
I'm guessing that the dna-change-type can always work for either genomic-dna-change (gHGVS) or dna-change (cHGVS) because these two should always have the same type regardless of whether it is a cDNA or genomic representation of the same change. Please confirm that this assumption is correct. (It may be worth noting in the IG under the dna-change-type component so that folks clearly understand that it is about the DNA change regardless of whether it is on a genomic, mito or transcript (NM) ref seq, since all of these actually represent DNA sequences.)
Jamie Jones (Oct 16 2019 at 13:09):
My understanding is that we are hoping to align variant more with the GA4GH variant representation and our own IM modelling work in the near future. I agree that our lack of details around the intended usage of HGVS in its various forms is currently misleading and confusing... Is it treated mainly as a way for labs currently only sending hgvs to use our spec?
Larry Babb (Oct 16 2019 at 13:16):
Thanks @James Jones, that is helpful to know. It would help if it could be added to the current version of the spec so early adopters and reviewers understand the intention and volatility of this area of the spec. I assume this is a general understanding of the group; if so, stating it clearly in the spec on the various profiles would be helpful.
Kevin Power (Oct 16 2019 at 13:27):
I am pretty swamped today, and will have to admit I only did a quick review - but a few thoughts on HGVS:
- While we have HGVS CodeSystem and ValueSet, there is nothing in place today to validate what is being sent as the string. Perhaps at some point, we can work with Grahame to do that. And even then, most if not all systems will likely perform their own validation in whatever way they choose, or choose not to perform validation because labs today send a variety of things as HGVS. I wish we could get everyone to follow every standard perfectly, but it simply won't happen.
** And yes, we can and should do what we can to move labs in that direction. I think having a common spec for them to use with some structure to it is a great start. Then we can evolve it.
- Maybe I am reading too much into the comment about "the IG does need to pass hgvs representations" - none of the HGVS components are required, so not sure what that means.
- I think the whole spec is a bit 'volatile' right now, and I am not sure this area is particularly more so than others. However, if you have any recommendations for additional guidance we can provide, we will start considering trackers for the next release of the IG, so we would welcome some in this area.
Bret H (Oct 16 2019 at 13:55):
@Kevin Power +1
I think it would be difficult to enforce without providing an HGVS parser to really ensure proper use of HGVS. Today, labs often send incomplete HGVS strings in their PDF version of reports. HGVS is a convention like ISCN and the GL strings that are used in HLA typing. I do not think the CG WG can easily replace due diligence on the part of the implementation community and recipients of reports. Locally, I suppose our EMR vendor could be asked to build an HGVS validation parser...but labs would want to ensure data integrity so that data will be used appropriately by the downstream (recipient) system.
aside: some group might suggest a single field for HGVS. The multiple fields in our current IG for HGVS make it easier for a recipient system to know that the HGVS string is c, g, etc....without having to parse the HGVS string. It is a convention to aid recipient systems. The roots of this convention are historical and related to alignment with the HL7v2 LRI.
Bob Dolin (Oct 16 2019 at 23:07):
I might also suggest that we consider a new component, for the SPDI representation of the variant. For variants with known breakpoints, I am finding SPDI to be incredibly useful - easy to validate, can be translated into a canonical representation, will soon be supported by ClinVar, etc.
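As a rough illustration of why SPDI is easy to check structurally: an SPDI expression is four colon-separated fields (sequence, 0-based position, deletion, insertion). The Python sketch below only checks that shape. The name `parse_spdi` and the accession `NC_000000.0` are made up, and it does not attempt the canonicalization mentioned above, which needs the reference sequence itself.

```python
from typing import NamedTuple


class Spdi(NamedTuple):
    sequence: str   # reference sequence accession
    position: int   # 0-based position on that sequence
    deletion: str   # deleted sequence (may be given as a length)
    insertion: str  # inserted sequence


def parse_spdi(text: str) -> Spdi:
    """Split an SPDI string into its four fields, checking only its shape."""
    parts = text.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}")
    sequence, position, deletion, insertion = parts
    if not sequence:
        raise ValueError("sequence field is empty")
    if not (position.isascii() and position.isdigit()):
        raise ValueError(f"position is not a non-negative integer: {position!r}")
    return Spdi(sequence, int(position), deletion, insertion)


if __name__ == "__main__":
    # Placeholder value, not a real variant.
    print(parse_spdi("NC_000000.0:99:A:G"))
```

An HGVS-style string such as `NC_000000.0:g.100A>G` fails the field-count check immediately, so the two forms are hard to confuse.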
Larry Babb (Oct 17 2019 at 15:01):
my recommendation is to NOT treat the hgvs as a codeableconcept at all, but instead as a string (as Kevin stated). I agree with all the statements above about the inability to control how folks will use hgvs and apply it and thus a "string" is the best we can (and should) do for now.
Following @Bret H 's aside above on a single field: I couldn't agree more. One component for HGVS that allows multiple instances (0..*). Then compose the component to have a qualifier that would allow the sender and receiver to understand the "kind or type" of hgvs (g, m, n, c, p, r, etc.).
It feels now that by making the hgvs a "coding" with a system, code, display structure, it is presented as a kind of real "code" or "identifier", which is where we seem to straddle the line between a controlled standard representation (format) and a string that we can't dictate the structure of.
I'm not sure if the components can provide additional attributes to help qualify the contents of the component but if possible then it may be a way to bundle this whole concern into a single component which we can later evolve into something more formal. I feel like this issue around the use of HGVS will continue to bog us down otherwise.
Here's the concept...
Under "Variant Profile"
replace components dna-change, genomic-dna-change and amino-acid-dna-change with...
component:hgvs-representation (cardinality 0..*)
code CodeableConcept 1..1
value[x] Text 1..1
extension or modifierExtension (not sure if this is really acceptable - but we could add one simple extension or modifierExtension to provide additional qualifiers about the hgvs expression - like whether it is genomic, rna, protein, non-coding, etc. We could also add things like whether it is a single-letter or three-letter amino acid version, or even the version of hgvs that the producer used, etc.)
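The proposed structure could look roughly like the following when built in Python as a FHIR-style Observation component. This is only a sketch of the idea in this message: the function name `hgvs_component`, the code text and the extension URL under `example.org` are all hypothetical and are not defined by the IG.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical extension URL, used only to illustrate the "qualifier" idea.
KIND_EXTENSION_URL = "http://example.org/fhir/StructureDefinition/hgvs-kind"


def hgvs_component(expression: str, kind: Optional[str] = None) -> Dict[str, Any]:
    """Build one repeatable component holding an HGVS string as plain text."""
    component: Dict[str, Any] = {
        # Placeholder code; a real profile would bind a proper coding here.
        "code": {"text": "hgvs-representation"},
        # The expression is carried as a string, not as a coded value.
        "valueString": expression,
    }
    if kind is not None:
        # Optional qualifier, e.g. "g", "c", "p", "m", "n" or "r".
        component["extension"] = [{"url": KIND_EXTENSION_URL, "valueCode": kind}]
    return component


if __name__ == "__main__":
    # Placeholder expressions; the cardinality 0..* means any number may be sent.
    components = [
        hgvs_component("NC_000000.0:g.100A>G", "g"),
        hgvs_component("NM_000000.0:c.100A>G", "c"),
    ]
    print(json.dumps(components, indent=2))
```

Further qualifiers such as one-letter versus three-letter amino acid notation, or the HGVS version used, would be additional entries in the same `extension` list.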
I think this would be a good way to consolidate the issue into a single component while keeping it flexible enough to allow folks to use it in any of the forms they currently do now. It will also avoid the perception that these current HGVS components are more formal than they are really designed to be.
Larry Babb (Oct 17 2019 at 15:05):
I too support the notion of including SPDI representation somehow, like @Bob Dolin. But I am concerned about component-creep in these profiles. It may be time to step back and consider the various classes of "representation" versus "identifiers" (dbsnp, spdi, hgvs, cosmic, clinvar, etc.) and see if we can bucket them into a small number of component classes without creating specialized components for every case now or in the future. Again, not sure if the component structure is amenable to supporting "modifiers" or "qualifiers" to bundle kinds of components together under a single specification type.
Larry Babb (Oct 17 2019 at 15:08):
Also, when/if there is some re-org around the hgvs components, we should consider the fact that haplotypes and genotypes can be expressed with hgvs representation as well. I do not see how this can be done under the haplotype and genotype profiles currently. Again, maybe this is an area of the hgvs spec we feel is not really used by the community and thus does not need to be included in the standard. If it is an oversight, then we should consider how an hgvs component can be used for those profiles as well.
Lloyd McKenzie (Oct 17 2019 at 15:16):
If an HGVS is received, how do we expect it to be used? Would anyone ever try to compute against it or is it really for human rendering and possible look-up?
Larry Babb (Oct 17 2019 at 15:22):
I don't know of any clinical application that would compute against it. If so, I think it would raise serious concerns. It is primarily for human rendering. Unfortunately, this is a very challenging area of this domain, as it is a representation that looks and feels like it can be used for computational purposes. So people do use it that way (hopefully in research settings only). But that said, there's a reckoning that will be coming soon to settle this matter once and for all, hopefully. I think HL7 CG group is the place to put a stake in the ground on this matter. Erring on the side of caution of course.
If there are others out there that consider HGVS a viable clinical computationally useful form then they should chime in.
I should add, that hgvs, when adhered to and validated can be used computationally to the extent any other nomenclature can. Do we have other examples of nomenclatures that are used for clinical computational practices? That may help shed light on how, what and where it should be used.
Kevin Power (Oct 17 2019 at 15:58):
We did have some talks about whether this should be a 'string' or CodeableConcept. We landed on CC with the thinking that it would be treated somewhat like UCUM, and (hopefully) move everyone towards being more consistent in the sending of the data, even with the recognition that many may still treat it like 'text' at first. We did the same thing with the ISCN Karyotype string. Will be curious if we get feedback on that approach from the community.
Lloyd McKenzie (Oct 17 2019 at 16:45):
UCUM is post-coordinating, but there are strict validation rules. You can always validate that a code is valid and determine what it means. Is that the case here?
Kevin Power (Oct 17 2019 at 17:00):
There are some fairly strict rules (dig around http://varnomen.hgvs.org/ if you want), and there are some HGVS validators out there. However, not all producers of the data (actually, probably few of them) will consistently produce values that will meet all the rules.
Kevin Power (Oct 17 2019 at 17:01):
The hope is that using CC nudges producers of data in the right direction.
Lloyd McKenzie (Oct 17 2019 at 17:45):
Invalid codes are possible in most code systems. That just means systems are non-conformant. We can highlight that it's a known problem consumers should watch for, but that doesn't mean it's not a 'code'.
Larry Babb (Oct 17 2019 at 19:20):
thank you for the quick turnaround on this.
As a community implementer, here's my takeaway from this conversation. Please correct me if I'm mistaken.
When sharing HGVS representations of variants (observed or derived) I can simply put what we normally would use on reports in the code value. Even if it isn't always consistent.
And, if in the future I need to rely on the hgvs information for computational purposes then I should coordinate that with the various systems I am sharing data with to make sure we are all coding things to some commonly accepted practices for a given version of the HGVS recommendations.
Bret H (Oct 21 2019 at 22:11):
@Larry Babb and @Bob Dolin sounds like some good gForge tracker items: one regarding SPDI and one regarding the single component. But both would require implementers to follow governance practices outside HL7. Which is okay, but I did want to mention it. At any rate, if you do make gForge items, please add links here.
Bret H (Oct 21 2019 at 22:18):
@Larry Babb @Kevin Power @James Jones but I would add that you should indicate the reference etc... in other words, try to send good HGVS, not incomplete HGVS. The implementor is responsible for the quality of their data.
Kevin Power (Oct 22 2019 at 13:53):
As we are wrapping up the publishing of the first release, we are working on what is next. If you would like these considered, please log trackers.
Larry Babb (Oct 30 2019 at 14:39):
I'm not sure that SPDI is more than another attribute/id (useful possibly, but there will be others going forward too), and I'm not sure I want to enable the continued illusion that HGVS is a sound computable identifier by bolstering the constraints on how it is validated. I'm going to hold out hope for the effort planned around combining the IM work with the CG IG spec to help bolster the overall approach.
Bob Dolin (Oct 30 2019 at 15:11):
Hi @Larry Babb , I've struggled with HGVS too. One of the things I really like about SPDI is that it has the notion of a canonical form and a nice set of services that let you convert variants in other formats into SPDI canonical form. (https://api.ncbi.nlm.nih.gov/variation/v0/)
Larry Babb (Oct 30 2019 at 17:55):
I agree @Bob Dolin. I'm thinking that the GA4GH VR 1.0 digest serves the same purpose since it incorporates the SPDI algorithm for normalizing. But we could add both I suppose, seems like there's no limit to the number of attributes/identifiers/etc. that we can add to a variant. ;)
Last updated: Apr 12 2022 at 19:14 UTC