Stream: genomics
Topic: obsVariant transcript/aa components
Larry Babb (Sep 17 2019 at 13:30):
The ObsVariant profile has 0..1 components for transcripts and amino acids as follows:
transcript related components
component:transcript-ref-seq
component:dna-chg (only cDNA-c., not n. or r. RNA hgvs based on LOINC code)
component:dna-chg-type
amino acid related components
component:amino-acid-chg
component:amino-acid-chg-type
While this information is useful to share, it has the following limitations...
They are constrained to 0..1, so if a sender wanted to share multiple transcript forms of the variant they could not. Are we presuming this would never be a reasonable use?
If multiple transcripts from a single gene or from multiple genes are affected (in the case where the genomic change impacts multiple genes on the same or reverse strands), then we would need to be able to capture the corresponding dna-change and its dna-change-type derivative data with the corresponding transcript.
(From the transcript and amino acid variants topic we also have no way of sharing the transcript ref-seq relative start-end, ref and alt values if desired, in case an underlying genomic representation is not readily available.)
All the same issues exist for amino acids as they are directly related to the various forms of transcripts from which they are translated.
The additional missing item for amino acid attributes is that there's no component for the amino-acid-ref-seq.
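A minimal sketch of the 0..1 limitation described above, assuming a simplified (name, value) view of the components rather than the real FHIR JSON layout. The slice names come from the list in this message; the transcript identifiers and c. expressions are made-up placeholders.

```python
from collections import Counter

# Slices the message above says are constrained to 0..1.
MAX_ONE = {
    "transcript-ref-seq", "dna-chg", "dna-chg-type",
    "amino-acid-chg", "amino-acid-chg-type",
}

def cardinality_violations(components):
    """Return the 0..1 slice names that appear more than once."""
    counts = Counter(name for name, _ in components)
    return sorted(name for name, n in counts.items() if name in MAX_ONE and n > 1)

# Two transcript projections of one genomic change (placeholder values).
components = [
    ("transcript-ref-seq", "NM_EXAMPLE.1"),
    ("dna-chg", "c.100A>G"),
    ("transcript-ref-seq", "NM_EXAMPLE.2"),
    ("dna-chg", "c.250A>G"),
]

violations = cardinality_violations(components)
print(violations)
```

Even if the cardinality were relaxed, a flat list like this carries no explicit link between each dna-chg and the transcript it belongs to, which is the pairing problem raised in the message.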
Bret H (Sep 17 2019 at 13:38):
@Larry Babb you would want the correct amino acid type to be associated with the correct transcript referenced, right? So you are saying that within the Type it would be important to add a transcript sequence. @Patrick Werner @Kevin Power @James Jones For others: consider alternative splicing. The variant's effect on amino-acid level changes would be different depending on the codon usage of each alternative splice form. So, if we change cardinality to 0..* and add a reference sub-component, we would provide an opportunity to comment on multiple splice forms in the same profile instance. The other point is that aa change type should be a subcomponent of amino acid change. One would be hard pressed to find a good reason in a use case to provide the aa change type without the aa change.
Bret H (Sep 17 2019 at 13:40):
oops. Given the last sentence, here's the structure: AA change as top-level component, with aa change type and aa chg reference transcript inside the aa change component at the same level. Unfortunately, this moves the change type more deeply in the structure (more nested).
Jamie Jones (Sep 17 2019 at 13:43):
Components can't easily be structurally linked beyond being defined on the same profile to my knowledge. For concepts that need multiple components, we need to either provide textual guidance saying to use them together or to define invariants to throw a warning if you fill out one component without filling out another required component.
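The invariant idea can be sketched as a co-occurrence check. Real FHIR invariants are written as FHIRPath expressions on the profile; the Python below is only an analogue of the rule, and the `REQUIRES` table is an assumption based on the pairings discussed in this thread.

```python
def missing_companions(present, requires):
    """Warn for each component that is filled in without its required companion."""
    warnings = []
    for name, needed in requires.items():
        if name in present:
            for other in needed:
                if other not in present:
                    warnings.append(f"{name} present without {other}")
    return warnings

# Assumed rules: a change type is only meaningful alongside its change.
REQUIRES = {
    "amino-acid-chg-type": ["amino-acid-chg"],
    "dna-chg-type": ["dna-chg"],
}

warnings = missing_companions(
    {"amino-acid-chg-type", "dna-chg", "dna-chg-type"}, REQUIRES
)
print(warnings)
```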
Jamie Jones (Sep 17 2019 at 13:44):
extensions can do it
Larry Babb (Sep 17 2019 at 13:45):
@Bret H to be clear, I'm hopeful that this helps continue to highlight the shortcomings of the component (bag) approach being used. If we could separate out the "definition" of the variant from the annotations and derivative data, then we may be able to identify a path forward that simplifies the approach overall.
Bret H (Sep 17 2019 at 13:46):
yep. One could use our IG as it is but need to use multiple instances for several transcript references, so perhaps that is best. @Larry Babb if we don't add subcomponents then you can use separate instances of obs-variant to describe each amino acid reference
Larry Babb (Sep 17 2019 at 13:47):
the textual guidance is so overwhelmingly complex that we lose the aim of developing a reasonable, implementable standard.
Larry Babb (Sep 17 2019 at 13:48):
obsVariant is a very heavy object, to simply say we will create multiple instances to link up to other instances creates new barriers for adoption.
Bret H (Sep 17 2019 at 13:48):
Again, we're talking about inferred amino acid changes based on DNA or, rarely, RNA sequence. If we created amino acid or RNA specific profiles, one would need to make sure that the guidance indicated that inferences on other molecule types go into the appropriate parent molecule type.
Bret H (Sep 17 2019 at 13:49):
@Larry Babb how is it heavier than an additional profile for this use case? Can you help me understand what you mean by heavy?
Larry Babb (Sep 17 2019 at 14:02):
i'm not suggesting new profiles (yet). I'm suggesting some baseline data type like "allele" {sequence+location, state at that location(alt)} eventually building out some other computationally fundamental variation structures (genotype, haplotype, variation sets, cnvs, translocations, etc). These types could be used in profiles and resources to provide the specific set of annotations that a given lab might want to send, but would mostly be optional.
I don't think creating new profiles is the answer to the heaviness issue.
To be clear, I am trying to demonstrate how to build the obsVariant, obsMedImplications for 6 PGx diplotypes being reported on the eMERGE standard results. I took a pretty simple single PGx diplotype, TPMT, which is asserted to be a certain star-allele-based diplotype based on the combination of 4 genotype SNPs. In order to build the 4 genotype SNPs as obsVariants, I need to represent each allele for each SNP genotype. So that's a max of 8 allele representations that can be used to define the genotypes. For example TPMT:g.18130918T>C(c.719A>G / p.Tyr240Cys) is derivedFrom (or hasMembers of) g.18130918 with a T and g.18130918 with a C if heterozygous, or simply two g.18130918 with a state of C if homozygous. So now I need to create obsVariant (or molecularSequence) instances of these low level alleles so that we can accurately show what is the basis of one of the four SNP genotypes that were observed and thus used to assert which star allele diplotype the lab chose to report for that patient. Then we can associate that to the Med Implication of "normal metabolizer" (or whatever it is).
That is for one of the six PGx diplotypes (and that's a fairly simple one).
so it is daunting to try to use these structures (from both a volume standpoint and from an understanding of how they are meant to be used by the designers). There's lots of moving parts, lots of options, no useful examples and a large hurdle to climb to simply convey these critical bits of data in a manner that will ultimately be used by downstream systems to understand precisely what the lab observed in a repeatable and reliable way.
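A minimal sketch of the "allele" idea described above ({sequence+location, state at that location}), using a hypothetical `Allele` type that is not part of FHIR or the IG. The accession and position are the ones quoted in this thread; the coordinate is kept as written, with no claim about interval conventions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Allele:
    sequence: str  # reference sequence accession
    position: int  # g. coordinate as written in the thread
    state: str     # base observed at that location

def zygosity(a: Allele, b: Allele) -> str:
    """Classify a pair of alleles observed at the same location."""
    if (a.sequence, a.position) != (b.sequence, b.position):
        raise ValueError("alleles must share a sequence and location")
    return "homozygous" if a.state == b.state else "heterozygous"

t_allele = Allele("NC_000006.11", 18130918, "T")
c_allele = Allele("NC_000006.11", 18130918, "C")

print(zygosity(t_allele, c_allele))  # T with C at the same position
print(zygosity(c_allele, c_allele))  # two alleles with state C

# Four SNP genotypes with two alleles each gives the "max of 8" from the message.
max_allele_representations = 4 * 2
```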
Bret H (Sep 17 2019 at 14:05):
good feedback. have you had a chance to look at http://build.fhir.org/ig/HL7/genomics-reporting/compound-heterozygote.html
Bret H (Sep 17 2019 at 14:09):
copying from other thread as it is relevant to discussion here
"
FROM: Larry Babb 8:07 AM
In the VICC they often get folks reporting amino acid forms of variation, on which they attach evidence, annotations and knowledge for sharing with labs that may or may not use it to report on results and findings for somatic testing.
My understanding is that while they may not actually sequence amino acids, the data and knowledge in these variant level databases may only be reported based on the amino acid sequence. If so, then wouldn't HL7 need to provide a means for representing these forms of variation?
Some reporting labs may choose to only report back the amino acid variation form. Without the basis of the amino acid sequence and its start/stop positions, one would have to rely solely on the p. HGVS expression (I presume). How would downstream systems and clinical research efforts ever reliably find and understand what that data is without a structured computational form that is consistent and alignable to some reference?
"
Larry Babb (Sep 17 2019 at 14:09):
yes i have. It's a non-starter for me to use HGVS as a basis for variation (at least not as a computational form). I'm fine with sharing the hgvs to help humans understand the data, but it is completely derivative data IMO.
Larry Babb (Sep 17 2019 at 14:11):
I'm sorry if I'm coming across as super difficult. I'm currently a bit frustrated and feel as if I've hit a wall in my eMERGE project.
Kevin Power (Sep 17 2019 at 15:17):
@Larry Babb -- You have talked about several things in these threads, and it seems we need a way to capture additional details about the amino acid sequence (multiple transcript references, starts/stops). Doing those in components while understanding how they are all related would be at best challenging, but more likely impossible / unusable. It seems like MolecularSequence can support some of those details ("allele" {sequence+location, state at that location(alt)}), and from other conversations, you have tried to use it. If you can share, that might help us see the specifics of what you are trying to do - and help me make recommendations and/or identify some changes we should make to better support these use cases.
Not a good answer I'm afraid, but until we can get our common information model published, then figure out how our IG should align to that model, we will likely only make incremental improvements to the IG. I would hate to see us completely overhaul the IG until we get that model, because we will likely have to make some significant changes once it is available. To be completely clear, that is my opinion only, not a work group decision.
Bret H (Sep 17 2019 at 15:18):
@Kevin Power to clarify one can capture " amino acid sequence (multiple transcript references, starts/stops)" with our current obs variant, but it would require one instance per transcript reference.
Kevin Power (Sep 17 2019 at 15:19):
Good clarification @Bret H
Kevin Power (Sep 17 2019 at 15:51):
When we started this version of the IG, we contemplated a separate profile for AA changes, but we didn't see the need. We were thinking of it as additional annotations of derived information from a DNA change. But perhaps this discussion makes a case to revisit that.
Larry Babb (Sep 17 2019 at 17:59):
@Kevin Power thanks for the feedback and clarification of where things stand (even though it may not be the official WG statement). I completely understand the position and agree with the common information model approach. I'm assuming that is the model that @Bob Freimuth is working on in the Thu weekly calls. I recognize that there is going to be quite a bit of lag between now and when that starts to inform a focused effort on developing FHIR types, resources and profiles for the version I'm seeking. So, I will try to pick a lane for now and stay in it, so that I can share and do my best to help the near term efforts of the WG.
My conundrum is mostly around whether I should dedicate the use of some subset of the MolecularSequence resource to being an "allele" definitional structure (keeping it detached from a specific specimen or patient concept). Or, should I "ignore" MolecularSequence and try to use a version of the obsVariant profile to represent the atomic structure I need to build out better representations of the genomic data.
I'm currently playing with the MolecularSequence and finding that if I restrict myself to a small set of its attributes, I can make somewhat minimal forms of the "allele" concept I need. Then I think the only way I can stitch those together to indicate the phasing (in trans) is with the obsSeqPhaseReltn profile. So, I'm exploring that as well. Once I've constructed a computationally complete representation of the genotype SNP (e.g. NC_000006.11:g.18130918T= in trans with NC_000006.11:g.18130918T=) to get a homozygous NC_000006.11:g.18130918T= genotype, then I use that as the derivedFrom target on an obsVariant that ultimately says that it was present and in what patient/specimen and all the other additional/optional types of annotations that are available (dna-change, dna-change-type, allele-freq, etc...). This is a 3 tier nesting, but the variant definitional data lives at the bottom two layers. I think I can stitch together a number of different types of genomic variant structures, including CNVs.
The ObsSeqPhaseReltn ends up being part of the variant definition and thus I don't treat it as a patient-specific observation. It's not until it gets linked up with the ObsVariant that I start to consider it a patient-specific finding. I think it is interesting that obsVariant is really about the absence or presence of a variant. It really isn't about "what is the variant". Therein lies the confusion with using obsVariant (or Observation profiles in general). The component approach to observations is "handy" for providing a bunch of parts that make a whole, but if the observation is about the presence or absence of something, then it is a bit of a conceptual stretch to consider the components part of the thing that is present or absent. IMO, components should clarify the parts that make up the actual value (presence or absence of something in this case) that is specified by the observation, not the indirect relationship to the thing that the value references.
Even if you argue that the component can be the parts of a thing that the value references and not the parts of the value itself, then I would say that the component approach should not be used for things that are critical concepts that should be standardized for use in a broad array of use cases (like Variation). At best I would think that folks using a profile should be able to look at the entire set of components for an observation and reasonably identify the "one" thing that all the component parts contribute to defining. In our case, that could be a transcript, amino acid or genomic allele or genotype call for a given allele-frequency and genomic source. While it suits lots of needs, it really doesn't suit one.
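The 3 tier nesting described above can be sketched with plain dictionaries. This is a simplified, non-validating illustration: the MolecularSequence field names follow my reading of FHIR R4, the use of `derivedFrom` to link the phase relationship to the two alleles is an assumption, and the ids are placeholders. Profile URLs and codings are deliberately left out.

```python
def make_allele(resource_id, state):
    """Bottom tier: a minimal allele-like MolecularSequence (assumed field usage)."""
    return {
        "resourceType": "MolecularSequence",
        "id": resource_id,
        "referenceSeq": {"referenceSeqId": {"text": "NC_000006.11"}},
        "variant": [{"start": 18130918, "end": 18130918, "observedAllele": state}],
    }

store = {
    "allele-a": make_allele("allele-a", "T"),
    "allele-b": make_allele("allele-b", "T"),
    # Middle tier: phase relationship stitching the two alleles together (in trans).
    "phase-1": {
        "resourceType": "Observation",
        "id": "phase-1",
        "valueString": "Trans",
        "derivedFrom": [
            {"reference": "MolecularSequence/allele-a"},
            {"reference": "MolecularSequence/allele-b"},
        ],
    },
    # Top tier: the patient-specific "present/absent" finding.
    "variant-1": {
        "resourceType": "Observation",
        "id": "variant-1",
        "valueString": "Present",
        "derivedFrom": [{"reference": "Observation/phase-1"}],
    },
}

def definitional_alleles(resource_id, store):
    """Follow derivedFrom links down to the allele-level resources."""
    resource = store[resource_id]
    if resource["resourceType"] == "MolecularSequence":
        return [resource_id]
    found = []
    for ref in resource.get("derivedFrom", []):
        target_id = ref["reference"].split("/", 1)[1]
        found.extend(definitional_alleles(target_id, store))
    return found

alleles = definitional_alleles("variant-1", store)
print(alleles)
```

Only the top resource says anything about a patient or specimen; the two lower tiers hold the definitional data, which matches the split described in the message.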
Kevin Power (Sep 17 2019 at 18:54):
Based on what you are trying to represent, I would say MolecularSequence is probably a better fit for the definitional structures than is ObsVariant.
Regarding the usage of components, here is the section in Observation which describes 'grouping' options:
http://build.fhir.org/observation.html#obsgrouping
I think this part makes the best case for our usage of Components:
Observations that are commonly produced and interpreted together.
So in the case of ObsVariant, we are saying 'we found a variant' in .value(), and then providing the annotations that were 'produced and interpreted together' as components values. We wondered about that one thing that we are defining for a while, but couldn't really come up with the one that fit all use cases. I think the closest thing we came up with was the HGVS string, but that of course brought up concerns.
Bret H (Sep 17 2019 at 20:22):
@Kevin Power I am confused. Obsvariant has components for transcript id, amino acid change and amino acid change type. Not sure why you are mentioning HGVS here? can you explain please
Kevin Power (Sep 17 2019 at 20:33):
@Bret H I was trying to talk to @Larry Babb 's question about identifying the "one" thing that all components contribute to defining - so if we said the ObsVariant.value() should be "what is the ONE thing that was observed", and have the ObsVariant.component[]'s contribute to defining it, I was saying the HGVS string is as close to ONE thing that might describe what was observed.
Perhaps I missed the point (or at least part of it), but that was my thinking behind the answer.
Larry Babb (Sep 18 2019 at 13:31):
@Kevin Power thank you for pointing out that section of the spec and the background on how "Present/Absent" was the defining value of the "we did/didn't find a variant" observation came to be the obsVariant profile. I know how frustrating it can be to bring others along that were not participating in all the many hours of discussion that led to these spec design decisions.
Wouldn't it be nice if we could get to the point where we had a resource that let us represent the variant (independent of testing methods - no read depth, allele freq, etc...) and we could use that to represent whether the variant in question existed (or didn't exist) in the patient's germline genome or some tumor or bodysite as part of some somatic genome. Sort of like how Conditions are captured for a patient in an EHR - and maybe even AllergyIntolerance's. I think the EHRs should be thinking about how to store the clinically significant variant findings for either germline (dx or pgx) and or somatic (based on tumor or tissue bodysite). They should not necessarily be concerned (at this time) with capturing everything that may spew out of a whole genome or whole exome sequencing test - just those genomic variant structures that were earmarked by a genetics lab as "important" and "informative" to the patient's care. If we had such a Resource then we could focus on providing value in returning these results.
Right now the components in ObsVariant are several things. Fundamentally (I think) they are supposed to be a set of fields that define the variant (genomic-ref-seq, genomic-start-stop/inner/outer, genomic-alt, genomic-source, allelic state, copy-number) that is being observed as present or absent. That is the crux of the issue we should be focused on - this is what the GA4GH VR group is seeking and we are getting close to having something that is broad enough to support most use cases. This fundamental model would require the patient and specimen values to always exist as it would be important for the specimen to identify the tumor tissue or anatomical location in which the variant was found (or not found).
The other component values begin to create a kind of chaos when using obsVariant. Read-depth, IMO, is another kind of observation. It is not inherently relevant in all cases to the identity of the variant that is being observed as found or not found. I think it is important and could be a secondary observation that references this primary observation. Same goes for allele frequency, it provides "additional" value that warrants its own observation IMO - again it could hang off of this primary observation. Transcript and amino acid derivative representations of the observed variant are also separate observations (really annotations) that are not part of the "identity" of the observed variant that has been found/not found. dbSNPId is nice but again, it is an ancillary annotation, super helpful in many contexts, great to capture, but not a required part of the identity of the variant - at best it is helpful. There are all kinds of annotations like this that folks may or may not want to capture, share and store in EHRs. It is a slippery slope that we have been sliding on for some time. Maybe we can consider resetting the table a little and define the fundamental "I found or didn't find THIS variant" observation and then demonstrate and provide guidance for how other derivative and annotated related values could additionally be shared using "derivedFrom" or possibly "hasMember" relationships.
Just an idea. I'm trying to find something solid to focus on in order to use and share the intent of the CG IG. Clarity on this would be very helpful and could open the door to seeing how a first class "specialized" resource might evolve which won't be too far afield from where we start.
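A rough sketch of the split proposed above: identity fields stay on the primary "found / didn't find THIS variant" observation, and everything else becomes a secondary record that points back at it. The identity field names paraphrase the list in this message, the dictionaries are a simplified stand-in for FHIR resources, and the read depth and dbSNP values are illustrative placeholders.

```python
# Fields the message treats as defining the variant itself.
IDENTITY_FIELDS = {
    "genomic-ref-seq", "genomic-start-stop", "genomic-alt",
    "genomic-source", "allelic-state", "copy-number",
}

def split_components(components):
    """Partition a flat component bag into identity fields and annotations."""
    identity = {k: v for k, v in components.items() if k in IDENTITY_FIELDS}
    annotations = {k: v for k, v in components.items() if k not in IDENTITY_FIELDS}
    return identity, annotations

def as_observations(components, primary_id="variant-1"):
    """Build one primary record plus one secondary record per annotation."""
    identity, annotations = split_components(components)
    primary = {"id": primary_id, "value": "Present", "components": identity}
    secondary = [
        {"id": f"{primary_id}-{name}", "code": name, "value": value,
         "derivedFrom": [primary_id]}
        for name, value in annotations.items()
    ]
    return primary, secondary

flat = {
    "genomic-ref-seq": "NC_000006.11",
    "genomic-alt": "C",
    "allelic-state": "heterozygous",
    "read-depth": 42,            # illustrative value
    "dbsnp-id": "rsEXAMPLE",     # placeholder identifier
    "amino-acid-chg": "p.Tyr240Cys",
}

primary, secondary = as_observations(flat)
print(sorted(primary["components"]))
print([obs["code"] for obs in secondary])
```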
Larry Babb (Sep 18 2019 at 13:38):
I do think if we can reduce the component list to only those attributes needed to represent the variant that is present or absent, then we can focus on the various component fields and how they work together to construct the different types of genomic structures needed by geneticists to link clinical significance in terms of diagnostic, prognostic, therapeutic and whatever type of useful condition information they discover.
Larry Babb (Sep 18 2019 at 14:01):
Just an update to my last post in this thread (for anyone that is watching). Now that we have explored the use of MolecularSequence to provide definitional representation of the components needed to represent alleles, we're considering bailing on this approach and retracting the scope of what we can do with the CG IG and available structures in FHIR around variation.
We are considering foregoing the aim of providing variant-level knowledge, which we currently support in our custom XML structures that have been in use in the eMERGE systems and informing the downstream folks that have been asking for a FHIR version of this structure that some of the capabilities will NOT be able to be accomplished in a reasonable way.
Our determination is that we could build a number of custom profiles and extensions and get it to work, but the lift is too great at this point and it would likely deviate significantly from the current course that the CG IG is on.
Kevin Power (Sep 18 2019 at 14:19):
Maybe we can consider resetting the table a little and define the fundamental "I found or didn't find THIS variant" observation and then demonstrate and provide guidance for how other derivative and annotated related values could additionally be shared using "derivedFrom" or possibly "hasMember" relationships.
We did consider this for a while, but we then bump up against the guidance that an observation needs to provide value by itself. It is a good exercise to ask "if this is an observation, and someone queried the system and it was the only observation that was returned, would it provide any value?" Expressing something like the read-depth in its own observation would not be helpful without context. There are likely other profiles we could create (maybe the AA info) that could meet all the criteria we are talking about, but to date our thought has been let's get this "grab bag" into real world scenarios and see how well it works - then adjust as we go.
Kevin Power (Sep 18 2019 at 14:27):
We are considering foregoing the aim of providing variant-level knowledge, which we currently support in our custom XML structures that have been in use in the eMERGE systems and informing the downstream folks that have been asking for a FHIR version of this structure that some of the capabilities will NOT be able to be accomplished in a reasonable way.
Would you be willing to share an example of your custom XML structure for our community to review? Perhaps we get lucky and someone can make some better recommendations for you given our current state.
I am sure you have thought of it, but you can always deliver your current XML as part of the bundle, likely as a DocumentReference.
Larry Babb (Sep 18 2019 at 19:25):
I've definitely shared the emerge XML in the past with the group. It's now been around for nearly 2 years. Here's the repo where the emerge providers have set up a schema for the xml and I've stored some example report and corresponding xml for your review.
https://github.com/emerge-ehri/results-schema
Kevin Power (Sep 18 2019 at 22:05):
Many thanks @Larry Babb (sorry for the duplicate ask :slight_smile:)
Kevin Power (Sep 18 2019 at 22:50):
@Larry Babb - Quick question. If I were processing that XML, and wanted to look up the ReferenceVariant for a ReportVariant, do you 'join' by the <externalId> ?
Bret H (Sep 19 2019 at 11:49):
I've definitely shared the emerge XML in the past with the group. It's now been around for nearly 2 years. Here's the repo where the emerge providers have set up a schema for the xml and I've stored some example report and corresponding xml for your review.
https://github.com/emerge-ehri/results-schema
@Larry Babb Which file represents an example of how you put the information in HL7 FHIR?
Larry Babb (Sep 19 2019 at 13:02):
@Kevin Power ... yes, the externalId is the emerge network's internal variation id that allows all the systems to be able to link to the same variant regardless of representation, build, etc... This is the concept that GA4GH is promoting for the computed identifier on variations. This is the concept that ClinVar has for VariationId. This is the concept that gnomAD will soon be providing. This is the concept that ClinGen's allele registry uses (referred to as the CA-ID).
This is the concept that HL7 CG needs to embrace, support, promote and seriously recommend to sharing variants. Let's be done with dbsnp ids and hgvs. Let's get on board with a variant identifier plan that will actually remove all the craziness with sharing variant data. No scalable solutions will ever happen until then!
Thanks for setting me up with that softball question! ;)
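The externalId 'join' confirmed above can be sketched as a dictionary lookup. Only `<externalId>` and `<dnaChange>` are taken from this thread; the wrapper element names (`report`, `reportedVariant`, `referenceVariant`) and the `<note>` element are hypothetical stand-ins, not the actual eMERGE schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shape; the real schema lives in the repo linked above.
doc = """<report>
  <reportedVariant>
    <externalId>60007705</externalId>
    <dnaChange>c.338C&gt;A</dnaChange>
  </reportedVariant>
  <referenceVariant>
    <externalId>60007705</externalId>
    <note>variant-level knowledge lives here</note>
  </referenceVariant>
</report>"""

root = ET.fromstring(doc)

# Index the variant-level records by their shared identifier.
reference_by_id = {
    ref.findtext("externalId"): ref for ref in root.iter("referenceVariant")
}

# Attach variant-level knowledge to each case-level finding via externalId.
joined = [
    (rep.findtext("dnaChange"),
     reference_by_id[rep.findtext("externalId")].findtext("note"))
    for rep in root.iter("reportedVariant")
]
print(joined)
```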
Larry Babb (Sep 19 2019 at 13:09):
@Bret H We don't have a way of mapping the emerge XML to FHIR, it can't be done, at least not the critical part related to sharing variant representations. Much of it can. @Mullai Murugan and I have been working on this for some time. We have mapped most of the information. In some areas FHIR's CG IG actually is an improvement in structure (e.g. planDefinition).
Even though our original request was to map this XML (because folks are currently using it in downstream processes) to a FHIR standard, we realized early on, it was more pragmatic to simply try to figure out how to use the FHIR CG IG Genomics Report by walking through it resource-by-resource and attribute-by-attribute to see if we could figure out what everything is meant to do and what all the options are for sharing the data. This has been a long journey and we ultimately have made some choices along the way (which you guys are all aware of). We are at the end (we hope) and have concluded that we cannot support "ReferenceVariant" and the associated annotations and knowledge which is loosely coupled to the "ReportedVariant" structures. This separation of variant-level knowledge from case-level knowledge is the key to be able to support the CDS alerting in emerge and other systems when a previously reported variant is reassessed to end up with a different level of pathogenicity. (There are potentially many other CDS use cases, but this is a baseline one that seems to resonate well with geneticists and physicians alike.)
Bret H (Sep 19 2019 at 13:18):
@Larry Babb the place you are looking for Variant ID is here: http://build.fhir.org/ig/HL7/genomics-reporting/obs-variant-definitions.html#Observation.component:variation-code this is the place to put the system (clinvar, emerge, etc.) and the identifier
Larry Babb (Sep 19 2019 at 13:38):
yes, i've seen that. I wish it was a required field that folks would depend on as the primary means of identifying the variants. However, right now there is no one standard identifier approach. The GA4GH VR spec is aiming to correct this and provide a globally unique computable identifier based on the use of a standard schema for variation found here.
In the case of our specific emerge XML mapping, putting the externalId in that component of the observation does not help in bridging to the variant-level knowledge, because there's no way to make the variant itself the subject of an observation (or an assertion) which can hold the knowledge independent of the case. So, while I can provide the variant's externalId, it does not help that much with linking data. At best it will allow me to search through all case level reported findings to identify if any one else had the same variant. While this is a really great feature, it would require the standardization of identifiers. And, it only allows us to begin linking case level variation, not the knowledge itself.
Side point, but related. It seems like the dbSNPid should be included in this attribute and not made a separate component. I'm not quite sure why decisions to call out only certain attributes like dbSNPids was made. There are lots of ids that are as important and probably will be much more important going forward - related to variation. And finally there's the point that there have been a number of folks who recognize the hazards with providing dbsnp ids with reported variants, since they don't provide the precision needed to truly identify the variant (they are locus specific, not change specific). In HL7v2 we (I) requested many years ago to remove this identifier as a first class attribute because it was not clear whether it was representing the identity of the variant or not, and it continued to enable improper use by the masses. It was always the plan to remove it from that spec, and now here it is showing up in the new FHIR standard.
Why can't we make the obsVariant profile just about the variant that was observed as being present or absent and not about capturing all these "special" annotations and derivative forms? Once you start capturing these "special" attributes, you end up with a snapshot of what folks think are important today, and that will ultimately need to be garbage collected as the spec and community standards evolve over time. (sorry for constantly venting).
Larry Babb (Sep 19 2019 at 13:39):
Imagine if we had a Variant resource and that it could be the subject of an Observation? game changer!
Kevin Power (Sep 19 2019 at 14:08):
A few thoughts:
This is the concept that HL7 CG needs to embrace, support, promote and seriously recommend to sharing variants. Let's be done with dbsnp ids and hgvs. Let's get on board with a variant identifier plan that will actually remove all the craziness with sharing variant data. No scalable solutions will ever happen until then!
As Bret mentioned, we do support those sorts of variant "ids" from any of the sources you mentioned - or if we can't, we should. Are you suggesting we should pick one of those as the defacto standard and require it?
Also, we need to support dbsnp's and hgvs, because labs that are sending those today will want a way to send them.
We are at the end (we hope) and have concluded that we cannot support "ReferenceVariant" and the associated annotations and knowledge which is loosely coupled to the "ReportedVariant" structures.
If you don't mind, I would like us to consider the ReferenceVariant structure as input into our definitional requirements, is that OK?
It seems like the dbSNPid should be included in this attribute and not made a separate component. I'm not quite sure why decisions to call out only certain attributes like dbSNPids was made.
dbSNP was kept separate because it doesn't always identify a specific variant, but more the location - the group felt that it should not be treated the same as other 'variation-code' values.
Why can't we make the obsVariant profile just about the variant that was observed as being present or absent and not about capturing all these "special" annotations and derivative forms?
Are you suggesting that obsVariant should only include the attributes that you have on ReportedVariant? It does seem like we have a fair amount of overlap? For those that haven't reviewed, here is an example of ReportedVariant:
<externalId>60007705</externalId>
<transcriptId>NM_000535.5</transcriptId>
<geneRegion>Exon 4</geneRegion>
<dnaChange>c.338C>A</dnaChange>
<dnaChangeType>SUBSTITUTION</dnaChangeType>
<aminoAcidChange>p.Ser113X</aminoAcidChange>
<aminoAcidChangeType>NONSENSE</aminoAcidChangeType>
<chromosome>7</chromosome>
<geneSymbol>PMS2</geneSymbol>
<significant>true</significant>
<interrogatedButNotFound>false</interrogatedButNotFound>
<forcedIncidental>false</forcedIncidental>
<notInterpreted>false</notInterpreted>
<alleleState>
  <codeSystem>LN</codeSystem>
  <codeText>Heterozygous</codeText>
  <code>LA6706-1</code>
  <valueSetAbbr>Heterozygous</valueSetAbbr>
</alleleState>
<categoryType>MEDICALLY_SIGNIFICANT</categoryType>
<category>
  <codeSystem>LN</codeSystem>
  <codeText>Pathogenic</codeText>
  <code>LA6668-3</code>
  <valueSetAbbr>Pathogenic</valueSetAbbr>
</category>
<genomicSource>
  <codeSystem>LN</codeSystem>
  <codeText>Germline</codeText>
  <code>LA6683-2</code>
  <valueSetAbbr>Germline</valueSetAbbr>
</genomicSource>
<nestedVariants/>
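To make the overlap concrete, here is a small sketch that reads a subset of the ReportedVariant example and lines its fields up against the component slice names from the start of this thread. The `<reportedVariant>` wrapper is added only so the fragment parses, and the element-to-slice pairing is an assumption for illustration, not an agreed mapping.

```python
import xml.etree.ElementTree as ET

# Subset of the example above, wrapped in a hypothetical root element.
xml_text = """<reportedVariant>
  <externalId>60007705</externalId>
  <transcriptId>NM_000535.5</transcriptId>
  <dnaChange>c.338C&gt;A</dnaChange>
  <dnaChangeType>SUBSTITUTION</dnaChangeType>
  <aminoAcidChange>p.Ser113X</aminoAcidChange>
  <aminoAcidChangeType>NONSENSE</aminoAcidChangeType>
  <alleleState><code>LA6706-1</code><codeText>Heterozygous</codeText></alleleState>
</reportedVariant>"""

# Assumed pairing of XML elements to ObsVariant component slices.
ELEMENT_TO_COMPONENT = {
    "transcriptId": "transcript-ref-seq",
    "dnaChange": "dna-chg",
    "dnaChangeType": "dna-chg-type",
    "aminoAcidChange": "amino-acid-chg",
    "aminoAcidChangeType": "amino-acid-chg-type",
}

root = ET.fromstring(xml_text)

# Fields that have an obvious component counterpart.
components = {
    slice_name: root.findtext(element)
    for element, slice_name in ELEMENT_TO_COMPONENT.items()
    if root.findtext(element) is not None
}

# Fields this simple table does not cover (identifier, coded values).
unmapped = [child.tag for child in root if child.tag not in ELEMENT_TO_COMPONENT]

print(components)
print(unmapped)
```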
Imagine if we had a Variant resource and that it could be the subject of an Observation? game changer!
New resources in FHIR are certainly possible, but are often met with questions like "do you really need this?" or "can't you use resource X?" - but we need to come up with answers to the questions this conversation brings up. I am hopeful the IM subgroup work we are doing can help inform us in this space.
Larry Babb (Sep 20 2019 at 19:02):
@Kevin Power thanks for the great feedback... addressing each below...
Are you suggesting we should pick one of those as the de facto standard and require it?
Maybe. The GA4GH VR spec defines the standard for assigning a globally unique computable identifier to variants (alleles for now; soon genotypes, haplotypes, CNVs, ...). I'm suggesting that this identifier would be THE best choice in the long run, as it would then allow you to decorate it with all the various nomenclature and annotations covering the current and future ways humans would want to present the variant. This id would be the key to a "referenceable" variant independent of the patient or sample. I recognize this is a bit early, but not if you think about how long it will take to get this to Normative. By then folks will be asking "why didn't you include the GA4GH computable id - everyone else uses it?" ;) (I jest, slightly)
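To make "computable identifier" concrete, here is a minimal sketch of a digest-based id. It assumes the truncated-digest scheme (SHA-512, first 24 bytes, base64url-encoded, usually called sha512t24u) used in GA4GH work. The `computed_variant_id` function, its field names, its JSON serialization and the "VA." prefix are simplified stand-ins for illustration, not the canonical form defined by the VR spec.

```python
import base64
import hashlib
import json


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes, base64url-encoded (32 chars)."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def computed_variant_id(seq_id: str, start: int, end: int, alt: str) -> str:
    # ASSUMPTION: field names, sorted-key JSON serialization and the "VA."
    # prefix are hypothetical simplifications, not the spec's canonical form.
    payload = {"alt": alt, "end": end, "seq_id": seq_id, "start": start}
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "VA." + sha512t24u(blob)


# Hypothetical inputs: the same definition always yields the same id,
# regardless of which system computes it.
id_a = computed_variant_id("example-sequence", 100, 101, "A")
id_b = computed_variant_id("example-sequence", 100, 101, "A")
id_c = computed_variant_id("example-sequence", 100, 101, "T")
```

The point of the design is that the id is derived from the variant's definition alone, so two senders describing the same allele arrive at the same key without consulting a central registry.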
If you don't mind, I would like us to consider the ReferenceVariant structure as input into our definitional requirements, is that OK?
Yes. I agree. But the key here is that if you look in the ReferenceVariant you will see the schema and values that define the variant (and lift-over equivalents as well as projected equivalents to transcripts and proteins). These fields are currently in a flattened structure within the separate components within the obsVariant. They are somewhat better structured in the MolecularSequence. Regardless of where they are, they should be consistently represented and separate enough that it is clear they work together to "define" the variant in a computational way. This is the minimum hope that I have for the profiles and FHIR. And it seems like you have the fields, just not organized in a coherent, clear and consistent manner.
dbSNP was kept separate because it doesn't always identify a specific variant, but more the location - the group felt that it should not be treated the same as other 'variation-code' values.
I see. I'm certainly glad that it's not treated as a variation-code (but it's not clear that folks won't still put it there). For the sake of being overly specific, why wouldn't we call it what it is - a location-type code or something to that effect - and then allow the dbSnpId to be placed there along with any other "feature-based" representation of the variant position? It just seems like dedicating a whole component to one specific authority's identifier puts a significant emphasis on that identifier and presumes that it should be used for a long time. Maybe this is good. Maybe it's not. It just might be nice to low-light it a bit and give us an out in the future if we decide it isn't the only game in town for that kind of annotation. (And I'm a huge fan of dbSNP, to be clear - I'm a bigger fan of a specification that supports growth.)
Are you suggesting that obsVariant should only include the attributes that you have on ReportedVariant? It does seem like we have a fair amount of overlap?
We (GeneInsight/eMERGE/Clinical Labs that send structured genomic data) send this kind of data for convenience to the receiving systems. The key elements here are #1 - "externalId" (that is the variant and only that! - refer to the reference variant for the precise definition), #2 - "interrogatedButNotFound" (present or absent), #3 - genomic source (allelic origin - see GENO), #4 - allele state (see GENO) - only if the variant is an allele or haplotype; not needed for genotypes, complex variants, etc.
The other data is more for convenience and is a conflation of the assertion of pathogenicity with the actual presence/absence. Beyond that, the transcript, amino acid, gene symbols, etc. are useful for the lab to structurally share how they prefer to represent the genomic change that they observed. And, to be clear, some labs will pick a different transcript for the same genomic change. You'd like to hope the receiving system and computational content would be able to understand that they are one and the same variant at the genomic level, which is what was actually observed.
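As a rough illustration of the four key elements listed above, the sketch below pulls them out of an abbreviated copy of the ReportedVariant example quoted earlier in this thread. The wrapping `<reportedVariant>` root, the `key_elements` function and the output key names are hypothetical, added only so the fragment parses; they are not part of any GeneInsight or FHIR schema.

```python
import xml.etree.ElementTree as ET

# Abbreviated from the ReportedVariant example quoted earlier in this thread.
FRAGMENT = """
<externalId>60007705</externalId>
<transcriptId>NM_000535.5</transcriptId>
<interrogatedButNotFound>false</interrogatedButNotFound>
<alleleState><codeSystem>LN</codeSystem><codeText>Heterozygous</codeText><code>LA6706-1</code></alleleState>
<genomicSource><codeSystem>LN</codeSystem><codeText>Germline</codeText><code>LA6683-2</code></genomicSource>
"""


def key_elements(fragment: str) -> dict:
    # The fragment has no single root element, so wrap it in a
    # hypothetical <reportedVariant> root before parsing.
    root = ET.fromstring(f"<reportedVariant>{fragment}</reportedVariant>")
    return {
        "variant_id": root.findtext("externalId"),
        "interrogated_but_not_found": root.findtext("interrogatedButNotFound") == "true",
        "genomic_source_code": root.findtext("genomicSource/code"),
        "allele_state_code": root.findtext("alleleState/code"),
    }


core = key_elements(FRAGMENT)
```

Everything else in the fragment (transcript, amino acid change, gene symbol) is deliberately left out of `core`: it is the lab's preferred presentation of the variant, not part of what identifies it.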
New resources in FHIR are certainly possible, but are often met with questions like "do you really need this?" or "can't you use resource X?" - but we need to come up with answers to the questions this conversation brings up. I am hopeful the IM subgroup work we are doing can help inform us in this space.
I do get this. I honestly understand why the FHIR gods hold strong on this point; otherwise it would be bedlam in terms of Resources. It is really on us to make the case where we can all stand together and say "Yes - we really need this and here's why". It seems easy to answer the question "why can't you use resource X?". I believe this thread and the plethora of previous back-and-forth threads and emails have made the case for "why Observation is insufficient for representing Variation in Clinical Genomics results". It is a big part of the solution - very critical, very important. It just falls short of giving us the structure for "Variation" as a first-class thing (resource or data type). Like I said, if you share a Variation then you could make additional observations about it (was it observed, what was the gene, what was the frequency, what is its pathogenicity for a given disease).
When folks are making statements about the pathogenicity of a variant they are doing so in the context of a disease (the indication or some secondary indication based on secondary findings). They are NOT making an observation that is directly related to the patient. It is indirect in that the patient has the variant and has the indication. The pathogenicity is a piece of information to inform the physician so they can diagnose the patient.
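The separation argued for here could be sketched as three record types. This is only an illustration, assuming a first-class Variation existed; the class and field names are hypothetical, not FHIR resources or profile elements, and "example-disease" and "patient-1" are placeholder values.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Variation:
    """Patient-independent definition of a variant (hypothetical shape)."""
    variation_id: str


@dataclass(frozen=True)
class PresenceObservation:
    """Patient-level finding: was the variation observed in this patient?"""
    patient_id: str
    variation: Variation
    present: bool
    allele_state: Optional[str] = None    # only for alleles or haplotypes
    genomic_source: Optional[str] = None


@dataclass(frozen=True)
class PathogenicityAssertion:
    """Statement about a variation in a disease context; no patient field."""
    variation: Variation
    disease: str
    classification: str


variation = Variation("60007705")
observation = PresenceObservation("patient-1", variation, True, "Heterozygous", "Germline")
assertion = PathogenicityAssertion(variation, "example-disease", "Pathogenic")
```

The patient is linked to the pathogenicity statement only indirectly, through the shared `variation`, which mirrors the point that the assertion informs the physician rather than describing the patient.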
thanks for sharing and listening.
Last updated: Apr 12 2022 at 19:14 UTC