Stream: genomics/committers
Topic: First draft of updated glossary
Kevin Power (Jan 31 2022 at 23:52):
Hey all (especially @Anand Kulanthaivel ) - I have made the first draft of the updated glossary, please review here:
http://build.fhir.org/ig/HL7/genomics-reporting/branches/master/Glossary.html
Anand Kulanthaivel (Feb 01 2022 at 13:59):
Kevin Power said:
Hey all (especially Anand Kulanthaivel ) - I have made the first draft of the updated glossary, please review here:
http://build.fhir.org/ig/HL7/genomics-reporting/branches/master/Glossary.html
Thanks for posting this, Kevin! By the way, do we usually include sources in the glossary (e.g., as footnotes after each definition)?
Kevin Power (Feb 01 2022 at 14:27):
I did not - should we?
Kevin Power (Feb 01 2022 at 15:10):
And I should add - we did not in the first version either.
Kevin Power (Feb 01 2022 at 21:54):
I am considering the glossary change 'Applied' but welcome any feedback if anyone has anything.
Jamie Jones (Feb 04 2022 at 22:04):
Did we vote to change molecular_consequence to SO 1580 (http://www.sequenceontology.org/browser/current_svn/term/SO:0001580) from SO 1537? Much smaller sub-concept
Kevin Power (Feb 04 2022 at 22:20):
Hmmm, nothing I could find in JIRA after a few searches.
Bret H (Feb 05 2022 at 17:32):
@Jamie Jones I would want to discuss that before that change was made. Those are way different SO concepts.
Jamie Jones (Feb 05 2022 at 18:14):
Yes I was just wondering how the new term came up in the glossary. In my branch I updated the glossary to use the concept we had established before and did not change any structures.
Bret H (Feb 16 2022 at 13:54):
Did the glossary and component strings get updated in build? I think what I see right now is a switch of Coding-change-type and molecular-consequence in terms of the text.
Kevin Power (Feb 16 2022 at 14:21):
Do you mean in the Glossary, or the components, or both?
Bret H (Feb 16 2022 at 15:57):
both. I thought I saw that the SO terms and definitions had been brought into alignment.
Bret H (Feb 16 2022 at 15:58):
Back when, I think it was Patrick that suggested those components.
Kevin Power (Feb 16 2022 at 16:33):
Hmm, perhaps we need @Jamie Jones and @Anand Kulanthaivel to weigh in?
Bret H (Feb 16 2022 at 16:48):
Yeah. Maybe I just need to sit down with the two of them and go through the SO terms and definitions to fix my confusion.
Anand Kulanthaivel (Feb 16 2022 at 16:55):
I clicked outside the message and Zulip ate it. Let me re-write what I was thinking.
Bret H (Feb 16 2022 at 16:56):
Much appreciated @Anand Kulanthaivel
Anand Kulanthaivel (Feb 16 2022 at 16:59):
(Saved to notepad this time!) - @Bret H @Kevin Power @Jamie Jones --
Molecular consequence semantically extends amino acid change type into all possible transcript consequences. These include things such as fusions, amplifications, and UTR variants, that don't always translate (pun intended?) into amino acid change/types. In coding SNVs, however, the two are generally semantically redundant. The ambiguity is not well explained in our documentation and it wouldn't hurt to make this a future meeting discussion point. I've attached screenshots of the two value sets for you to compare.
aachgtype_vs_loinc.PNG molconseq_vs_so.PNG
Bret H (Feb 16 2022 at 17:14):
we had a functional consequence field at one point, I thought, with values from SO:0001536 functional effect variant in addition to one with codes that could be from SO:0001537 structural effect variant. Maybe they got merged somehow?
The current molecular consequence value set is more like the functional effect element (which I think Jamie alluded to earlier). SO:0001536 functional effect variant
I had not noticed till now but the current 'coding-change-type' component name, with its position in the listing of components, could lead someone to think about protein coding sequences. But it is clearly by SO value set and definition about a coding of the type of change. I'd almost suggest SO:0001537 instead of SO:0002072...well maybe we should make this change too. I think it is in the spirit of the desired element and the value set does a better job of representing the concept of change type (even if it does use the word structural in a potentially confusing way that is conflated with other terms).
I think there might simply be a typo in the SO term used for molecular consequence - i.e. '37' was used instead of '36' (apologies for not looking at the element earlier Jamie)
@Anand Kulanthaivel @Jamie Jones
Bret H (Feb 16 2022 at 17:33):
https://chat.fhir.org/#narrow/stream/179197-genomics/topic/DNA.20change.20type
Anand Kulanthaivel (Feb 16 2022 at 17:34):
Functional effect (SO:0001536) is the predicted change in biochemical function attributable to a variant. Unlike molecular consequence or amino acid change type, it cannot easily be calculated (requires either wet lab experiments or very advanced predictive modeling).
Bret H (Feb 16 2022 at 17:42):
Let's go with the current definition of molecular consequence in the IG. 'A calculated classification of the effect of the gene's sequence change on the resulting amino acid (protein) sequence change.' How is the structural value set for that any better?
And there is evidence for the predicted (and sometimes known) function attributable to a variant. So it would make sense. It is calculable to pull in the annotation that others have defined and a prediction is also calculated. You do not need a wet-lab experiment or predictive modeling once the association is established.
Bret H (Feb 16 2022 at 17:44):
I see why one wants to use it but the 0001537 has so many values that go beyond describing a change to a transcript that it seems really odd to use it and then define the element to only be about 'the gene's sequence change related to resulting amino acid sequence.'
Jamie Jones (Feb 16 2022 at 17:45):
glossary looks right to me based on my understanding. molecular consequence (SO 1537) is calculated just from the position and is included on variant as an optional annotation. coding-change-type (not in glossary, but implemented by mCODE) uses SO:0002072 = "sequence comparison", which notably allows for "no_sequence_alteration", which is not possible using the structural variant codes, which are scoped differently.
Functional effect requires evidence so is NOT defined on variant, but on Dx Imp
Bret H (Feb 16 2022 at 17:47):
coding-change-type is in our Variant profile @Jamie Jones why is it not in the glossary? it is also mentioned in the variant guidance
Bret H (Feb 16 2022 at 17:54):
@Anand Kulanthaivel I'm happy regarding SO:0001536 being where evidence can support it (lovely point you raised btw). From glossary.
Functional Effect A predicted or observed effect of a variant on its gene's (or protein product thereof) ability to function. Value set enumeration is to be the children of Sequence Ontology SO:0001536.
Bret H (Feb 16 2022 at 17:57):
But that leaves us with the current expanse of 0001537 on Molecular consequence with the tight definition of 'A calculated classification of the effect of the gene's sequence change on the resulting amino acid (protein) sequence change.' @Anand Kulanthaivel @Jamie Jones can we modify the value set to be a union of slices from under 0001537? or leave stuff in that is not related to the 'resulting amino acid (protein) sequence change' ....I don't know but it is nice to declare a triplet repeat expansion, and it would be represented in the transcript and have amino acid consequences, but seems weird somehow.
Bret H (Feb 16 2022 at 18:02):
Do you think it would help others to move 'coding-change-type'? current order image.png
Bret H (Feb 16 2022 at 18:02):
the order of component slices in our IG is malleable without changing meaning, right?
Bret H (Feb 16 2022 at 18:03):
Maybe switch places of coding-change-type and molecular-consequence, at least? @Jamie Jones @Anand Kulanthaivel
Bret H (Feb 16 2022 at 18:14):
but something has to be done with molecular-consequence. Either expand the definition to allow for more than resulting transcript/protein sequence changes to be communicated or make the value set align. SO:0001628 (SOWiki) Intergenic variant is a really odd thing to have with the current definition...I don't see any harm in removing the limit to 'resulting amino acid (protein) sequence change' to have the definition be
'calculated classification of the effect of the gene's sequence change'
Bret H (Feb 16 2022 at 18:15):
what do you think? @Anand Kulanthaivel @Jamie Jones
Bret H (Feb 16 2022 at 18:31):
Jamie Jones said:
Did we vote to change molecular_consequence to SO 1580 (http://www.sequenceontology.org/browser/current_svn/term/SO:0001580) from SO 1537? Much smaller sub-concept
would be consistent with the current definition and remove terms that could be construed as redundant with DNA change type (coding-change-type).
moving the components for clarity just seems like a good idea.
Bret H (Feb 16 2022 at 18:40):
to me it's a QA issue to make the value set and definition consistent by ensuring the best value set representation.
moving the components is an editorial change for clarity.
Kevin Power (Feb 16 2022 at 22:33):
Maybe we should log a technical correction JIRA to capture this change, then allow @Jamie Jones and @Anand Kulanthaivel to comment there if they agree or not?
Bret H (Feb 16 2022 at 22:41):
https://jira.hl7.org/browse/FHIR-36047 @Anand Kulanthaivel @Jamie Jones please comment and note if the value set constraint or the definition change is preferred for molecular consequence. Many thanks!
Kevin Power (Feb 17 2022 at 14:10):
@Anand Kulanthaivel @Jamie Jones - Will you all have a chance today to review the JIRA?
Bret H (Feb 17 2022 at 15:48):
Thanks! @Anand Kulanthaivel "Narrow the existing enumeration to include only all children of all children of SO:0001878 (feature variant), which is the great x2-grandparent of SO:0001580 and a child of my original proposal (SO:0001537, structural variant)."
and
"Define molecular consequence as "The calculated or observed effect of a variant on its downstream transcript and, if applicable, ensuing protein sequence". The gist I'm getting from the NCBI is that "Molecular" = Transcript + Protein."
Nicely commented on in the JIRA - The evidence will be helpful for future folks (or likely us as we forget over time ; ^ )
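For readers following the "narrow the enumeration" proposal quoted above, here is a minimal sketch of how a value set limited to the descendants of one SO term might be expressed with a FHIR `ValueSet.compose` filter. It is not taken from the IG: the code system URI, the example canonical URL, and the choice of the `descendent-of` operator (which excludes the root term itself) are all assumptions made for illustration.

```python
import json

# Assumption: code system URI used for Sequence Ontology; check the IG before relying on it.
SO_SYSTEM = "http://sequenceontology.org"


def descendants_value_set(root_code: str, url: str) -> dict:
    """Build a ValueSet (as a plain dict) that includes every descendant of root_code."""
    return {
        "resourceType": "ValueSet",
        "url": url,  # hypothetical canonical URL, for illustration only
        "status": "draft",
        "compose": {
            "include": [
                {
                    "system": SO_SYSTEM,
                    "filter": [
                        {
                            "property": "concept",
                            # 'descendent-of' leaves out the root; 'is-a' would include it
                            "op": "descendent-of",
                            "value": root_code,
                        }
                    ],
                }
            ]
        },
    }


# Narrowed proposal from the thread: descendants of SO:0001878 (feature variant)
vs = descendants_value_set(
    "SO:0001878",
    "http://example.org/fhir/ValueSet/molecular-consequence-sketch",
)
print(json.dumps(vs, indent=2))
```

Swapping the root back to SO:0001537 would give the broader enumeration discussed elsewhere in this topic; the binding strength (extensible) is set on the profile element, not in the value set.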
Bret H (Feb 17 2022 at 15:51):
I'll get the changes in later today.
Anand Kulanthaivel (Feb 17 2022 at 15:53):
I'm glad that you found it useful @Bret H ! Consider waiting for @Jamie Jones before any changes are uploaded (your discretion).
Bret H (Feb 17 2022 at 15:54):
I'll wait a bit. I think your proposal will help folks not be confused a lot.
Kevin Power (Feb 17 2022 at 15:55):
If the two of you are happy, I would say feel free to push up the change and then @Jamie Jones can comment. If he disagrees we can change or back it out if needed.
Jamie Jones (Feb 17 2022 at 16:06):
Looking into our original notes for these terms. A lot of thought went into the proposals we had.
Jamie Jones (Feb 17 2022 at 17:51):
Commented on the JIRA with the 23 terms we lose if we restrict to feature_variant from structural_variant. We still have 131 unique terms in the scope
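A count like "terms we lose" can be reproduced as a set difference between the descendants of the two candidate root terms. The sketch below shows the idea on a toy hierarchy only: the single edge marked "from thread" reflects the parent/child relation stated in this topic, while every `HYP:` identifier is a made-up placeholder, not a real SO term. A real check would load the actual SO `is_a` edges instead.

```python
from collections import defaultdict, deque

# Toy is_a edges as (child, parent). HYP:* identifiers are hypothetical placeholders.
IS_A_EDGES = [
    ("SO:0001878", "SO:0001537"),  # feature_variant is_a structural_variant (from thread)
    ("HYP:a", "SO:0001878"),
    ("HYP:b", "HYP:a"),
    ("HYP:c", "SO:0001537"),
    ("HYP:d", "HYP:c"),
]

# Index the edges as parent -> set of direct children
children = defaultdict(set)
for child, parent in IS_A_EDGES:
    children[parent].add(child)


def descendants(root: str) -> set:
    """Return all terms below root (root itself excluded), via breadth-first search."""
    seen = set()
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


broad = descendants("SO:0001537")   # current scope
narrow = descendants("SO:0001878")  # proposed narrower scope
lost = broad - narrow               # terms that would drop out of the value set

print(sorted(lost))
print(len(narrow), "terms remain in scope")
```

Note that with "descendants only" semantics the narrower root itself (SO:0001878 here) also lands in the lost set, which is worth keeping in mind when comparing counts.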
Jamie Jones (Feb 17 2022 at 18:53):
Noting that some of the terms removed are in the valueset used by VEP: https://useast.ensembl.org/info/genome/variation/prediction/predicted_data.html (which is a strict subset of 1537 wherein every term has a known use), I would suggest we consider creating the curated VEP list in FHIR and use that extensibly.
Kevin Power (Feb 17 2022 at 19:57):
@Bret H @Anand Kulanthaivel - comments about Jamie's notes?
Anand Kulanthaivel (Feb 17 2022 at 20:22):
The VEP consequence list appears to be missing some of the children of SO:0001537 (and some of the items from the NCBI's suggestions). Perhaps we should expand it back to where it was before, to children of SO:0001537. This ensures that all NCBI and VEP terms (and more beyond that) can be used. In addition, I do not see any of the children of SO:0001537 being nonsensical (which would push the case of being more selective, but all children appear relevant).
Jamie Jones (Feb 17 2022 at 20:38):
I can compile a list of SO terms mentioned by VEP, snpEff, and ClinVar here. Interestingly, ClinVar and snpEff go outside the scope of 1537 sometimes
Bret H (Feb 17 2022 at 20:41):
Put the list here please. Not all those terms will fit the molecular-consequence definition of "The calculated or observed effect of a variant on its downstream transcript and, if applicable, ensuing protein sequence"
Also, some of the terms marked missing in the JIRA are encompassed by others. E.g. transcript fusion versus gene fusion.
Bret H (Feb 17 2022 at 20:43):
We also have the coding-change-type (DNA change) and functional components that capture things from that list.
Bret H (Feb 17 2022 at 20:44):
'intergenic variant' for example would not fit the current definition of molecular consequence.
Jamie Jones (Feb 17 2022 at 20:45):
I'm not sold that the current definition is the intended scope :)
Jamie Jones (Feb 17 2022 at 20:46):
The original request was "give us a home for SO term annotations we are already using"
Jamie Jones (Feb 17 2022 at 20:55):
We pursued splitting Annotations up between those easily calculated just from the specific position and change (which became molecular consequence, using representative transcript) vs those that require experiment or computational evidence (functional effect is NOT on variant but part of a separate observation about the variant)
Bret H (Feb 17 2022 at 20:56):
yep.....might coding-change-type be rolled into molecular consequence? I think we should retain functional-effect as it is now (Anand gave good reasoning about needing evidence for those). The stuff in coding-change-type and molecular consequence are more like 'this is what I see in the observed sequence' Whereas, 'NMD_triggering_variant' and 'altered gene product level' are implications unless specifically observed (in which case we'd have an observation like 'NMD seen' or 'high transcript level').
with a broadened def then might coding-change-type be rolled into molecular consequence? Clarity is important.
Bret H (Feb 17 2022 at 20:56):
bring on your list. we have extensible as the binding, btw.
Jamie Jones (Feb 17 2022 at 21:00):
63 terms here along with some other notes: https://docs.google.com/spreadsheets/d/1RXVIMNk05GDEAS9FxZDa2x1DlJhfBM6imkPI8mlDE_A/edit#gid=930906682
Jamie Jones (Feb 17 2022 at 21:03):
departures from 1537 are minor, sometimes helpful to note a sequence_feature(region) that is relevant, like http://www.sequenceontology.org/browser/current_svn/term/SO:0001093 or state no_sequence_alteration (this concept is in coding-change-type)
Bret H (Feb 17 2022 at 21:06):
and there are terms with slight differences like short_tandem_repeat_change and short_tandem_repeat_variation, which can be argued to be slightly different concepts. However the effect on the genome is the same. Have you looked at 1536 in detail too in comparison to ---
Bret H (Feb 17 2022 at 21:07):
wrong term
Bret H (Feb 17 2022 at 21:08):
I meant SO:0002072 sequence comparison
Anand Kulanthaivel (Feb 17 2022 at 21:18):
Question - "coding change type", is that "DNA change type"? In which case it's upstream of molecular consequence and not the same thing. I looked at SO:0002072 and it describes the DNA change type. sequence_alteration appears to be independent of transcript context.
Bret H (Feb 18 2022 at 14:10):
Yes. I was getting mixed up on the 3 parent terms. We'd depart from NCBI by inclusion, which is not too different from what was done with implications to merge several components. The names of the SO terms (structural variant versus sequence comparison) are another source of confusion. At any rate, Jamie's pointed out several terms to consider including, compared across a couple of currently used annotations. Anand suggested a change in the definition to be explicit about transcripts and proteins. back to the JIRA : ^ )
Bret H (Feb 18 2022 at 14:23):
Here's the JIRA link https://jira.hl7.org/browse/FHIR-36047 @Anand Kulanthaivel @Jamie Jones I suggested a compromise.
Bret H (Feb 22 2022 at 17:14):
just checking we good with the following for clarity:
1) Define molecular-consequence as "The calculated or observed effect of a variant on its downstream transcript and, if applicable, ensuing protein sequence" - don't touch value set for now (it will still be confusing for anyone looking at the coding-change-type component and molecular-consequence component).
and
2) move the coding-change-type component so it appears in a different place than the amino acid and transcript components
3) the amino-acid change type won't be removed even though it overlaps with molecular-consequence's definition "The calculated or observed effect of a variant on its downstream transcript and, if applicable, ensuing protein sequence"
maybe we come back to 3 post-R5.
is that the final conclusion?
Bret H (Feb 22 2022 at 20:51):
@Anand Kulanthaivel @Jamie Jones I'll push the changes tonight (1 and 2 above).
Kevin Power (Feb 23 2022 at 14:24):
@Bret H - Any updates on making this change?
Bret H (Feb 23 2022 at 16:13):
Yep. haven't heard a nope. so about to push. will post commit here
Bret H (Feb 23 2022 at 16:21):
updating publisher just in case.
Jamie Jones (Feb 23 2022 at 16:22):
The amino acid change type field is a bit of a hanging chad but I don't want to push another vote on it or remove it without one since it was implemented by mCODE
Bret H (Feb 23 2022 at 16:23):
Right. Agreed. Consolidating will need to wait unfortunately. At least there's a more recent Zulip discussion and JIRA discussion on them, for the future or would-be implementers.
Bret H (Feb 23 2022 at 16:26):
fyi there were 12 warnings before the changes and these warnings remain. should I try to address them? They're not really part of the JIRA, don't impede the build and I understand we'll have more changes later this week.
Jamie Jones (Feb 23 2022 at 16:27):
For the record, the value set is differently framed, it matches pretty closely with dna-change-type, which fits well in SO in the context of DNA changes but there's no parallel that fits well for predicted downstream effects. I would suggest against using amino acid change type unless explicitly needed for the implementation.
Bret H (Feb 23 2022 at 16:42):
Changes are here: https://github.com/HL7/genomics-reporting/commit/5bdbd2cc8787e339fed39d40ff8369875de8c1ed
and a period added here: https://github.com/HL7/genomics-reporting/commit/0075a157ffcbdf88bdc4c45d97f8683fc07e42da and
Kevin Power (Feb 23 2022 at 16:47):
Thanks @Bret H !
Last updated: Apr 12 2022 at 19:14 UTC