Stream: genomics
Topic: Standard terms from NCBI
Liz Amos (Jun 08 2020 at 16:10):
The reason Molecular Consequence was tricky to find was that it actually is part of the GTR (Genetic Test Registry) list of standard terms: https://ftp.ncbi.nlm.nih.gov/pub/GTR/standard_terms/Molecular_consequence.txt. For a full list of standard terms used in the GTR: https://ftp.ncbi.nlm.nih.gov/pub/GTR/standard_terms/
Jamie Jones (Jun 08 2020 at 16:12):
That is a much smaller subset of http://sequenceontology.org/browser/current_svn/term/SO:0001060 than I was expecting. Structural Variant has 243 subterms...
Patrick Werner (Jun 08 2020 at 16:13):
I did it via the advanced search: https://docs.google.com/spreadsheets/d/1xCn09zDblMpjIHWziZD2pAWT2zSNdABZklTRQB8HQGY/edit?usp=sharing
Patrick Werner (Jun 08 2020 at 16:14):
Hmm, my search seemed to return more concepts than the txt file
Patrick Werner (Jun 08 2020 at 16:14):
but still not many terms
Patrick Werner (Jun 08 2020 at 16:15):
e.g. splice_site_variant (http://www.sequenceontology.org/browser/current_release/term/SO:0001629) is important to us.
Patrick Werner (Jun 08 2020 at 16:16):
I still think the easiest approach would be using the children of SO:0001060, but providing an extensional VS containing all the concepts listed.
Patrick Werner (Jun 08 2020 at 16:16):
HAPI, for example, can't expand the VS we are currently using.
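To make the intensional vs. extensional distinction concrete, here is a minimal sketch that builds both kinds of FHIR ValueSet as plain Python dicts mirroring the JSON. It assumes `http://www.sequenceontology.org` as the SO system URI and `SO:NNNNNNN` as the code format; both are assumptions, as are the helper names `intensional_valueset` and `extensional_valueset`. The intensional form relies on a terminology server that can evaluate the `is-a` filter against SO (the expansion step that fails here), while the extensional form lists every concept so no expansion engine is needed.

```python
from typing import Dict

# Assumed canonical URI for the Sequence Ontology code system.
SO_SYSTEM = "http://www.sequenceontology.org"


def intensional_valueset(root_code: str) -> dict:
    """ValueSet defined by a rule: every SO concept that is-a root_code.
    A server must be able to expand this filter against SO."""
    return {
        "resourceType": "ValueSet",
        "status": "draft",
        "compose": {
            "include": [{
                "system": SO_SYSTEM,
                "filter": [{"property": "concept", "op": "is-a", "value": root_code}],
            }]
        },
    }


def extensional_valueset(codes: Dict[str, str]) -> dict:
    """ValueSet that enumerates each concept explicitly (code -> display)."""
    return {
        "resourceType": "ValueSet",
        "status": "draft",
        "compose": {
            "include": [{
                "system": SO_SYSTEM,
                "concept": [{"code": c, "display": d} for c, d in codes.items()],
            }]
        },
    }


# Usage: the rule-based set rooted at sequence_variant, and an explicit
# list seeded with the term mentioned above.
intensional = intensional_valueset("SO:0001060")
extensional = extensional_valueset({"SO:0001629": "splice_site_variant"})
```

Publishing both, with the extensional list generated from the intensional rule, would let servers that cannot expand SO still validate codes.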
Kevin Power (Jun 08 2020 at 16:54):
Yeah, it looks like the GTR list is not the same as the list defined in ClinVar. Other than the advanced search trick to see the list, I haven't found anything else in ClinVar that shows which values they use. I did see a note saying they compute this molecular consequence, so maybe they don't provide a nice human-consumable list of values.
Liz Amos (Jun 08 2020 at 17:41):
ClinVar also allows free text responses, so I would imagine there are some that aren't the same as the "standard" list. Also, I would say it's a starting list (but you can see how many records are in ClinVar with the associated term by using the advanced search). Do we have to be exhaustive in the list of answers? Why not have examples with the option to use any SO code?
Kevin Power (Jun 08 2020 at 18:24):
OK, are we confusing ClinVar's 'Functional consequence' (seems to come from the submitter) and 'Molecular consequence' (is computed by ClinVar)?
https://www.ncbi.nlm.nih.gov/variation/docs/glossary/ (Search for 'Functional consequence')
Functional consequence is an observed effect of a sequence change on function. Ontologies such as VariO and Sequence Ontology (SO) are used to standardize terms, which are documented here: ftp://ftp.ncbi.nlm.nih.gov/pub/GTR/standard_terms/functional_consequence.txt. As used by NCBI's resources, functional consequence is experimentally determined, in contrast to molecular consequence, which is computed from sequence annotation.
EDIT: I did find a statement in the ClinVar Data Dictionary suggesting that a submitter may be able to submit a Molecular Consequence:
Molecular consequence is reported from Sequence Ontology terms, and, when possible, are computed per transcript by NCBI. These terms are in this group because they can be calculated explicitly from the type and location of the variation, unlike the functional consequence which must be established experimentally (or predicted).
So perhaps if ClinVar can't compute it, they can take it from a submitter. Looks like it is not included in the spreadsheet, but the field is included in the XML.
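As an illustration of "calculated explicitly from the type and location of the variation," here is a deliberately simplified sketch, not ClinVar's actual algorithm. It looks only at the intronic offset in an HGVS `c.` position and maps it to an SO term: +1/+2 as splice donor, -1/-2 as splice acceptor, and anything else as intronic. The function name `intronic_consequence` is hypothetical. The sketch ignores splice-region, UTR (`c.-N`, `c.*N`), and all exonic consequences, which need transcript annotation.

```python
import re
from typing import Optional, Tuple

# SO accessions for the few consequences this toy rule can emit.
SPLICE_DONOR = ("SO:0001575", "splice_donor_variant")
SPLICE_ACCEPTOR = ("SO:0001574", "splice_acceptor_variant")
INTRON = ("SO:0001627", "intron_variant")

# c.<coding position><optional intronic offset>, e.g. "c.606+1G>C".
_CDNA_POS = re.compile(r"^c\.(\d+)([+-]\d+)?")


def intronic_consequence(hgvs_c: str) -> Optional[Tuple[str, str]]:
    """Return an SO (accession, label) for an intronic c. position, else None."""
    m = _CDNA_POS.match(hgvs_c)
    if m is None or m.group(2) is None:
        return None  # exonic, UTR, or unparsed: needs real annotation
    offset = int(m.group(2))  # "+1" -> 1, "-2" -> -2
    if 1 <= offset <= 2:
        return SPLICE_DONOR  # first two bases of the intron (5' end)
    if -2 <= offset <= -1:
        return SPLICE_ACCEPTOR  # last two bases of the intron (3' end)
    return INTRON


# The CASQ2 variant quoted later in this thread, c.606+1G>C, is reported
# there as "(Splice donor)"; this rule yields ("SO:0001575", "splice_donor_variant").
print(intronic_consequence("c.606+1G>C"))
```

A production pipeline (e.g. per-transcript annotation as NCBI describes) would also need the transcript model to handle exonic and boundary cases.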
Kevin Power (Jun 08 2020 at 18:48):
Liz Amos said:
... Do we have to be exhaustive in the list of answers? Why not have examples with the option to use any SO code?
If everyone else is OK with that approach, that is fine with me as well. Since we would likely want to turn these into official LOINC codes, is there enough info in ClinVar/GTR for LOINC to create codes for Molecular Consequence and Functional Consequence @Liz Amos ? I don't think we have completely decided we want that yet, but would be good to know if there is enough.
Rachel Kutner (Jun 10 2020 at 21:37):
I've contacted several labs we work with and asked if they would like to join our meeting on the 22nd to discuss Molecular Consequence vs Variant Consequence and how to represent these in the spec. So far LabCorp has expressed interest, and I'm reaching out to several other labs we've worked with in the past to ask if they'll join.
Can we add this to the agenda for the 22nd and I can forward the meeting invite to those labs who are able to join?
Jamie Jones (Jun 10 2020 at 22:16):
@Rachel Kutner absolutely forward the invite, I'll get it on the agenda and we can prepare our ask in the meantime
Arthur Hermann (Jun 10 2020 at 23:39):
@Rachel Kutner did you mean June 22 - the Genomics FHIR call? I am wondering why this conversation moved from the main IG meeting to the FHIR meeting if so..... @Kevin Power @Jamie Jones - perhaps I am misunderstanding, if not, I would like to understand how this moved to a different meeting since the original question and the follow-up work were done during the IG meeting.... thanks!
Kevin Power (Jun 11 2020 at 01:41):
I would be ok doing this on either call.
Rachel Kutner (Jun 11 2020 at 12:39):
I thought I remembered going in depth on this most recently during the FHIR WG call while we were reviewing Jira trackers, and since that's where (unless I'm confused) I was asked about bringing some clinical representatives in for clarification, that's the meeting I invited them to.
To be honest, I've been a little confused about the distinction between the two meetings, as it seems one can act as a bit of a follow-up to the other. Is there a distinction about what should be discussed in the meetings?
Arthur Hermann (Jun 11 2020 at 15:11):
@Rachel @Jamie Jones @Kevin Power @Patrick Werner @Bret H - I have to say that I, too, am confused by what is handled in each of these meetings... but most importantly, we need to confirm the date. I think I can get Invitae to attend. I sent Rachel's statement from above about Molecular Consequence vs Variant Consequence and how to represent these in the spec, and the response I got was: "I already received a reply from my go-to lab resource: I'm not quite sure what that means. Could you provide example values for each? I'm wondering if 'molecular consequence' is something like 'missense, nonsense, silent' and 'variant consequence' is more like 'pathogenic, VUS.'" What we need is a very clear statement that will make sense to laboratories.... Is someone able to provide that?
Kevin Power (Jun 11 2020 at 16:28):
@Rachel Kutner / @Arthur Hermann - Regarding the meetings (FHIR subgroup on Monday, full workgroup call on Tuesday): Monday will always be FHIR-specific topics. Tuesday can be any topic of interest to the workgroup that relates to any of the products we own or contribute to (V2, DAM, IM, etc.). For the last 2 years or so, our group has focused on our FHIR implementation guide, so the large majority of our Tuesday calls have also focused on FHIR-related topics. Hence my comment above: this topic is fine for either call.
Kevin Power (Jun 11 2020 at 16:41):
Arthur Hermann said:
I sent Rachel's statement from above about Molecular Consequence vs Variant Consequence and how to represent these in the spec, and the response I got was: "I already received a reply from my go-to lab resource: I'm not quite sure what that means. Could you provide example values for each? I'm wondering if 'molecular consequence' is something like 'missense, nonsense, silent' and 'variant consequence' is more like 'pathogenic, VUS.'" What we need is a very clear statement that will make sense to laboratories.... Is someone able to provide that?
I think our goal was to build out this document with a combination of what we have now for various components, and how they align (or don't) with other sources.
https://docs.google.com/document/d/16pECfXFwsqrQDTcMqSCSQPax08v1pPDF8Pec1UWgoS0/edit
However, it is lacking detail. I am afraid I am completely swamped right now and won't have time the rest of this week to put into this. I will let @Rachel Kutner, @Bret H, or @Jamie Jones correct me if I am wrong, but the idea was to show how other entities (like ClinVar and Ensembl) represent these concepts, and then come to an agreement on how we want to represent them.
Jamie Jones (Jun 11 2020 at 16:45):
That's exactly right, Kevin. I'll have a go at it again this weekend; hoping folks can contribute some more definitions or links to lists before then.
Kevin Power (Jun 11 2020 at 17:03):
We should probably try to include the related fields we have in the IG today as well for comparison purposes? Or would that just confuse the issue even more?
Kevin Power (Jun 11 2020 at 17:30):
I might add for @Rachel Kutner / @Arthur Hermann -- The quote from Arthur's contact above isn't quite right, but it is close. Rather than spending what would likely turn into the entire meeting trying to explain the full history of how we got to where we are, would it be easier to just have them take a few minutes and tell us how they do it? They may not want to do this, but given the confusion inside our group about what we have now, I am concerned that us presenting to them won't make for a productive conversation.
To be clear, I am not saying we should do it this way, but wanted to ask.
Rachel Kutner (Jun 11 2020 at 17:34):
I agree - My initial thinking was that we could simply show the lists of terms contained in the two SO terms I initially suggested in the Jira tracker, and ask how (if at all) these kinds of data are resulted by their lab, what term(s) they use to refer to them, and how the values they result for them are determined (where they exist). And ask for definitions of the fields they DO use to result this kind of information.
Then raise the question of whether there's a single source of truth on which we could base how these are represented in the FHIR spec.
Arthur Hermann (Jun 11 2020 at 17:38):
@Rachel Kutner @Kevin Power @Jamie Jones I think I can get Invitae to attend and explain what they do and why.... just want to confirm this would be the Monday meeting....
Kevin Power (Jun 11 2020 at 17:51):
Rachel Kutner said:
I agree - My initial thinking was that we could simply show the lists of terms contained in the two SO terms I initially suggested in the Jira tracker, and ask how (if at all) these kinds of data are resulted by their lab, what term(s) they use to refer to them, and how the values they result for them are determined (where they exist). And ask for definitions of the fields they DO use to result this kind of information.
Then raise the question of whether there's a single source of truth on which we could base how these are represented in the FHIR spec.
That sounds good to me. Do you mind updating the Google doc to include the SO terms you are referring to?
Rachel Kutner (Jun 11 2020 at 18:49):
Did a pretty basic copy/paste for Structural Variant since it has so many values. I might go back and just include some of the high level sections with links to each so we can show what's contained within. This way we can:
- Get an idea of how important these concepts are to labs' resulting
- Understand how extensively the terms in the SO are applied (they're very specific and likely more granular than results require) and whether they fully overlap with what the lab uses to result (if applicable)
- Ask for definitions from each lab - what these concepts mean in context of genomic resulting (if applicable)
- Ask if the ClinVar definitions/concepts sufficiently represent these fields/terms as the labs result them. If not, discuss why.
@Arthur Hermann @Kevin Power @Jamie Jones @Liz Amos Thoughts? (Warning: It's A LOT of terms in the document right now... Unfortunately, I don't have time to edit them yet to make it more readable.)
Liz Amos (Jun 11 2020 at 19:33):
Rachel Kutner said:
- Get an idea of how important these concepts are to labs' resulting
- Understand how extensively the terms in the SO are applied (they're very specific and likely more granular than results require) and whether they fully overlap with what the lab uses to result (if applicable)
- Ask for definitions from each lab - what these concepts mean in context of genomic resulting (if applicable)
- Ask if the ClinVar definitions/concepts sufficiently represent these fields/terms as the labs result them. If not, discuss why.
This looks like a great outline! @Rachel Kutner , do you think it would be more efficient to get this information to the labs before the meeting so they have time to digest? If so, I could help clean up the document with some direction.
Rachel Kutner (Jun 11 2020 at 19:54):
@Liz Amos I think that would be great!
Arthur Hermann (Jun 11 2020 at 20:31):
@Rachel Kutner @Liz Amos @Kevin Power I have been emailing Invitae to see if they can have a lab person attend a meeting, but I am going to tell them thanks, and let's hold off for now. I want to make the best use of their time, as I am sure Rachel does with the labs she is working with. Let's decide the plan and get the information in order; then I will reach out to Invitae again and we can get their input as well. This is a great discussion and I really like where you are going with this. I think it would be wonderful to take what Rachel put together, add appropriate links, get this info into the hands of some labs, and then have them discuss it with us during a meeting. Rachel, I have time tomorrow to work on some of this if you would like to work together on it.
Kevin Power (Jun 11 2020 at 21:24):
@Arthur Hermann -- why not have them join the discussion?
Jamie Jones (Jun 12 2020 at 20:18):
@Arthur Hermann @Rachel Kutner I left some comments on the other thread (looks like topic diverged from title a bit). I suggest we get our ask on call agendas Monday & Tuesday and then get whoever we can in for the 22nd.
Arthur Hermann (Jun 15 2020 at 17:31):
@Kevin Power @Bret H @Jamie Jones @Rachel Kutner @Liz Amos - I just did a random search through about 10 or so of the lab reports, searching for the terms "Molecular" and "Functional", and found them in almost no reports... www.mayomedicallaboratories.com_example-pos-report-for-cystic-fibrosis..pdf and Lung-adeno_TMB-high3_Sample_Report.pdf are two of them. Not sure how helpful this is, but to me it seems to indicate that Molecular Consequence and Functional Consequence are rarely, if ever, being sent from labs in the forms we have in our data store.
Jamie Jones (Jun 15 2020 at 18:23):
These data are usually consumed later in the workflow and not explicitly represented in reports. In your second link:
"Based on extensive clinical evidence, ERBB2 amplification or activating mutation may predict sensitivity to therapies targeting HER2, including antibodies such as trastuzumab,"
the "amplification" and "activating mutation" concepts are precisely what we are trying to encode.
In particular, they aren't often of clinical use by themselves, but can be integral in identifying therapies and prognoses.
Kevin Power (Jun 15 2020 at 18:41):
It might be more interesting to see if any of the reports contain terms like the ones ClinVar has for those two attributes. I would say it is unlikely they would label the terms that way. However, I think if we were to send the labs the ClinVar definitions (such as they are) and some of the example answers, they should be able to tell us if they deliver that sort of information.
Kevin Power (Jun 15 2020 at 18:42):
And just realized @Jamie Jones found some of the terms, so yea, what he said :slight_smile:
Arthur Hermann (Jun 15 2020 at 19:19):
@Kevin Power @Jamie Jones @Rachel Kutner @Bret H @Liz Amos here is the response from one of our clinical geneticists:
Molecular and functional consequence is critical in the result interpretation. Standardization is another matter that is nice to have from the clinician standpoint, but I would think whatever labs use can be slotted in to that field so not mission critical for it to be the same nomenclature / source / formatting.
Leslie Manace MD, MPhil, FACMG
TPMG Regional Director | Precision Tracking
Genetics | Screening & Tracking
Permanente Medicine
The Permanente Medical Group
Arthur Hermann (Jun 15 2020 at 19:21):
@Jamie Jones Thanks for clarifying for me, Jamie.... I think your point about these items being critical in identifying prognosis, etc. is what Leslie is saying in her response, which I just added.... Is that true?
Jamie Jones (Jun 15 2020 at 19:25):
Yep, it looks like they are referring to the clinvar concepts more or less directly, and are also restating what @Bret H was saying on call today, that the standardization of those fields likely impacts clinicians/consumers of the reports more so than the labs themselves.
Arthur Hermann (Jun 15 2020 at 21:39):
@Jamie Jones @Kevin Power @Rachel Kutner @Bret H @Liz Amos - here is another reply from a different clinical geneticist at Kaiser:
I agree with Dr. Manace that information on both molecular and functional consequence is paramount in interpretation of the results. As a matter of fact, they are intertwined.
Information on the molecular change (location, missense, splice site, frameshift, etc.) reflects on the possible functional effect on the protein and hence the clinical consequence.
For example, here is a report for a recent patient from a lab that we use. You can see how they list information on the variation:
“CASQ2, Intron 5, c.606+1G>C (Splice donor), homozygous, Likely Pathogenic
This sequence change affects a donor splice site in intron 5 of the CASQ2 gene. It is expected to disrupt RNA splicing and likely results in an absent or disrupted protein product. (MOLECULAR)
This variant is not present in population databases (ExAC no frequency). (MOLECULAR)
This variant has been observed in individual(s) with clinical features of catecholaminergic polymorphic ventricular tachycardia (Invitae). ClinVar contains an entry for this variant (Variation ID: 190743). (MOLECULAR)
Algorithms developed to predict the effect of sequence changes on RNA splicing suggest that this variant may disrupt the consensus splice site, but this prediction has not been confirmed by published transcriptional studies. (FUNCTIONAL)
Donor and acceptor splice site variants typically lead to a loss of protein function (PMID: 16199547), and loss-of-function variants in CASQ2 are known to be pathogenic (PMID: 12386154). (FUNCTIONAL)
In summary, the currently available evidence indicates that the variant is pathogenic, but additional data are needed to prove that conclusively. Therefore, this variant has been classified as Likely Pathogenic.”
Arthur Hermann (Jun 16 2020 at 23:12):
I just wanted to be clear that the geneticist at KP added the MOLECULAR and FUNCTIONAL labels to the report section he quoted in my previous message... again, this reflects the background work done by the lab to determine the information they provide. It is not part of the report itself.
I also got a reply from my contact at Invitae - this is what he said:
"Please see below - it looks like we use these, but they won't appear in our report data.
(How) Do you use ontological terms in your workflow for variant annotation (Such as Molecular Consequence and Functional Consequence per below)?
Do you use Clinvar and/or Sequence Ontology? Do you need to use Clinvar (or other) labels in your reporting?
We use Molecular Consequence Sequence Ontology (SO) terms to calculate our variant names and to aid in variant interpretation, but they are not directly used or displayed in the report data. When experimental data is available in the literature then we will also take the Functional Consequence of the variant into consideration for variant interpretation. This is captured through Sherloc evidence codes (PMID: 28492532). We do not use ClinVar data in our reporting, but do reference ClinVar Variation IDs where available for the general information of the clinician."
Bret H (Jul 27 2020 at 13:23):
arghh, functional implication...
Last updated: Apr 12 2022 at 19:14 UTC