Stream: genomics
Topic: MolecularVariant and Observation
Bob Freimuth (Sep 23 2021 at 20:34):
@Alexander Mankovich @Jamie Jones @Patrick Werner
Today during Q3 of our WGM, Alex asked a question about representing the equivalent of "variant x not found". I didn't see the question in the chat until after the meeting, so I am replying here.
Creating a toy example for illustrative purposes: Var123 is a T at position 500 on sequence SEQ999 instead of a G.
Therefore, I interpret the statement "Var123 not found" to mean "the T allele was not found in this subject's genome at the position homologous to position 500 on sequence SEQ999".
If it is known that the T allele was not found at that position, then something else must have been observed there instead, so let's flip the statement to the positive and assume G/A was found:
"G/A found in this subject's genome at the position homologous to position 500 on sequence SEQ999"
Notice that I chose G/A rather than G/G to make the point that the absence of one allele does not imply the presence of another. :-)
Note that MolecularVariant does not yet formally support genotypes, but we can put that detail in a black box and express the idea in the positive:
Observation for patient A points to an instance of MolVar that defines the G/A genotype at position 500 on sequence SEQ999
I would then create a second Observation derivedFrom the first, which expresses the conclusion that "the T allele was not found":
Observation for patient A derivedFrom <previous Obs>, ref<MolVar/T allele>, value = "absent" or "negative"
Decoupling what was actually observed from the interpretations or conclusions that are based on it (whether stated in the positive or negative) allows for re-interpretation of those conclusions without altering the primary Observation.
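A minimal Python sketch of the two-Observation pattern above, using the toy Var123 example. `MolVar`, `Observation`, and `derive_allele_call` are hypothetical simplifications, not FHIR classes or a FHIR API. The genotype is kept as a plain tuple because MolecularVariant does not yet model genotypes. The primary Observation records only what was seen (G/A). The derived Observation records the conclusion about the T allele and points back to its evidence through `derived_from`.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the proposed MolecularVariant and FHIR Observation.
@dataclass(frozen=True)
class MolVar:
    sequence: str   # e.g. "SEQ999"
    position: int   # e.g. 500
    alleles: tuple  # a single allele, or a genotype such as ("G", "A")

@dataclass
class Observation:
    id: str
    subject: str
    focus: MolVar
    value: str = "present"
    derived_from: list = field(default_factory=list)

def derive_allele_call(primary: Observation, allele: str, obs_id: str) -> Observation:
    """Build a second Observation, derivedFrom the primary one, stating whether
    `allele` was seen at the same position. The primary is never modified."""
    site = primary.focus
    query = MolVar(site.sequence, site.position, (allele,))
    value = "present" if allele in site.alleles else "absent"
    return Observation(id=obs_id, subject=primary.subject, focus=query,
                       value=value, derived_from=[primary.id])

# Primary observation: genotype G/A at position 500 on SEQ999 for patient A.
obs1 = Observation("obs1", "patientA", MolVar("SEQ999", 500, ("G", "A")))
# Derived conclusion: the T allele (Var123) was not found.
obs2 = derive_allele_call(obs1, "T", "obs2")
print(obs2.value, obs2.derived_from)  # absent ['obs1']
```

Because the conclusion is a separate object, a later re-interpretation only replaces `obs2`; `obs1` stays untouched. Asking about "G" or "A" against the same primary returns "present", which is the G/A versus G/G point.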
Does that help? I'm interested in your thoughts, especially if they differ from the approach I outlined above.
Bob Dolin (Sep 23 2021 at 20:40):
@Bob Freimuth I'm sorry I missed the discussion. Another two options: [1] explicitly marking an observation as 'absent'; [2] use the region-studied profile to infer variants not found.
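A rough Python sketch of option [2], inferring "not found" from the region studied instead of storing explicit absent Observations. `Variant`, `Region`, and `call_status` are hypothetical names and do not reflect the actual structure of the region-studied profile. The interval convention (inclusive start, exclusive end) and the 400 to 600 region are assumptions. The sketch also assumes that the test would have reported every variant inside a studied region; real coverage gaps and variant-type limits would weaken that inference.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    seq: str
    pos: int
    alt: str

class Region(NamedTuple):
    seq: str
    start: int  # inclusive
    end: int    # exclusive

def call_status(query: Variant, found: set, regions: list) -> str:
    """'found' if reported; 'not found' if inside a studied region but not
    reported; otherwise 'not assessed' (absence cannot be inferred)."""
    if query in found:
        return "found"
    covered = any(r.seq == query.seq and r.start <= query.pos < r.end
                  for r in regions)
    return "not found" if covered else "not assessed"

regions = [Region("SEQ999", 400, 600)]
found = {Variant("SEQ999", 450, "C")}
print(call_status(Variant("SEQ999", 450, "C"), found, regions))  # found
print(call_status(Variant("SEQ999", 500, "T"), found, regions))  # not found
print(call_status(Variant("SEQ999", 900, "T"), found, regions))  # not assessed
```

The third case is the reason the region matters: a variant outside the studied region is not "absent", it simply was not looked at.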
Jamie Jones (Sep 23 2021 at 20:45):
I agree we should consider the upstream use cases for why we would want to say 'variant X not found'.
Kevin Power (Sep 23 2021 at 21:20):
Even in our IG (Variant profile) today, we do not have guidance on how to consistently indicate "Var123 was not found"; the most likely answer is the .value = absent, component[*] = variant of interest approach. If/when we introduce MolecularVariant, I would like to see us document our guidance. What Bob F said makes sense to me, though it is a bit indirect. It is possible that those use cases have specific codes available, and perhaps we should consider how we enable that?
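A sketch of the ".value = absent, component[*] = variant of interest" shape as a plain Python dict in FHIR JSON form. The LOINC codes (69548-6 "Genetic variant assessment", LA9634-2 "Absent", 48004-6 "DNA change (c.HGVS)") are the ones commonly used with the Genomics Reporting IG's Variant profile. The HGVS system URI and the single-component layout are simplifications. Treat all of these as illustrative and check them against the IG version you target. `absent_variant_observation` and `is_absent` are hypothetical helpers. NM_004333.4:c.1799T>A is the BRAF c.HGVS change usually reported as V600E.

```python
import json

LOINC = "http://loinc.org"
ABSENT = "LA9634-2"  # LOINC answer "Absent" (verify against your IG version)

def absent_variant_observation(patient_ref: str, hgvs_c: str) -> dict:
    """Variant-style Observation saying the variant `hgvs_c` was not found."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": LOINC, "code": "69548-6",
                             "display": "Genetic variant assessment"}]},
        "subject": {"reference": patient_ref},
        "valueCodeableConcept": {"coding": [{"system": LOINC, "code": ABSENT,
                                             "display": "Absent"}]},
        "component": [{
            # The variant of interest is carried as a component.
            "code": {"coding": [{"system": LOINC, "code": "48004-6",
                                 "display": "DNA change (c.HGVS)"}]},
            "valueCodeableConcept": {"coding": [{"system": "http://varnomen.hgvs.org",
                                                 "code": hgvs_c}]},
        }],
    }

def is_absent(obs: dict) -> bool:
    """True if the Observation's value is the LOINC 'Absent' answer."""
    codings = obs.get("valueCodeableConcept", {}).get("coding", [])
    return any(c.get("system") == LOINC and c.get("code") == ABSENT for c in codings)

obs = absent_variant_observation("Patient/A", "NM_004333.4:c.1799T>A")
print(json.dumps(obs, indent=2))
```

Note that this produces one Observation per absent variant, which is exactly the scaling concern raised next in the thread.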
Bob Dolin (Sep 23 2021 at 21:23):
One problem I see with creating explicit observations for variant=absent is that there are potentially a LOT of absent variants in a studied region.
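A back-of-the-envelope count shows the scale of the concern. This is a hypothetical sketch: it considers one sequence, substitutions plus insertions only, and an assumed 1,000 bp region. The insertion count is of raw (site, inserted sequence) pairs, so it overcounts events that normalize to the same resulting sequence (for example, an extra "A" anywhere inside a run of A's).

```python
def possible_snvs(region_length: int) -> int:
    """Single-nucleotide substitutions: 3 alternate bases at every position."""
    return 3 * region_length

def possible_insertions(region_length: int, max_len: int) -> int:
    """Insertions of 1..max_len bases at the region_length - 1 internal
    junctions; 4**k distinct inserted sequences of length k per junction."""
    junctions = region_length - 1
    return junctions * sum(4 ** k for k in range(1, max_len + 1))

print(possible_snvs(1000))           # 3000
print(possible_insertions(1000, 3))  # 83916
```

With insertion length left unbounded, the count has no upper limit, which is why enumerating every "not found" variant as its own Observation does not scale.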
Kevin Power (Sep 23 2021 at 21:25):
Agreed, which is what makes the comment from @Jamie Jones very important - how many use cases do we have that require this, and then we come up with guidance to match it.
Alexander Mankovich (Sep 23 2021 at 21:49):
Thanks for clarifying, Bob. I think this approach makes sense, and I like Bob D.'s addition of the region-studied profile. What I'm trying to satisfy is the ability to express that, e.g., no variant was found in the KRAS gene, or that we found a variant encoding BRAF V600E but nothing elsewhere in the BRAF gene.
I understand there are too many instances to report all the not-found variants in a gene, and my thinking was primarily about use cases driven by clinical trial matching and by therapies where resistance can be inferred from specific mutations, such as MEK1 P124Q/S and resistance to BRAF inhibitors. This may also just be something that sits on the interpretation side of things and may even be outside the scope of molecularvariant - I'll do a bit more digging and see if I can flesh out the more routine use cases and go from there.
Kevin Power (Sep 23 2021 at 21:53):
Alexander Mankovich said:
This may also just be something that sits on the interpretation side of things and may even be outside the scope of molecularvariant - I'll do a bit more digging and see if I can flesh out the more routine use cases and go from there.
That feels right to me.
Patrick Werner (Sep 23 2021 at 21:56):
Jamie Jones said:
I agree we should consider the upstream use cases for why we would want to say 'variant X not found'.
Biomarker testing is a use case. You are testing for the presence/absence of defined variants.
Patrick Werner (Sep 23 2021 at 21:56):
+1 to @Alexander Mankovich's post
Jamie Jones (Sep 23 2021 at 21:59):
I'd like to evaluate approaching genetic biomarker testing with the same pattern we want to propose for karyotype/protein/other biomarker testing.
Jamie Jones (Sep 23 2021 at 22:02):
Is a statement of not having a certain variant independently useful as an observation without knowing that it is being used as a biomarker test?
Kevin Power (Sep 23 2021 at 22:03):
@Jamie Jones - Sorry, I guess I missed that. What pattern is being proposed?
Jamie Jones (Sep 23 2021 at 22:06):
We're still working on a proposal, but a request has been made for our WG to create guidance for other types of biomarkers that may be included in reports
Kevin Power (Sep 23 2021 at 22:07):
Jamie Jones said:
We're still working on a proposal, but a request has been made for our WG to create guidance for other types of biomarkers that may be included in reports
What request is that? (feel like I must have been asleep through 1/2 of the WGM :smile:)
Jamie Jones (Sep 23 2021 at 22:08):
The default approach for these tests is to put most of the complexity of the test in Observation.code, and have a quantifiable value with an interpretation, all as one Observation
Kevin Power (Sep 23 2021 at 22:08):
So, "Biomarker" becomes "Something that has its own Observation.code"?
Jamie Jones (Sep 23 2021 at 22:13):
I would say Bob's "second Observation" derivedFrom the first with an interpretation could be called a biomarker
Jamie Jones (Sep 23 2021 at 22:14):
There's an option to push the complexity of the specific test from Observation.code to a value
Jamie Jones (Sep 23 2021 at 22:20):
I could see that becoming a third type of implication, and would want us to review other work in the biomarker/interpretation space.
Bob Freimuth (Sep 23 2021 at 22:20):
There are literally infinite things that are not found at any given location, because there are an infinite number of possible variants that could occur. I think it is much more valuable (from a data re-use perspective), though it requires two separate statements, to first record what was seen and then make a second observation that interprets 1..* primary observations in the context of a specific use case.
Bob Freimuth (Sep 23 2021 at 22:49):
@Alexander Mankovich To your use case re: BRAF V600E
This is an example of how things get a little tricky. Please bear with me as I break this apart to the point where most will stop reading. :wink:
V600E is (obviously) a protein variant, which would not be observed directly by a genomic test. It is a prediction of the protein sequence that would result if the genomic sequence that was observed was transcribed and translated, assuming there isn't any regulatory or processing weirdness due to things we didn't see or don't yet understand. Furthermore, the V to E change is degenerate, since there are 4 possible codons for V (GTn) and 2 for E (GAr). This means that there are several possible changes that could have occurred: GTG => GAG, GTA => GAA, GTC => GAG, etc. Recording only the protein change does not allow anyone to dig deeper to ask questions about whether a specific genomic change was present. Finally, I personally do not think it would be accurate to record "V600E" as a primary observation unless it was detected through protein sequencing (or similar). If it was predicted based on a genomic test, then I'd chain a few things together.
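The codon degeneracy argument can be checked by enumeration. This short Python sketch pairs every valine codon (GTN) with every glutamate codon (GAR) and counts how many nucleotides differ. Only two of the eight codon-level changes are single-nucleotide substitutions (GTA => GAA and GTG => GAG). A protein-level "V600E" record alone cannot tell you which change occurred.

```python
from itertools import product

VAL_CODONS = ["GT" + n for n in "ACGT"]  # GTN: GTA, GTC, GTG, GTT
GLU_CODONS = ["GA" + r for r in "AG"]    # GAR: GAA, GAG

def n_diffs(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))

changes = [(v, e, n_diffs(v, e)) for v, e in product(VAL_CODONS, GLU_CODONS)]
for v, e, d in changes:
    print(f"{v} => {e}: {d} nucleotide change(s)")
```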
MolVar1: T>A at pos xyz in BRAF gene (corresponding to the GTG=>GAG change in codon 600)
MolVar2: V>E at pos 600 in BRAF protein
MolVar1 has a VariantRelationship of "predicted translation" with MolVar2
Obs1 (primary): MolVar1
Obs2 (derivedFrom Obs1): MolVar2
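The chain above can be sketched in Python as follows. `MolVar`, `VariantRelationship`, `Observation`, and `predict_protein_change` are hypothetical simplifications, not the proposed resource definitions. The codon table covers only the valine and glutamate codons needed here, not the full genetic code. "pos xyz" is kept as the unspecified placeholder from the text. The primary Observation points at the genomic change. The protein change is derived by translation and linked both through the relationship and through the derived Observation.

```python
from dataclasses import dataclass

# Partial codon table: only the V and E codons used in this example.
CODON_TABLE = {"GTA": "V", "GTC": "V", "GTG": "V", "GTT": "V",
               "GAA": "E", "GAG": "E"}

@dataclass(frozen=True)
class MolVar:
    id: str
    description: str

@dataclass(frozen=True)
class VariantRelationship:
    source: str             # MolVar id
    target: str             # MolVar id
    relationship_type: str  # e.g. "predicted translation"

@dataclass(frozen=True)
class Observation:
    id: str
    focus: str              # MolVar id
    derived_from: tuple = ()

def predict_protein_change(ref_codon: str, alt_codon: str, codon_number: int,
                           protein: str = "BRAF") -> str:
    """Translate the reference and alternate codons into an amino acid change."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return f"{ref_aa}>{alt_aa} at pos {codon_number} in {protein} protein"

molvar1 = MolVar("MolVar1", "T>A at pos xyz in BRAF gene (GTG=>GAG in codon 600)")
molvar2 = MolVar("MolVar2", predict_protein_change("GTG", "GAG", 600))
rel = VariantRelationship(molvar1.id, molvar2.id, "predicted translation")
obs1 = Observation("Obs1", focus=molvar1.id)  # primary: what the genomic test saw
obs2 = Observation("Obs2", focus=molvar2.id, derived_from=(obs1.id,))
print(molvar2.description)  # V>E at pos 600 in BRAF protein
```

Keeping the genomic change as the primary Observation means that someone can still ask whether the specific c.1799T>A change was present, which the protein-level record alone cannot answer.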
You are correct that interpretation is outside the scope of MolVar, but you may not be surprised that I have ideas about that as well. :wink:
Bob Freimuth (Sep 23 2021 at 22:54):
@Bob Dolin As you know, I think the concept of "region studied" is very important and we need to support it. Currently, it contains information that I think could belong in different structures. For example, I think test metadata should be in the method rather than in an Observation. I'd like to try to tease those things apart a bit as we get into the modeling of the report and procedure/method.
Bob Dolin (Sep 23 2021 at 22:57):
thanks @Bob Freimuth . I'm all for that - I'm finding more and more that there are key pieces of the test itself that provide valuable context to results interpretation (such as coverage region, types of variants the test can detect, etc).
Bret H (Oct 11 2021 at 14:22):
If it is a well-defined variant that is NOT observed (i.e., absent), it seems like one could describe the specific variant with Variant and use 'Absent' as the value of that Variant observation. @Bob Freimuth @Jamie Jones what is missing with this approach? @Bob Dolin it's obviously not useful to do this for every variant, and it does not sufficiently capture the actual sequence of the patient, but I wonder if it would meet Alexander's use case well enough.
Last updated: Apr 12 2022 at 19:14 UTC