Stream: genomics
Topic: Combining similar types of components
Kevin Power (Jan 29 2020 at 15:55):
One of the discussion points from Clem yesterday was the idea of combining similar components together. This was one of the examples, though there are others:
Reference sequence (crucial and required), but do we really need two separate fields? They are self-defining. Can we collapse them into one field, perhaps allowing 2 if we need that many?
51958-7 Transcript reference sequence [ID]
48013-7 Genomic reference sequence [ID]
It does seem we could do it fairly easily, but it would necessitate that we create a new ValueSet containing the LOINC codes mentioned above that we could bind Variant(Observation).component[].code to for validation purposes. We would need to make the new combined component 0..*, which would also imply we lose the ability to simply say "please send at most 1 transcript reference sequence." Longer term, we would also (I think?) lose the ability to validate different value sets for the different types of reference sequences.
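A minimal sketch of that trade-off, written in Python rather than FHIR profiling syntax. Only the two LOINC codes come from this thread; the function names, the `REF_SEQ_VALUE_SET` name, and the dict-shaped observation are hypothetical. With one combined value set, a validator can confirm that a component carries an allowed code, but a per-type limit such as "at most 1 transcript reference sequence" has to be written as a separate rule.

```python
from collections import Counter

# LOINC codes quoted in the thread.
TRANSCRIPT_REF_SEQ = "51958-7"
GENOMIC_REF_SEQ = "48013-7"

# Hypothetical combined value set that a single 0..* component would bind to.
REF_SEQ_VALUE_SET = {TRANSCRIPT_REF_SEQ, GENOMIC_REF_SEQ}


def make_component(code, value):
    """Build a simplified Observation.component entry (illustrative shape only)."""
    return {
        "code": {"coding": [{"system": "http://loinc.org", "code": code}]},
        "valueCodeableConcept": {"text": value},
    }


def count_ref_seqs(observation):
    """Count reference-sequence components per LOINC code."""
    codes = (
        coding.get("code")
        for comp in observation.get("component", [])
        for coding in comp.get("code", {}).get("coding", [])
    )
    return Counter(c for c in codes if c in REF_SEQ_VALUE_SET)


def violates_per_type_limit(observation, limit=1):
    """Extra rule that the combined 0..* component no longer expresses by itself."""
    return any(n > limit for n in count_ref_seqs(observation).values())


# Two transcript reference sequences pass the value-set binding,
# but break the "at most 1 transcript" expectation.
example = {
    "component": [
        make_component(TRANSCRIPT_REF_SEQ, "transcript A (placeholder)"),
        make_component(TRANSCRIPT_REF_SEQ, "transcript B (placeholder)"),
        make_component(GENOMIC_REF_SEQ, "genomic (placeholder)"),
    ]
}
```

Here `count_ref_seqs(example)` reports two transcript entries and one genomic entry, and `violates_per_type_limit(example)` flags the duplicate that cardinality alone would have caught when the components were separate.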
I am still not sure if this really makes the Variant profile "easier" or not, but wanted to ask for other feedback before we officially consider this request.
Jamie Jones (Jan 29 2020 at 16:00):
I think part of his point is that a genomic reference sequence often uniquely determines a transcript reference sequence and vice versa, so we could (potentially) limit it to 1..1 reference
Jamie Jones (Jan 29 2020 at 16:14):
If we do send both currently, how do we interpret the variant's exact-start-end?? Our guidance says "genomic" so does it default to that if both refs are given?
Kevin Power (Jan 29 2020 at 16:14):
Fair point. I am not sure that will always be a 1..1 (especially for non-NCBI references, I REALLY don't know those very well), but perhaps it is - for now, let's assume it is. Similar to our other discussions, I am uncomfortable with our base profile putting assumptions that the receiving system can do a lookup like that.
Kevin Power (Jan 29 2020 at 16:16):
Given our definition of exact-start-end, I would assume it would be based on the genomic coordinates regardless of what reference sequence(s) are provided (which is honestly not something we should probably assume - this could use some clarification/work to be honest).
Kevin Power (Jan 29 2020 at 22:24):
Do others have comments on this? To tag a few people who have done some real work with the IG, curious to hear from @Mullai Murugan or @Larry Babb , @Bob Dolin , maybe @Bret H ?
Bob Dolin (Jan 29 2020 at 22:47):
I think @Jamie Jones raises a good point, that other components potentially become ambiguous if we collapse all refSeq's into a single component. That said, I like the idea of collapsing - maybe it won't work in all of the cases Clem suggested, but maybe it would work for, say, HGVS, and from there, we'd just need to do profiles.
Kevin Power (Jan 29 2020 at 23:12):
Thanks @Bob Dolin -- I think we already have some ambiguity, as the 'genomic ref seq' component is not required, but the 'exact-start-stop' component says it is based on 'genomic coordinates.' If we want to put some invariants/rules in place saying if you send 'exact-start-stop' you must also send 'genomic ref seq,' we can do so regardless of whether we combine the ref seqs or leave them separate as they are today.
Jamie Jones (Jan 29 2020 at 23:17):
Those groups of components meant to be used together feel like unwritten rules we ought to put in
Bob Dolin (Jan 29 2020 at 23:25):
Thanks @Kevin Power. I do like the idea of reducing the components, but I kinda wonder if it might be good to start with just one, and from there consider each reduction on its own? maybe, for instance, we'd start by seeing whether it's possible to collapse the three HGVS components (c,g,p) into just one?
Jamie Jones (Jan 29 2020 at 23:29):
HGVS seems most convenient for the amino acid change, especially since ref and alt are genomic
Kevin Power (Jan 29 2020 at 23:36):
I would be OK with combining the HGVS components. Anyone opposed to that becoming a tracker in the theme of "simplification" ? :slight_smile:
Larry Babb (Jan 30 2020 at 01:45):
It would be so helpful if we (CGWG) could produce a few low level data types that provide the constructs (or components) that all depend on each other to create the values needed to represent the results or components in the Observations. It is very difficult to have all these rules around which attributes go together and depend on each other, especially when there's need or interest in having multiple complex components.
While I think the Observation.component[] concept is a very useful and helpful mechanism to provide additional values that help clarify or enhance the data that is observed, I think these variant scenarios illustrate that we are exceeding the limits and intention of that design by trying to fit one or more "components" with deeper structures than the Observation.component[] mechanism allows.
If we (CGWG) want to provide all the various ways someone might want to represent a variant, then why not create an Observation Profile for each specific use case? In other words, instead of one generic Variant profile, have an HGVSVariant profile, a VariantCallFormat profile, a StarAllele profile, etc... This would produce upwards of a dozen profiles, but it would make it very clear which constraints were required for the various variant representation types. And, if someone wanted to use a "genomic" HGVSVariant profile they could type it as "genomic" and then have sub-observations for the "transcript" and "protein" type observations that the producer may wish to additionally send with the base observation.
Sorry to diverge from the original ask. But it is difficult to try to untangle the little knots, hoping to make a meaningful improvement, when our knot is part of a "nest" of knots. If we don't address the bigger issue of simplifying
Lloyd McKenzie (Jan 30 2020 at 01:50):
That's certainly possible - however most 'custom' data types that have been introduced into the spec so far have been on elements that everyone must support and that aren't 'choice' structures. The risk of introducing custom types for either Observation.component.value or for extension is that they won't be widely supported - which may limit the ability to propagate data through/to systems.
Kevin Power (Jan 30 2020 at 02:08):
We could do lots of profiles and still have a single Observation instance that meets all the ones that make sense. We would have to define our profiles carefully so that they can work together, but it would be possible. I am hesitant to define something that would require many Observations to report a single variant.
Kevin Power (Jan 30 2020 at 03:22):
I do think it begs the question I have been thinking more about recently - does fitting into a less-than-ideal but more widely adopted structure, like Observation and its components, provide enough value over first class structures for genomics that won’t be more generally adoptable without additional work for implementers? I know we started the IG on the premise that the value of widely adopted and understood structures was worth it. Given where we are, do we all still believe that? Are we getting to a point of defining (or needing to define) our profiles such that their complexities will require additional implementation anyway?
Those are probably unanswerable questions right now, but easy questions are no fun :grinning:
Jamie Jones (Jan 30 2020 at 03:29):
Are you suggesting we ought to revisit MolecularSequence???
Kevin Power (Jan 30 2020 at 03:44):
I don’t know what I am suggesting right now. 🤯
Kevin Power (Jan 30 2020 at 03:47):
In all seriousness, I think we should continue to stick close to what we have now and make incremental progress. However, once the IM model is reviewed and ready, it might be time to revisit decisions like this
Jamie Jones (Mar 04 2020 at 17:38):
Have to revisit this for hgvs representation on Variant :)
Kevin Power (Mar 04 2020 at 19:48):
I think our group should consider combining all HGVS components into a single component. As we think about what to do for the "Variant Simplification" theme more broadly it might change again, but we should embrace evolution :slight_smile:
There were other things on this thread that are probably a bridge too far for now.
Bret H (Mar 15 2020 at 16:24):
Wise people keep saying 'you can move complexity around but you can't hide it forever.' If we remove the more pre-coordinated reference fields, as has been discussed, the recipient of the data will need mechanisms to parse the incoming data to know if it is relative to protein, RNA, or DNA. So the recipient side would need extra effort. Additionally, querying becomes an issue: if it matters in a query whether the genetic data is RNA, AA, or DNA, we're a bit stuck. Simplification in one space creates complication and a need for control logic in another. Sorry not to be helpful. The only thing I can offer is that we make a decision and stick to it for a declared amount of time. The more time I spend in this group, the more I see a very strong need to make stability contracts with the would-be implementer. This means stabilizing our product and adhering to a SPECIFIC release cycle. Our community deserves it, and patients will be better served the sooner genetic data is discretely available.
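To make the recipient-side cost concrete, here is a small Python sketch of the parsing a receiver might need if all HGVS expressions arrived in one combined component. It relies only on the sequence-type prefixes defined by HGVS nomenclature (`g.`, `m.`, `o.`, `c.`, `n.`, `r.`, `p.`); the function name `hgvs_kind`, the category labels, and the sample strings are illustrative assumptions, and this is not a full HGVS parser.

```python
import re

# Sequence-type prefixes from HGVS nomenclature, mapped to coarse categories.
HGVS_PREFIX_KIND = {
    "g": "DNA (genomic)",
    "m": "DNA (mitochondrial)",
    "o": "DNA (circular genomic)",
    "c": "DNA (coding)",
    "n": "DNA (non-coding)",
    "r": "RNA",
    "p": "protein",
}

# The prefix appears either at the start or right after "reference:".
_PREFIX = re.compile(r"(?:^|:)([gmocnrp])\.")


def hgvs_kind(expression):
    """Return the coarse molecule type of an HGVS string, or None if unknown."""
    match = _PREFIX.search(expression.strip())
    if match is None:
        return None
    return HGVS_PREFIX_KIND[match.group(1)]


# Illustrative inputs; a receiver would branch on the returned category.
samples = [
    "NM_004006.2:c.4375C>T",
    "NC_000023.10:g.33038255C>A",
    "NP_003997.1:p.(Arg1459*)",
]
kinds = [hgvs_kind(s) for s in samples]
```

A query that cares about the molecule type would need this kind of logic (or an index built from it) on the receiving side, which is the complexity that the separate pre-coordinated components currently carry.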
Jamie Jones (Jan 12 2021 at 16:21):
I think this thread from last year may be the most aligned spot for continued "IG Lite" discussion. Current google doc where we've been considering updated guidance for http://build.fhir.org/ig/HL7/genomics-reporting/sequencing.html is here: https://docs.google.com/document/d/15g1rngmgFoE8_i7UGpXRpISkOCrd1QCoQBg9REY80EI/edit
Last updated: Apr 12 2022 at 19:14 UTC