FHIR Chat · MolecularSequence · genomics

Stream: genomics

Topic: MolecularSequence


Patrick Werner (Apr 13 2020 at 16:51):

I thought a lot about MolSeq. To me, MolSeq was the FHIR version of VCF and/or BAM. Now I think it is more like one or more BAM files (or other sequence formats). MolecularSequence could be similar to DocumentReference: either a reference to an existing sequence file with some added metadata, or a container for the actual content (sequence strings).
MolSeq.variant and MolSeq.structureVariant duplicate the Observations in our IG. MolSeq.quality is too bioinformatics-heavy for me to understand fully. Is it reported and tested for every sequencing run? Or is it quality information that has to be tested at certain time intervals?
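
A minimal sketch of the two usages described above, written as plain Python dictionaries. It assumes the FHIR R4 element names (`repository`, `observedSeq`, `coordinateSystem`); the file URL, the ids, and the choice of `directlink` as the repository type are placeholders for illustration, not anything the IG prescribes.

```python
import json


def molseq_by_reference(seq_id, url, name=None):
    """MolecularSequence that only points at an externally stored sequence file."""
    repository = {"type": "directlink", "url": url}  # repository type is an assumption
    if name is not None:
        repository["name"] = name
    return {
        "resourceType": "MolecularSequence",
        "id": seq_id,
        "coordinateSystem": 0,  # assumed 0-based coordinates
        "repository": [repository],
    }


def molseq_inline(seq_id, observed_seq, seq_type="dna"):
    """MolecularSequence that carries the sequence string itself."""
    return {
        "resourceType": "MolecularSequence",
        "id": seq_id,
        "type": seq_type,
        "coordinateSystem": 0,  # assumed 0-based coordinates
        "observedSeq": observed_seq,
    }


# Hypothetical usage: the URL and ids below are made up.
by_ref = molseq_by_reference("bam-1", "https://example.org/reads/sample1.bam", name="sample1.bam")
inline = molseq_inline("seq-1", "ACGTACGT")
print(json.dumps(by_ref, indent=2))
print(json.dumps(inline, indent=2))
```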

Patrick Werner (Apr 13 2020 at 16:57):

Questions/potential Todos:

  • String is limited to 1 MB (1024*1024 characters) in FHIR. Is this enough for sequence data? We have to warn implementers about the limitation or choose another data type.
  • Observations and DiagnosticReport from our IG have to be checked (and extended if needed) for a slice: Obs.derivedFrom pointing to MolSeq.
  • Find a solution to the duplicated variant issue in MolSeq.
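
For the second item, a rough sketch of what an Observation pointing at MolSeq through `derivedFrom` could look like, again as plain Python dictionaries. The function names, ids, and the text-only `code` are hypothetical; a real profile would fix the code and slice `derivedFrom` formally.

```python
def observation_derived_from_molseq(obs_id, code, molseq_ids, status="final"):
    """Build an Observation whose derivedFrom entries reference MolecularSequence resources."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "status": status,
        "code": code,
        "derivedFrom": [{"reference": f"MolecularSequence/{i}"} for i in molseq_ids],
    }


def derived_molseq_references(observation):
    """Return only the derivedFrom references that target MolecularSequence."""
    return [
        ref["reference"]
        for ref in observation.get("derivedFrom", [])
        if ref.get("reference", "").startswith("MolecularSequence/")
    ]


# Hypothetical usage: ids and code text are placeholders.
obs = observation_derived_from_molseq(
    "variant-1", {"text": "Genetic variant assessment"}, ["seq-0", "seq-1"]
)
print(derived_molseq_references(obs))
```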

Jamie Jones (Apr 13 2020 at 16:59):

String shouldn't be a limitation if one uses MolecularSequence.pointer:
pointer Σ 0..* Reference(MolecularSequence) Pointer to next atomic sequence

Jamie Jones (Apr 13 2020 at 16:59):

Our guidance on how to break long sequences up is very thin but the functionality is there
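
Since the guidance is thin, here is one possible way to break a long sequence up, sketched in Python. It assumes the R4 elements `observedSeq` and `pointer`, and it reads "pointer to next atomic sequence" as a simple forward chain where each chunk references the one after it. That reading, the id scheme, and both function names are assumptions rather than established practice.

```python
FHIR_STRING_LIMIT = 1024 * 1024  # maximum characters in a FHIR string


def chunk_sequence(seq, id_prefix, chunk_size=FHIR_STRING_LIMIT, seq_type="dna"):
    """Split seq into MolecularSequence dicts, each pointing to the next chunk."""
    if not seq:
        raise ValueError("sequence must not be empty")
    if not 0 < chunk_size <= FHIR_STRING_LIMIT:
        raise ValueError("chunk_size must be between 1 and the FHIR string limit")
    chunks = [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]
    resources = []
    for n, chunk in enumerate(chunks):
        resource = {
            "resourceType": "MolecularSequence",
            "id": f"{id_prefix}-{n}",
            "type": seq_type,
            "coordinateSystem": 0,  # assumed 0-based coordinates
            "observedSeq": chunk,
        }
        if n + 1 < len(chunks):
            # Every chunk except the last points at its successor.
            resource["pointer"] = [{"reference": f"MolecularSequence/{id_prefix}-{n + 1}"}]
        resources.append(resource)
    return resources


def reassemble(resources, first_id):
    """Follow the pointer chain from first_id and rebuild the full sequence."""
    by_id = {r["id"]: r for r in resources}
    parts = []
    current = first_id
    while current is not None:
        resource = by_id[current]
        parts.append(resource["observedSeq"])
        pointers = resource.get("pointer", [])
        current = pointers[0]["reference"].split("/", 1)[1] if pointers else None
    return "".join(parts)


# Tiny chunk size just to show the chaining; real chunks would be far larger.
parts = chunk_sequence("ACGTACGTAC", "seq", chunk_size=4)
print([p["id"] for p in parts])
print(reassemble(parts, "seq-0") == "ACGTACGTAC")
```

A receiver only needs the first chunk's id to rebuild the sequence, but nothing here records the total length or chunk order outside the chain, which real guidance would probably have to address.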

Patrick Werner (Apr 13 2020 at 17:07):

"Complete genetic sequence information, of which specific genetic variations are a part, is reported by reference to the GA4GH repository." This shouldn't be only tied to GA4GH, right?

Jamie Jones (Apr 13 2020 at 17:08):

It used to be there as an example; any reference is fine.

Bret H (Apr 28 2020 at 15:28):

"MolSeq quality is too bio-informatics for me to understand fully, is this reported and tested for every sequencing?" MolSeq is only used when useful (much like the other functions we have). You're thinking too deeply, I think :^) and we've had lots of discussions on the purpose of MolSeq without settling on a final definition. But in my opinion it's a place to send contiguous chunks of sequence data. It can be used as a reference sequence itself and is useful for things like entire viral genomes. @Bob Freimuth @Bob Dolin @Larry Babb might consider MolSeq as the representation, in our FHIR IG, of the Allele in the Information Model Bob is guiding: that is, where we're reporting the sequence that is in a specimen/sample/patient, not so much the variant. That MolSeq can also hold variant information is indeed a potential point of confusion. But our group needs to provide a final determination. Something to consider.

Bob Dolin (Apr 28 2020 at 15:40):

For MolecularSequence, I think I have a kinda basic question - is a MolecularSequence a sequence from a single molecule?

Bret H (Apr 28 2020 at 15:41):

I would say yes. That is the intent.

Bob Dolin (Apr 28 2020 at 15:41):

Reason I ask is that I think that will clarify whether we use it for, say, VCF, BAM, PrecisionFDA quality metrics, etc.

Bret H (Apr 28 2020 at 15:42):

In the past, Bob M did some good work showing how MolSeq, referenced hierarchically from profiles, could be used to represent HLA alleles.

Jamie Jones (Apr 28 2020 at 15:48):

We may need to profile MolSeq or extend it to make this clearer. I think initially it was hoped it would be able to cover all of those cases, though I haven't seen meaningful examples for each.

Jamie Jones (Jun 10 2020 at 19:32):

I've reviewed the IM model report and see opportunities to support in MolecularSequence the concepts of "simple sequence", "resolvable sequence", and "relative sequence." "Formatted sequence" may need some help. I struggle to see how any of the current quality fields are useful for these sequence representations, as they are aimed at comparing VCF results. Is there meaningful quality information we can capture at the single molecule level? The structure of MolecularSequence could allow us much more freedom to convey poor quality/no-call regions than what we could do in Observation.

Separately, we need to take a hard look at support for VCF (or gVCF/clinical+VCF and possibly other NGS data) and determine if it should be delegated to another resource.

Arthur Hermann (Jun 26 2020 at 00:58):

Jamie Jones said:

I've reviewed the IM model report and see opportunities to support in MolecularSequence the concepts of "simple sequence", "resolvable sequence", and "relative sequence." "Formatted sequence" may need some help. I struggle to see how any of the current quality fields are useful for these sequence representations, as they are aimed at comparing VCF results. Is there meaningful quality information we can capture at the single molecule level? The structure of MolecularSequence could allow us much more freedom to convey poor quality/no-call regions than what we could do in Observation.

Separately, we need to take a hard look at support for VCF (or gVCF/clinical+VCF and possibly other NGS data) and determine if it should be delegated to another resource.

@Jamie Jones @Bret H @Kevin Power
I just found out that our oncologists do want the actual VCF file sometimes (not sure who uses/interrogates it, but he was quite clear). Therefore I agree that adding a VCF to the message should have a clearly established method in the IG.

Bret H (Jul 27 2020 at 13:07):

A VCF file could be treated like Media and use the Reference data type. However, if one is sending VCF, one can still use the variant profile for variants too.

Bret H (Jul 27 2020 at 13:09):

Not sure how the discussion on VCF relates to MolecularSequence. I always thought of MolecularSequence as particularly useful for LONG, contiguous nucleotide information, like you see in a RefSeq record.


Last updated: Apr 12 2022 at 19:14 UTC