Stream: genomics
Topic: Capturing Genomic Panel Definitions
Patrick Werner (Aug 13 2018 at 16:05):
As just discussed, we have the need to save the ordered AND the actually conducted Genomics Panel as the labs we are working with just transmit positive results. One way would be to add all genes in ServiceRequest.orderDetail, but that would only represent what was ordered. The actually conducted panel could be a different one.
I can't see a way to do this currently, but maybe i'm missing something.
Data needed here would be: PanelName & List of Genes which were looked at.
Kevin Power (Aug 13 2018 at 17:30):
I think in many scenarios, the ServiceRequest will be defined as a specific panel of genes (one to many) that will be analyzed by the lab.
In cases like Whole Exome or Whole Genome tests, should the ServiceRequest have a way to define the genes, or would it make more sense for something like 'phenotype codes' like in HPO? I think it depends on who we feel is the most equipped to determine the appropriate set of genes based on the phenotype? I don't have the answer, but wanted to ask the question?
Kevin Power (Aug 13 2018 at 17:32):
The answer of who is likely, as nearly always, "both" - so perhaps we need to define patterns to support both?
Michael Osborne (Aug 13 2018 at 20:19):
Have you looked at the V2.5.1 IG to see what patterns were applied in the past? Here is a link to the document...http://www.hl7.org/documentcenter/public/wg/clingenomics/2016-08-04-1800_6PM_Clinical_Genomics_Coded_Reporting_Lab_US_Realm_IG.docx
Patrick Werner (Aug 13 2018 at 20:37):
In cases like Whole Exome or Whole Genome tests, should the ServiceRequest have a way to define the genes, or would it make more sense for something like 'phenotype codes' like in HPO?
A coded form of what was looked at seems reasonable. If we use a CodeableConcept (0..*) the user/implementer can choose for his use case whether he wants genes or phenotype codes.
Patrick Werner (Aug 13 2018 at 20:39):
So we could put "what was ordered" into ServiceRequest.orderDetail. But this doesn't solve the "what was looked at" problem. This can be different than the order. For this we need another List with coded things, referenced from the Diagnostic Report
Patrick Werner (Aug 13 2018 at 20:39):
Could also be an Extension in DiagnosticReport
Kevin Power (Aug 13 2018 at 20:40):
In short, we will look to V2 history to help inform us. In this case, we have updates in the latest LRI:
http://www.hl7.org/documentcenter/public_temp_11CF53C3-1C23-BA17-0C2788E88246FDCC/standards/dstu/V251_IG_LRI_R1_STU3_2018JUN.pdf
See section:
5.7.1 CLINICAL GENOMICS REPORT SECTION 1 – MASTER HL7 REPORTING PANEL
And it includes ways of saying:
Genes Studies [0..*]
Gene Mutations Tested For [0..*]
Ranges of DNA sequence examined [0..*] (numeric ranges)
Description of ranges of DNA sequences examined [0..1] (narrative)
Kevin Power (Aug 13 2018 at 20:42):
I guess I would lean towards modeling the 'what was looked at' as an Observation first, but if that doesn't fit, an extension off of DiagnosticReport seems reasonable.
Kevin Power (Aug 13 2018 at 21:31):
Would anyone disagree that the first version of our "What was tested" could be covered by an Observation profile that is directly referenced from DiagnosticReport? @Lloyd McKenzie @Bob Freimuth @Bob Milius @Gil Alterovitz @Amnon Shvo @Bob Dolin
Lloyd McKenzie (Aug 13 2018 at 21:43):
"What was tested" would certainly be an Observation.
Bob Dolin (Aug 13 2018 at 21:45):
I'd suggest that we build the Observation profile anticipating the need to annotate each region.
Lloyd McKenzie (Aug 13 2018 at 22:06):
Agree that we'll need a distinct Observation for each region we want to say something unique about.
Kevin Power (Aug 14 2018 at 13:50):
How many different ways are there to define the regions? I mentioned it earlier, but the V2 LRI had the following options:
Genes Studies [0..*]
Gene Mutations Tested For [0..*]
Ranges of DNA sequence examined [0..*] (numeric ranges)
Description of ranges of DNA sequences examined [0..1] (narrative)
The example Bob D shared on the list serv indicated the ranges by chromosome + numeric range.
Lloyd McKenzie (Aug 14 2018 at 13:58):
The other consideration beyond "where did I look for variants?" is "what types of variants was I looking for?" If you were using a chip that only matches certain things, or even if you got good reads on the entire sequence but ignored anything that wasn't listed as a potential pharmacogenomic impact for drug X, that'll influence how we report
Bob Dolin (Aug 14 2018 at 14:32):
While I favor numeric ranges (in part because they can be arbitrarily granular and because from them you can derive genes, exons,etc), they also require consideration of build and coordinate system used. If we were to, say, just list genes studied, I don't know how we'd represent something like 'Gene X was studied, but exon 5 is uncallable'. I would support enabling multiple methods - range, genes, narrative description, etc.
Bret H (Aug 14 2018 at 16:03):
Tough. You all have noted methodology dictates the granularity possible. However, If we provide a means for all the various types (e.g. numeric, list of genes, etc....) then a Clinical system can tell a vendor that they want a particular format and how to send it.
The conversation between vendor and EMR, or EMR to EMR is one that is negotiated in practice. A lab certainly has the capability to populate a well described region studied from an NGS test, or even a microarray probe set, however they are unlikely to expose the information unless a consumer requests it. The other mechanism which would dictate exposure is governmental regulation. Because of the state of genomic reporting, we live in a world of desperate ambiguity. One must assume nothing and order tests that are duplicate studies (applies to patient genome, tumors are a different story). It becomes a matter of informing clinical practice and nudging clinical practice to a new state. I think the best we can do is to provide a means for the more granular style of reporting to be possible.
related aside - a ClinVar ID is based on a machine readable numeric location (available through an API call). A VCF file has a machine readable numeric location. A BED file has a machine readable numeric location. Microarrays have machine readable numeric locations associated with their probe set (the array is manufactured with specific genomic locations in mind). FISH uses probes similarly to Microarrays. Sanger sequencing heavily relies upon known genomic location (primers). MassArrays and PCR are similar to Sanger sequencing. Thus, I am left with the statement:
Machine readable numeric locations are consistently involved in genetic testing except within a lab's report on findings (Machine readable numeric: reference, position in reference).
So, my vote is unfortunately for all variations of 'location interrogated' to be available.
Ammar Husami (Aug 15 2018 at 14:56):
I would suggest to define the lab panel by the manufacturers kit ID number, most labs use commercially available panels or customize their panels. This ID would provide all the probes and genomic targets of this specific panel. It would be defined by genomic coordinates i.e. in bed file format. The genomic coordinates can be queried to investigate what genes, transcripts, regions are assayed.
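A minimal sketch of the "query the genomic coordinates" idea, assuming the kit's target design is available as BED text (chromosome, 0-based start, exclusive end, optional name). The coordinates, target names, and function names below are made up for illustration and do not come from any real kit.

```python
from typing import List, NamedTuple


class Target(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str


def parse_bed(text: str) -> List[Target]:
    """Parse BED lines, skipping blanks, comments, and track/browser headers."""
    targets = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        name = fields[3] if len(fields) > 3 else ""
        targets.append(Target(fields[0], int(fields[1]), int(fields[2]), name))
    return targets


def targets_overlapping(targets, chrom, start, end):
    """Return targets that overlap the half-open query interval [start, end)."""
    return [t for t in targets if t.chrom == chrom and t.start < end and start < t.end]


# Hypothetical panel design; coordinates are illustrative only.
EXAMPLE_BED = """\
chr1\t100\t200\tGENE_A_exon1
chr1\t300\t400\tGENE_A_exon2
chr2\t50\t150\tGENE_B_exon1
"""

panel = parse_bed(EXAMPLE_BED)
hits = targets_overlapping(panel, "chr1", 150, 320)
print([t.name for t in hits])
```

A real design file would also need the reference build recorded alongside it, since the coordinates are only meaningful relative to a specific build.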
Kevin Power (Aug 15 2018 at 15:14):
Interesting idea. We could look to model that as a profile of DeviceDefinition (or maybe Device)?
http://build.fhir.org/devicedefinition.html
http://build.fhir.org/device.html
Ammar Husami (Aug 15 2018 at 16:28):
I don't think device is the right category. in my mind it's an assay type category.
An instrument (device) is capable of producing multiple assay types i.e. the illumina sequencers depending on the assay design can produce data type that would be for NGS panel, whole exome, or whole genome. The manufacturer usually provides a target design file for commercially available assays.
the library designs change so labs mask the gene regions for purposes of different target panels. i.e. neurodevelopmental panel and metabolism panels would be on the same assay design. There will be a second type that defines which panel was ordered or analyzed
fun fact: did you know that the FASTQ data from Illumina contains an illumina sequence identifier i.e. a unique instrument name https://en.wikipedia.org/wiki/FASTQ_format#Illumina_sequence_identifiers
perhaps there is a list of all the sequencers from (e.g. Illumina) also there is information about which run and which flow cell was used. I thought that might be useful information to have.
Kevin Power (Aug 15 2018 at 17:22):
I don't think device is the right category. in my mind it's an assay type category.
I am not sure there is a better FHIR resource to match this today:
http://build.fhir.org/resourcelist.html
That is not to say we couldn't propose a new "Assay" resource, but we should always compare our requirements to existing resources first. I think the intention of Device is to be more broadly interpreted than you might imagine. Certainly the instrument itself would be a device. It could be an Assay becomes a profile of Device?
RE: your fun fact - We try to remain as platform independent as possible when it comes defining our model. However, the Device resource contains UDI attributes if that applies.
Lloyd McKenzie (Aug 15 2018 at 18:44):
Device is used for software, thermometers, wheelchairs, MRIs, chips, etc. If it's manufactured and can "do" something or has configurable parameters, it's going to be a device in FHIR terms.
Ammar Husami (Aug 15 2018 at 19:05):
@Kevin Power "It could be an Assay becomes a profile of Device?" I am thinking the configuration of the assay (test) would include genomic targets and the query would evaluate if there is intersection of the question with the target intervals assuming the target regions have acceptable range coverage by the labs standards. There would be regions that labs would deem "validated" that the query would check as well. There are always areas that can not be covered with one assay in the range 1-5%
Kevin Power (Aug 15 2018 at 19:38):
To support the type of use case you describe, we have a way to represent a variant, and it can include the reference+start+end position details. So today, you could query with a range, and get back variants that were found in that range. However, we don't yet support telling someone that they might have asked for a region we haven't tested, or perhaps only a portion of the requested range was tested. So ultimately that is what we are trying to work into the model. As we define a standard way to represent the regions, we can then layer on things like quality.
(Note - FHIR has a way to specify search parameters that implementers should handle, so we would want to make sure we cover the "search by range" parameters)
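As a rough sketch of the gap described here (telling a requester that a range was not tested, or only partly tested), the check below compares a requested range against a list of tested ranges. It assumes half-open intervals on a single reference sequence; the function names and status labels are hypothetical and are not part of any FHIR profile.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one reference sequence


def covered_portions(request: Interval, tested: List[Interval]) -> List[Interval]:
    """Return the merged parts of `request` that fall inside any tested interval."""
    req_start, req_end = request
    # Clip every overlapping tested interval to the requested range.
    clipped = sorted(
        (max(s, req_start), min(e, req_end))
        for s, e in tested
        if s < req_end and req_start < e
    )
    merged: List[Interval] = []
    for s, e in clipped:
        if merged and s <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous piece.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def coverage_status(request: Interval, tested: List[Interval]) -> str:
    """Classify a request as not-tested, partially-tested, or fully-tested."""
    covered = sum(e - s for s, e in covered_portions(request, tested))
    if covered == 0:
        return "not-tested"
    if covered == request[1] - request[0]:
        return "fully-tested"
    return "partially-tested"


tested_ranges = [(100, 200), (180, 300), (500, 600)]
print(coverage_status((150, 250), tested_ranges))
print(coverage_status((250, 550), tested_ranges))
print(coverage_status((300, 500), tested_ranges))
```

Quality could later be layered on by attaching a score to each tested interval, which is in line with the "define regions first, then add quality" ordering suggested above.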
Kevin Power (Aug 15 2018 at 20:14):
I think the most important thing at this point is to define how the genomic targets can and should be defined. In the V2 world, we defined the following:
Genes Studies [0..*]
Gene Mutations Tested For [0..*]
Ranges of DNA sequence examined [0..*] (numeric ranges)
Description of ranges of DNA sequences examined [0..1] (narrative)
In earlier discussions on the topic, it seems there is an unfortunate need to support all of those as options, due to the variety of ways labs choose to communicate the data, and how receiving systems will be able to process the data. @Ammar Husami - Since you joined later, do you see any other ways that we should consider?
Kevin Power (Aug 16 2018 at 22:02):
So, to keep this conversation going - We have basically two proposals on the table:
- Profile Observation (Regions-Studied) with components[] to define what was tested
- Profile Device (Genetics-Assay) with extensions to define what was tested
Reactions? Preferences? I lean towards profiling Observation.
Lloyd McKenzie (Aug 16 2018 at 22:15):
I think assertions of what was tested need to be made using Observations. There will be a wide variety of devices that can test things in different ways. The record of "what was found" should be captured in a way that's consistent and independent of how the testing was done.
Bob Dolin (Aug 23 2018 at 15:44):
I've tried to synthesize the discussion into a concrete proposal:
1. Create a new Observation profile, 'Region-Studied', hanging off GeneticsDiagnosticReport. [0..*]
2. Definition for new profile: "The Region-Studied profile is used to assert intended coverage areas for the performed test(s). Intended coverage areas may differ from actual studied areas (e.g. due to technical limitations during test performance). Refer to test results for actual studied areas.".
3. Components
component (gene studied) 48018-6 [0..*] (HGNC symbol or NCBI gene code)
component (gene mutations tested for) 36908-2 [0..*] (coded, preferable HGVS)
component (description of ranges of DNA sequences examined) 81293-3 [0..1] (narrative)
component (region scope) TBD [0..1] (Whole Genome | Whole Exome | Gene Panel | Specific Variants)
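To make the proposal concrete, here is a hedged sketch of what one Region-Studied instance might look like as FHIR JSON, built as a plain Python dict. The LOINC codes are the ones listed above. Everything else is an assumption for illustration: the placeholder `code` text on the Observation, the use of `text` instead of coded gene values, the text-only region-scope component (its code is TBD in the proposal), and the function names.

```python
import json

LOINC = "http://loinc.org"


def coded(loinc_code, label):
    """Build a CodeableConcept carrying a LOINC code plus the thread's label."""
    return {"coding": [{"system": LOINC, "code": loinc_code}], "text": label}


def region_studied(patient_ref, genes=(), variants=(), description=None, scope=None):
    """Assemble a Region-Studied Observation as a plain dict (FHIR JSON shape)."""
    components = []
    for gene in genes:
        components.append({"code": coded("48018-6", "Gene studied"),
                           "valueCodeableConcept": {"text": gene}})
    for variant in variants:
        components.append({"code": coded("36908-2", "Gene mutations tested for"),
                           "valueCodeableConcept": {"text": variant}})
    if description is not None:
        components.append({"code": coded("81293-3", "Description of ranges of DNA sequences examined"),
                           "valueString": description})
    if scope is not None:
        # The proposal leaves the region-scope code as TBD, so only text is used.
        components.append({"code": {"text": "Region scope"},
                           "valueCodeableConcept": {"text": scope}})
    return {
        "resourceType": "Observation",
        "status": "final",
        # Placeholder: no code for the profile itself was named in the thread.
        "code": {"text": "Region studied"},
        "subject": {"reference": patient_ref},
        "component": components,
    }


obs = region_studied("Patient/example", genes=["TPMT"], scope="Gene Panel")
print(json.dumps(obs, indent=2))
```

A real profile would bind the gene component to HGNC or NCBI codes as the proposal says; plain text is used here only to keep the sketch short.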
Lloyd McKenzie (Aug 23 2018 at 15:50):
That isn't going to tell you what's needed. When you're querying the list of Observations for a patient, you want to know "is there something in this space?" and to get back an answer of "yes, we looked in this space and we found X and Y" or "we looked in this space and found nothing" or "there are no observations for that region". Saying "we looked in this space" isn't useful without the "we found nothing" or "we found X and Y".
Kevin Power (Aug 23 2018 at 16:01):
If we take simple examples to start, you could do what you are suggesting with two queries?
1) Find regions-studied with a gene-studied = TPMT (if nothing is returned, you know that gene was not tested)
2) If something was returned, now find "genetic findings" with a gene of TPMT
Kevin Power (Aug 23 2018 at 16:10):
If query #1 found the region, and query #2 doesn't return findings, no variants were found
Bob Dolin (Aug 23 2018 at 16:18):
I was looking at this similar to what @Kevin Power is saying. Plus, I was also assuming that we need to differentiate the intended coverage from the actual coverage. It's the actual coverage where I think @Lloyd McKenzie your comments may be more applicable, and where "what was found" and "what was the quality in those regions" becomes relevant. This is addressed to some extent in the Sequence resource, and it'll be useful I think to delve into Lloyd's comments more as we do the Sequence vs. Observation comparison.
Lloyd McKenzie (Aug 23 2018 at 16:31):
Essentially, a stand-alone Observation that says "I looked at region X" is as useful as a statement of "I made a diagnosis" without capturing what the diagnosis was.
Kevin Power (Aug 23 2018 at 16:37):
I guess I disagree. I would say the fact that we are declaring what region we looked at makes it more useful than just "I made a diagnosis"?
Lloyd McKenzie (Aug 23 2018 at 16:43):
I looked at something - and I'm not telling you whether anything was found or not - doesn't really help.
Bob Dolin (Aug 23 2018 at 16:51):
@Lloyd McKenzie I'm not sure I'm following what you're suggesting. Where I'd really like to go is to have annotated regions (I'm talking about the actual coverage, as opposed to the intended coverage now). Annotations can be down to base-pair resolution, and primarily need to indicate if a region was covered or not, and whether or not it had sufficient quality. Like @Kevin Power says, I see a two step process - I want to know about, say, TPMT variants. First, I need to know what regions of the gene were (sufficiently) covered. From there, I can query for computable genetic findings that are within my region of interest.
Lloyd McKenzie (Aug 23 2018 at 17:40):
Every Observation stands alone. The Observation that describes the region must also describe what was found there.
Kevin Power (Aug 23 2018 at 18:07):
I guess I see the Observation of "what did I look at" being valuable as a standalone? I am really not sure how to do what you are suggesting, unless the region-studied observation somehow points at the findings?
You might have found many variants in a gene - are you saying that a single observation must contain everything you found in the defined region?
Bob Dolin (Aug 23 2018 at 18:14):
I think we are conflating quality with variants. The region-studied (for actual coverage, as opposed to intended coverage) is a first order observation - it can be used to tell us about what was read and the quality / analytic accuracy of reads in a region. There may or may not be variants in readable regions, but that's not the point. The observation is primarily telling us which regions were adequately read. To Lloyd's point, it's not so much an observation as to what was found there, but rather, an observation of whether we think things were findable there.
Jamie Jones (Aug 23 2018 at 18:36):
If Lloyd is convinced this data shouldn't be its own Observation, could we instead add it directly to DiagnosticReport, as 0..* components? Or perhaps this could be absorbed into the Genomics-Panel observation, which seems orphaned currently.
Kevin Power (Aug 23 2018 at 18:46):
We certainly could look at adding it to DR, but it doesn't have "components" list that is found on Observation that allows for a built-in mechanism for delivering additional data. We could consider it as new extensions to DR I suppose?
Interesting question about genomics-panel. Perhaps these "region defining" components do make sense as part of that profile? I am a little curious though after this discussion - for my own learning, how does the current definition of Genomic Panel provide useful information in and of itself?
Lloyd McKenzie (Aug 23 2018 at 19:17):
Having an Observation that says "I was able to look at this region with sufficient robustness that results could be determined" is sort of the same as "I was able to get the child to cooperate such that I was able to get the cuff on and take a good-quality blood pressure reading". I mean, yes that's important, but that alone isn't very useful. It's really a qualifier on the value read. "I found this, but the result has uncertainty because the quality of the reading wasn't great" or "I didn't find anything and the quality of the reading was excellent, so there's a high certainty nothing is there".
Jamie Jones (Aug 23 2018 at 19:21):
I agree that these arguments applied to Genomics Panel as a standalone Observation show the weakness of the profile. It seems to say, "we made some observations, here is a list of them"
Lloyd McKenzie (Aug 23 2018 at 19:26):
Having panels is fine - they're ways of organizing related information. But none of the descriptive information about the panel conducts into each of the leaf level observations - the patient, the specimen, the gene, the device, the quality, etc. all needs to be declared explicitly on every single observation.
Bob Dolin (Aug 23 2018 at 19:27):
Sorry Lloyd, but still not sure I'm following you. How would you represent "I didn't find anything and the quality of the reading was excellent, so there's a high certainty nothing is there"? Also, might be worth keeping in mind that the readability of a region is determined upstream of variant calling. In some cases there may not be any variant calling because you just want the sequence.
Kevin Power (Aug 23 2018 at 19:30):
I don't think I am following either. I don't see how the Panel is OK by itself (when it only relates a bunch of other observations), but the Region Studied is not, when it is used to describe what was tested? To know anything about the Panel, you have to read other Observations? To know if something was detected in a region-studied, you have to read other Observations?
Kevin Power (Aug 23 2018 at 19:32):
If we proposed adding the same components to the Genomics Panel profile, is that OK?
Lloyd McKenzie (Aug 23 2018 at 19:40):
A panel is just a grouper. It organizes information but doesn't actually convey any of that information. So all you can ever get out of such an observation is "A complete blood count was done on Patient X on date Y". The full details are in the observations that are referenced by the panel. If you deleted the panel Observation, you wouldn't have lost any detail about the individual leaf-level observations because they each contain the full context of who did them, when, etc. - including things like whether there was fasting before hand or anything like that.
Lloyd McKenzie (Aug 23 2018 at 19:41):
When you make an Observation, the value of the Observation can indicate whether anything was found. So you can say "I looked for this type of thing in this region and didn't find anything." The region, the quality of your reads, etc. would all be conveyed as components of that Observation.
Bob Dolin (Aug 23 2018 at 19:44):
Lloyd, I don't think that works here because you'll have to create a ton of observations only to say that nothing was found. Also, I don't think that addresses the scenario where you just care about the readability of a region, and didn't actually look for variants.
If anything, I would see a relationship between region-studied and computable genetic findings.
Kevin Power (Aug 23 2018 at 19:45):
That helps me some - but still not clear on the implications. Again, simple example - my test sequenced all of CYP2C9 and I found two variants - how many observations is that? How does that align with DescribedVariant profile, where we currently would define the two variants found?
Jamie Jones (Aug 23 2018 at 19:45):
While investigating this, I'm noticing a very misleading definition of "performer" on the Genetics Observation Common Properties:
"Summary of all genes, drugs and/or conditions tested for for which there were no significant/reported findings. Allows indication of what was tested for in a relatively efficient manner."
Does anyone know of previous intent to add this functionality here, or is this just a typo?
Kevin Power (Aug 23 2018 at 19:51):
@James Jones - Are you looking at the current build? I don't see that statement on performer? We did have some corrections in this area before:
http://build.fhir.org/ig/HL7/genomics-reporting/obs-base.html
Jamie Jones (Aug 23 2018 at 20:00):
@Kevin Power Thanks for the link, I somehow got swapped back to the April version. Current one is fixed.
Lloyd McKenzie (Aug 23 2018 at 21:09):
Why would you care about the readability of a region if you didn't look for anything? I agree it may mean a large number of Observations - the question is how to concisely express where you didn't find anything. The notion of "region studied" is used to infer where you didn't find anything - and FHIR queries can't do inference.
Lloyd McKenzie (Aug 23 2018 at 21:11):
@Kevin Power You should be able to say that with 3 Observations - one saying that you found something while looking at that area and two others for each of the found variants. The former would point to the latter.
Kevin Power (Aug 23 2018 at 21:14):
Hmmm, OK - I think that is basically what we were asking for? The former being our Region Studied, the latter being Described Variants? Maybe with the exception that I think you indicate the former should include a notion of whether something was found or not? And if something was found, it should include some reference to the Observations?
Lloyd McKenzie (Aug 23 2018 at 21:15):
Correct. The variants themselves would obviously describe region too - but narrower in scope to where they were actually located.
Kevin Power (Aug 23 2018 at 21:22):
Just for consistent discussion purposes, what would you call the 'former' observation (the one saying we found something while looking at that area)?
Lloyd McKenzie (Aug 23 2018 at 21:30):
Yes. The ones that say "I looked at this broad area and found something" would point to the specific things found (supporting the assertion that 'something' was found). Observations that say "I looked at this broad area and found nothing" wouldn't point at anything. However both would need to fully describe the region being looked at, the types of things being looked for and the methodology used to look.
Bob Dolin (Aug 23 2018 at 21:35):
keep in mind that quality/readability may be upstream of variant calling, and that in some cases (I think Bob M's HLA scenario), variants aren't called at all, in some cases variants are sought at a much later date, etc. While I understand linking region-studied to variants found, they are distinct observations. Think of regions-studied as useful metadata about a study.
Lloyd McKenzie (Aug 23 2018 at 22:30):
If all you're wanting to say is "Here's the sequence that I found", that's fine as an Observation. However, if you're not reporting the sequence, then you need to report something.
Kevin Power (Aug 23 2018 at 22:43):
So @Lloyd McKenzie - Again, keeping with my simple example (sequencing a full gene) - We should make sure that if someone asks for Observations on that gene, the query might return one of the following:
- Nothing - meaning we have never tested that gene
- 1 Observation - 1 "Region Tested" Observation with a result value indicating the gene was tested but nothing was found
- 3 Observations - 1 "Region Tested" Observation with a result value indicating something was found that has links to the two variant Observations, and 2 "DescribedVariant" Observations
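The three outcomes above can be sketched as a small interpretation step on the client side. This is only an illustration under stated assumptions: the records are simplified dicts rather than FHIR resources, and the field names (`kind`, `gene`, `variant_ids`, `label`) and the placeholder variant labels are hypothetical.

```python
def summarize_gene(observations, gene):
    """Interpret the Observations returned for one gene.

    Returns "never tested", "tested, nothing found", or a list of linked findings.
    """
    by_id = {o["id"]: o for o in observations}
    regions = [o for o in observations
               if o["kind"] == "region-tested" and o["gene"] == gene]
    if not regions:
        # No Region Tested observation at all: the gene was never tested.
        return "never tested"
    # Follow the links from each Region Tested observation to its variants.
    linked = [by_id[v] for r in regions for v in r.get("variant_ids", []) if v in by_id]
    if not linked:
        return "tested, nothing found"
    return "tested, found: " + ", ".join(v["label"] for v in linked)


# Hypothetical result sets for the three cases.
nothing = []
negative = [{"id": "r1", "kind": "region-tested", "gene": "CYP2C9", "variant_ids": []}]
positive = [
    {"id": "r1", "kind": "region-tested", "gene": "CYP2C9", "variant_ids": ["v1", "v2"]},
    {"id": "v1", "kind": "described-variant", "gene": "CYP2C9", "label": "variant-1"},
    {"id": "v2", "kind": "described-variant", "gene": "CYP2C9", "label": "variant-2"},
]

for result_set in (nothing, negative, positive):
    print(summarize_gene(result_set, "CYP2C9"))
```

Note that the "nothing found" answer here depends on the Region Tested observation existing, which is exactly the piece that a query over variant Observations alone cannot supply.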
Lloyd McKenzie (Aug 23 2018 at 23:12):
Right
Kevin Power (Aug 24 2018 at 04:22):
So in that case we were close with our proposal. We just need to make sure and use the same Gene component and define a valueBoolean on our region profile. But while that would solve the “easier” case of Sequencing a full gene, we might have to think through what it means for other scenarios. Like WES test. Do we need 20k Observations, one for each gene? It also gets more complex when the “region” is not as easy to identify as a gene - like exons within a gene where numeric ranges within a sequence come into play. I am starting to question if we are getting enough value here to warrant the complexity.
Bob Dolin (Aug 24 2018 at 14:36):
callable_status.bed.txt Ultimately, I feel like we need something along the lines of the attached - it's an automated and annotated summary of regions studied, derived from the SAM/BAM file. I suppose in FHIR, each row of the file would need to turn into its own observation? Note though that the regions are not defined based on whether or not they contain variants - variant calling comes later.
Kevin Power (Aug 24 2018 at 15:06):
I am not sure we need that level of detail, but if we do, I feel like we are really stretching the boundaries on Observation.
Lloyd McKenzie (Aug 24 2018 at 15:28):
The base question is what do we want to be able to query? If we want to be able to ask the question "has anyone looked at region X and, if so, what was found?", then that level of information needs to be reflected in the Observations to allow the query to be performed. And the messy space very much is how to express "I ran a chip that looked for 20k variants and didn't find 19,997 of them, but did find 3" in a way that lets you search meaningfully on one of the variants that was checked for and definitively not detected. Capturing 19997 observations doesn't sound like a lot of fun, but if we don't want to do that, what do we do that will allow the types of search we want?
Lloyd McKenzie (Aug 24 2018 at 15:29):
(There's a reason there wasn't a solution to this problem in the original design :))
Kevin Power (Aug 24 2018 at 15:44):
I think more queries will be on the Interp/Impact side, but on the Findings side my sense is that the two most common queries will be (others will likely disagree and have other much more targeted use cases):
1 - Does this patient have variant X (looking up by an ID value or perhaps HGVS string).
2 - Does this patient have a finding in gene Y (looking up by HGNC).
In both cases, I think it is important that we are able to provide the information that we looked but didn't find anything, but I am honestly not sure how often that really makes a difference?
20k Observations does NOT sound like a good solution to that problem at all - hence my wondering if we should move this to Sequence or some other resource? Perhaps we need something like GenomicsStudy, analogous to the ImagingStudy resource?
Jamie Jones (Aug 24 2018 at 15:47):
I could see the data in that bed file being implemented in one observation (or as a component/extension elsewhere) with 0..* "callable" ranges, 0..* "low coverage" ranges, 0..* "poor mapping quality" ranges, etc. We should be able to set up search methods to query those ranges by position to confirm a range was callable or not, but it isn't obvious to me how the ranges themselves pair up with gene/variant-level queries (if this step is easy then I think we're pretty close?)
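A minimal sketch of the "one observation with labeled ranges" idea, assuming each status holds a list of non-overlapping half-open ranges on a single reference sequence. The status names follow the ones mentioned above; the coordinates and function names are made up for illustration.

```python
# Hypothetical content of a single regions-studied record: status -> ranges.
RANGES = {
    "callable": [(100, 500), (700, 900)],
    "low-coverage": [(500, 650)],
    "poor-mapping-quality": [(650, 700)],
}


def statuses_in_range(ranges_by_status, start, end):
    """Return status -> number of bases of [start, end) carrying that status."""
    result = {}
    for status, ranges in ranges_by_status.items():
        # Sum the overlap of the query with each range; negative overlap counts as 0.
        bases = sum(max(0, min(e, end) - max(s, start)) for s, e in ranges)
        if bases:
            result[status] = bases
    return result


def is_fully_callable(ranges_by_status, start, end):
    """True only if every base of [start, end) lies in a callable range."""
    return statuses_in_range(ranges_by_status, start, end).get("callable", 0) == end - start


print(statuses_in_range(RANGES, 400, 680))
print(is_fully_callable(RANGES, 150, 450))
print(is_fully_callable(RANGES, 450, 550))
```

Pairing this with gene-level queries would still need a mapping from gene or exon to coordinates on the same build, which is the step flagged above as not yet obvious.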
Bob Dolin (Aug 24 2018 at 15:58):
stepping in to somewhat unfamiliar territory here, but I kinda recall that the HL7 V3 AnnotatedECG guide defined a very compact method for communicating the wave signals. Here, there's no doubt that folks need to know what probes are included in a chip, but that doesn't necessarily mean we need to communicate it via FHIR. But I'm wondering, are there / should there be ways of densely packing data in FHIR observations?
Kevin Power (Aug 24 2018 at 16:00):
Well @James Jones , 1 Observation with 20k component values seems a little less yucky, but still yucky and like we might be trying to fit a very large square peg in a little round hole.
Lloyd McKenzie (Aug 24 2018 at 16:35):
We can define "special" search criteria that understand how to work with complex externally-defined data structures, so actually embedding a .bed file or some equivalent in the Observation to describe where we looked and did/did not find anything. That would be less yucky than breaking everything out into components. Whether it's sufficiently less yucky, I don't know...
Kevin Power (Aug 24 2018 at 18:10):
I know @Bob Dolin is thinking through a particular use case, but I have to say I have yet to see any use cases that would require .bed file level data. I feel we have left the realm where FHIR should be used at all, or at a minimum we have hit something where an Observation just doesn't cut it. Could we make it work? Sure, Observation is a fairly generic resource that can fit a lot of use cases. Should we make it work? I don't think so.
I keep coming back to what I have seen on reports today. If our goal is to provide more discrete homes for the data currently locked away in those PDFs, I think our original proposal gets us fairly close. If doing it that way, or at least something close to that way, doesn't align with what an Observation should be, I propose we drop the topic and labs can continue to share that information in the reports as they do today.
Lloyd McKenzie (Aug 24 2018 at 18:21):
It's homes for the discrete data currently being shared, but it's also exposing that information in a way that enables subsequent querying, sharing and manipulation.
Kevin Power (Aug 24 2018 at 18:27):
Fair - And I guess I would say the original approach sufficiently enables that
Lloyd McKenzie (Aug 24 2018 at 19:03):
"Original approach" =?
Kevin Power (Aug 24 2018 at 19:52):
Sorry, the 'Region Studied' profile originally suggested by Bob D, with the tweaks of:
- Ensuring we use the same 'gene' component LOINC code, so that a query for a gene would return the region studied + any findings for that gene (I think maybe this is already the case , but we should verify)
- Defining the Region Studied to have a result value with a type of (boolean? Code?) to indicate something was found/nothing was found
Bob Dolin (Aug 24 2018 at 20:15):
Regarding result value to indicate something was found, 2 comments: [1] we need to be clear as to whether we're representing the intended vs. actual regions studied; [2] since variant analysis can be downstream, and because we may define region ranges based on other types of annotations, I'd prefer optional relationships between regions and found variants over an observation value.
Jamie Jones (Aug 24 2018 at 20:29):
If we are using it to hold metadata and pointers to variants found, I don't see why we wouldn't want this to be in the Genomic Panel profile (isn't it right in the name of the discussion group?) Otherwise, it seems like we're functionally reinventing the panel profile (which I'd also be fine with)
Bob Dolin (Aug 24 2018 at 20:40):
I had thought a "panel" was a grouper for a bunch of tests. On the other hand, here we're talking about, for instance, regions covered by a single test.
Kevin Power (Aug 24 2018 at 20:51):
There is overlap, but slightly different needs I think. But honestly, I would be OK with using Genomics Panel
Jamie Jones (Aug 24 2018 at 20:57):
My instinct here would be to take the Impact profiles out of Panel (just keep them in DiagnosticReport) and add this functionality in as a grouper for all the regions tested, by whichever method. Maybe "Panel" is a region-specific term here--we could use the glossary to clarify, or pick a different name... just trying to reduce the overlaps :)
Lloyd McKenzie (Aug 24 2018 at 22:27):
If "region studied" is easily identifiable, that's fine. But I'm not clear how that would handle the situations where you're doing a chip test that looks for a bunch of different variants (possibly across multiple chromosomes)
Kevin Power (Aug 24 2018 at 22:30):
One of the components was this (which could still ramp out of control on large tests):
component (gene mutations tested for) 36908-2 [0..*] (coded, preferably HGVS)
Kevin Power (Aug 24 2018 at 22:35):
To be clear, I completely recognize this won't solve all use cases. I hope it solves enough to make it somewhat useful, but more importantly to get people thinking about it and possibly garner feedback on better ways to do it.
Grahame Grieve (Aug 26 2018 at 21:52):
the HL7 V3 AnnotatedECG guide defined a very compact method for communicating the wave signals
Grahame Grieve (Aug 26 2018 at 21:53):
indeed. And in FHIR, that's the SampledData type. I wonder, though, the wisdom of this, given that it's hardly used and there are underlying formats for that data. The same principle applies here, as discussed: if there's an underlying format, why would it not be better to use that?
Larry Babb (Aug 27 2018 at 10:25):
@Kevin Power I’m jumping in late but just wanted to point out that most labs don’t report every variant found. They typically only report variants they assess to be clinically significant or of uncertain significance. Benign or incidental variants are usually not reported unless they are novel, and this may vary from lab to lab. You may also note that on larger panels, exomes and genomes, only variants in specific regions are assessed and reported, which may or may not include incidental findings for a non-requested indication.
Larry Babb (Aug 27 2018 at 10:28):
In regards to capturing general statements like genes studied... this may be understood, but to clarify again, the entirety of a gene is not studied in most cases. The deep intronic regions are not always sequenced or called. I’m not sure if that’s important when we generalize tested regions with a set of HGNC IDs.
Kevin Power (Aug 27 2018 at 13:28):
@Larry Babb - Both good points. Labs typically report a small subset of variants, but figuring this out would provide systems who do have all the variants a means to express what they know. Your second point is indicative of the complexity, but honestly I was OK making more general statements (gene level). If that is inappropriate or we have concerns someone might use it incorrectly, does that push us to the bed file level as Bob D suggested?
Lloyd McKenzie (Aug 27 2018 at 14:01):
While it's true that not all variants are reported, at the same time it generally is possible to make inferences based on what was tested and what was reported, what definitively does not exist. That's what we want to expose in a searchable way.
Bob Dolin (Aug 27 2018 at 14:42):
Not that I'm necessarily pushing for bed file, but a couple other considerations: [1] they are easy to obtain / generate (manufacturers of gene panels often include one to indicate intended coverage; they can be automatically generated from SAM/BAM); [2] they enable a more fine-grained approach to Lloyd's use case (i.e. given Larry's comments, what assumptions can you make if you're simply told "HLA-A gene was studied"); [3] fine-grained regions can be annotated for a variety of reasons, such as quality, such as whether variants were found, etc.
Lloyd McKenzie (Aug 27 2018 at 14:47):
The main downside of embedding a complex file and developing a custom search parameter based on it is that queries of "what's been tested and found to not exist" would only work with systems that supported those custom search parameters.
Bob Dolin (Aug 27 2018 at 14:50):
I agree. I was mainly using the bed file as an example. But if we did want that level of detail, are there ways of sending boatloads of compact data in FHIR?
Lloyd McKenzie (Aug 27 2018 at 14:53):
You can either store the file as a pure Binary and point to it or you can base64 encode it inside an Attachment data type
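As a rough sketch of the two options just mentioned, the snippet below wraps BED text in an Attachment-shaped dictionary, either inline as base64 or as a pointer to a separately stored Binary. The file title, the `text/plain` content type, and the `Binary/example-bed` reference are illustrative assumptions.

```python
import base64


def bed_to_inline_attachment(bed_text: str, title: str) -> dict:
    """Embed the BED content directly, base64-encoded, in an Attachment."""
    raw = bed_text.encode("utf-8")
    return {
        "contentType": "text/plain",  # assumption: generic type for BED text
        "title": title,
        "data": base64.b64encode(raw).decode("ascii"),
        "size": len(raw),             # size in bytes of the unencoded content
    }


def bed_as_reference_attachment(binary_url: str, title: str) -> dict:
    """Point to the BED content stored elsewhere (e.g. as a Binary resource)."""
    return {"contentType": "text/plain", "title": title, "url": binary_url}


# Illustrative single-line BED content.
sample_bed = "20\t0\t60000\tREF_N\n"
inline = bed_to_inline_attachment(sample_bed, "coverage.bed")
by_ref = bed_as_reference_attachment("Binary/example-bed", "coverage.bed")
```

Either form keeps the payload compact, but as noted earlier in the thread, the content stays opaque to standard search parameters.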
Kevin Power (Aug 28 2018 at 16:10):
To summarize, here are the three options I mentioned on the call today. I would like to choose one of these and move forward with a proposal by end of this week.
Kevin Power (Aug 28 2018 at 16:11):
1) Delay - Considering all the conversation around region targeted/studied, defining assays, CLIA approvals etc., perhaps we should not add to the IG right now and consider this more holistically in the future.
Kevin Power (Aug 28 2018 at 16:17):
2) Gene/Mutation level profile - Create a new profile of Observation that allows describing what was actually studied that looks something like this:
1. Create a new Observation profile, 'Region-Studied', hanging off GeneticsDiagnosticReport. [0..*]
2. Definition for new profile: "The Region-Studied profile is used to assert actual regions studied for the performed test(s). Intended coverage areas may differ from actual coverage areas (e.g. due to technical limitations during test performance).".
3. valueCodeableConcept (LOINC TBD) - Result value to indicate if anything was found
Two kinds of value sets we could consider:
3.1 Simple statement of "Was something found" (Positive, Negative, Unknown)
3.2 Clinical Significance "Was something clinically significant found" (Pathogenic, Benign, Likely Path, Likely Benign, Unknown)
4. Components
component (gene studied) 48018-6 [0..*] (HGNC symbol or NCBI gene code)
component (gene mutations tested for) 36908-2 [0..*] (coded, preferably HGVS)
component (description of ranges of DNA sequences examined) 81293-3 [0..1] (narrative)
component (region scope) TBD [0..1] (Whole Genome | Whole Exome | Gene Panel | Specific Variants)
component (gene coverage) TBD [0..1] numeric (when sequencing, what % of the gene was sequenced)
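To make option 2 more concrete, here is a minimal sketch of what a 'Region-Studied' Observation instance might look like as JSON-style data. The Observation code is still TBD in the proposal, so a text placeholder is used; the gene code system URI and the helper name `region_studied` are assumptions for illustration only.

```python
LOINC = "http://loinc.org"
HGNC = "http://www.genenames.org"  # assumption: system URI for HGNC symbols


def region_studied(genes, ranges_description=None, value_text=None):
    """Assemble an Observation-shaped dict following the proposed components."""
    components = [
        {
            # component (gene studied) 48018-6 [0..*]
            "code": {"coding": [{"system": LOINC, "code": "48018-6"}]},
            "valueCodeableConcept": {"coding": [{"system": HGNC, "code": gene}]},
        }
        for gene in genes
    ]
    if ranges_description is not None:
        # component (description of ranges of DNA sequences examined) 81293-3 [0..1]
        components.append(
            {
                "code": {"coding": [{"system": LOINC, "code": "81293-3"}]},
                "valueString": ranges_description,
            }
        )
    observation = {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "Region studied (LOINC TBD)"},  # placeholder, code not yet assigned
        "component": components,
    }
    if value_text is not None:
        # Optional overall result; its value set is still under discussion.
        observation["valueCodeableConcept"] = {"text": value_text}
    return observation


example = region_studied(["ERBB2", "CDK4", "RB1"], "Coding exons only")
```

The overall result value is left optional in this sketch, since whether the profile should carry one at all is part of what is being decided.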
Kevin Power (Aug 28 2018 at 16:19):
3) Detailed coverage - Model the FHIR equivalent of the BED file that Bob D referenced earlier (that looks something like this):
20 0 60000 REF_N
20 60000 9999878 NO_COVERAGE
20 9999878 9999905 LOW_COVERAGE
20 9999905 9999920 CALLABLE
20 9999920 9999921 LOW_COVERAGE
20 9999921 9999925 CALLABLE
20 9999925 9999926 LOW_COVERAGE
20 9999926 10024329 CALLABLE
20 10024329 10024339 POOR_MAPPING_QUALITY
20 10024339 10024340 CALLABLE
20 10024340 10024345 POOR_MAPPING_QUALITY
20 10024345 10098323 CALLABLE
Gideon Giacomelli (Aug 29 2018 at 08:02):
Dear all, I am new in this group and want to briefly introduce myself. I'm a bioinformatician, software architect and project manager. I worked for six years at the German cancer center in Heidelberg, Germany, where we managed genome data for research projects and also some clinical studies in which patients were discussed in a molecular tumor board. Right now I am also part of a medical informatics initiative, http://highmed.org/, where we want to optimize the process of storing mutation data and generating a kind of report for a molecular tumor board. At the same time I am working for the Center of Digital Health (Berlin Institute of Health / Charite in Berlin), where we want to establish something similar and of course standardize our mutation results even further. We're eager to participate in ongoing global initiatives like GA4GH and HL7/FHIR.
Gideon Giacomelli (Aug 29 2018 at 08:16):
Reading through the recent messages, I would vote for solution 1) proposed by @Kevin Power. It won't be easy to specify which regions exactly were NOT studied. The name of a gene panel should somehow lead to the genes it is designed for, but I don't think this is in the scope of FHIR right now. I like the suggestion from @Kevin Power of a resource "GenomicStudy" because, as far as I understand, we squeeze a lot into the Observation just to stay as generic as possible. There are so many complex mutations which need to be handled: InDels (short deletions, insertions), larger deletions and insertions, amplifications, translocations which might result in gene fusions, inversions... But again, I am new to the FHIR resources and still working on understanding the relations between observations, the sequence resource...
Dora Walter (Aug 29 2018 at 12:23):
“I think it is important that we are able to provide the information that we looked but didn't find anything, but I am honestly not sure how often that really makes a difference?” @ Kevin Power
Based on this question:
Use Case:
1) Service request from clinic to lab: for patient A - NGS with Panel-T01
2) Diagnostic report of Panel-T01 patient A: 1 variant in gene ERBB2 (Her2), 1 variant in gene CDK4
Now it is necessary to know, if there are any resistance mechanism for treatment known?
Yes - If there is a mutation in gene RB1 and a loss of function (Rb) the treatment with CDK4/6-Inhibitor (e.g. Palbociclib) will not work.
Therefore, as a physician I need the information: Gene RB1 was in the Panel-T01 and it is wildtype (means no variant found).
--> based on the molecular diagnostics a treatment with CDK4/6-Inhibitor could theoretically work.
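The reasoning in this use case can be sketched as a small lookup: a gene only counts as "no variant reported" if it was actually part of what was studied. The panel content below is hypothetical (the thread does not list the genes in Panel-T01), and "no variant reported" is deliberately not equated with wild type, given the earlier point that labs do not report every variant found.

```python
def gene_status(gene, studied_genes, reported_variant_genes):
    """Classify a gene from the list of studied genes and the reported findings."""
    if gene not in studied_genes:
        return "not studied"          # absence of a finding tells us nothing
    if gene in reported_variant_genes:
        return "variant reported"
    return "no variant reported"      # studied, and nothing was reported


# Hypothetical content of Panel-T01, for illustration only.
panel_t01 = {"ERBB2", "CDK4", "RB1"}
# Genes with reported variants, as in the diagnostic report above.
reported = {"ERBB2", "CDK4"}

for gene in ("RB1", "CDK4", "TP53"):
    print(gene, "->", gene_status(gene, panel_t01, reported))
```

Without the list of studied genes, RB1 and a gene that was never on the panel would be indistinguishable, which is the gap the 'region studied' discussion is trying to close.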
Kevin Power (Aug 29 2018 at 13:43):
Thanks @Dora Finkeisen for that use case. We have not spent much time talking about "wild type" results. But it sounds like your use case above is a great one. A few thoughts:
- I still think it is OK to separate the concepts of "region studied" and "what was found", though they relate. A region studied would be included, and as suggested by Lloyd we could include an indicator that nothing was found. However, we still want to say more about the fact that no variants were found.
- It seems we would want to use our Somatic Impact profiles to further describe treatments that will or will not work.
- Our DescribedVariant profile will have a valueCodeableConcept from this LOINC answer list. I suppose we could send a DescribedVariant observation with just the gene level information, and a valueCodeableConcept = Absent? This would then let the Somatic Impact connections be made as they are defined today.
I think we would separate the fact that I studied the gene, and the "region-studied" Observation can tell you about how I studied the gene. Then another Observation (perhaps DescribedVariant, perhaps something else) would say I have observed the fact that this gene matches the reference. Then other observations would tell you about the impact this has.
Anyone think differently?
Kevin Power (Aug 29 2018 at 13:57):
Thanks @Gideon Giacomelli and welcome to the group!
Question for you - It won't be easy to specify which regions exactly were NOT studied
I don't think we are proposing to say what was not studied, but instead will start with a way to say what was studied, then a way to indicate what (if anything) was found in the genes that were studied.
Regarding GenomicsStudy, I do know that requesting new Resources in FHIR is something that needs to be well thought out. We will need to do some work to define why it is necessary and why other resources don't meet the need. I tend to lean towards needing something else in this space, but I am not quite ready to make the official request. Our group are doing some more detailed information modeling, and something might grow from that.
Lastly, regarding "There are so many complex mutations which need to be handled: InDels (short deletions, insertions), larger Deletions and Insertions, Amplifications, Translocations which might result in gene fusions, inversions..."
This is another area we are lacking in. We have defined here some starting points for describing changes, but there is no doubt we can't handle more complex changes. GA4GH has a work stream going around Variant Representation that we hope to learn from, but of course we welcome feedback from anyone who has suggestions on the appropriate way of representing those different types.
Patrick Werner (Aug 29 2018 at 15:47):
Great discussion so far, but the topic is getting mixed up a little, i think we should start new topics for the different discussions (e.g. region studied).
My focus was the DiagnosticReport and how to capture panels, requests for panels and the result of a panel in a clinical context.
After reading/hearing all the Input i just want to provide my understanding of it:
When ordering a panel i put the code for a panel in ServiceRequest.code.
ISSUES:
- ServiceRequest.code: how to capture panelCode -> list of gene codes to look at. This would be important to capture, preferably as a FHIR resource, but i can't find a fitting one.
- ServiceRequest.code has the cardinality: 0..1 - how do i order a "custom panel" -> a list of gene codes to look at if only one code is allowed.
For reporting wildtypes i completely agree with @Kevin Power
The lab returns all studied genes as Observations(Described Variant). I have one Observation per found variant or per gene. What was looked at is Observation(Described Variant).component:gene-studied, the overall result (was something found) would be Observation(Described Variant).valueCodeableConcept.
If a Variant was found the details are captured in the components.
So reporting all studied genes and the results is possible with our profiles. Ordering a panel or a list of genes to be studied is currently not completely supported (see issues above)
Lloyd McKenzie (Aug 29 2018 at 15:54):
In the ServiceRequest you can use the orderDetail element to refine the order (e.g. listing the specific genes of interest). In general when ordering a panel of tests though, every single test that could be resulted separately must be a separate ServiceRequest. They are linked by having a common requisition identifier. For example, if you order a custom blood panel, every test is a separate ServiceRequest, even though they're all ordered together.
Patrick Werner (Aug 29 2018 at 16:11):
Thanks @Lloyd McKenzie for the explanation. So if i order a 700 gene panel i would place 700 ServiceRequests, ignoring that these tests would be executed on one NGS chip? So i would never order a Panel ID - i would always place the separate requests? Or do i misunderstand the meaning of "could be resulted separately"?
Kevin Power (Aug 29 2018 at 16:50):
Thanks @Patrick Werner for helping keep focus in these threads :thumbs_up:
And speaking of focus, I will have to admit, I wasn't even considering the "ordering" side of describing what regions (genes) are being requested. So far, this conversation has revolved around the description of "what did we test and analyze". I would suggest we delay further defining of the specific requests that can come from the ordering side?
Lloyd McKenzie (Aug 29 2018 at 16:53):
You could just order a test at the level of the panel without listing the genes at all. If you're only wanting a subset of the genes tested, whether you use different service requests or a single request with multiple orderDetails depends on where state lives. If some pieces can be resulted while others aren't (or some pieces cancelled while others aren't), you'll need separate serviceRequests. If it's one immutable order that either gets reported or doesn't, then you can have a single ServiceRequest with orderDetails listing the genes.
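The two ordering patterns described here can be sketched side by side. All identifiers, system URIs, and the patient reference are hypothetical; only the element names (`code`, `orderDetail`, `requisition`) come from the ServiceRequest resource.

```python
def single_request(patient_ref, panel_code, genes):
    """One immutable order: the panel code plus orderDetail entries per gene."""
    return {
        "resourceType": "ServiceRequest",
        "status": "active",
        "intent": "order",
        "subject": {"reference": patient_ref},
        "code": {"coding": [{"system": "http://example.org/panels", "code": panel_code}]},
        "orderDetail": [{"text": gene} for gene in genes],
    }


def separate_requests(patient_ref, genes, requisition_value):
    """One request per gene, linked by a shared requisition identifier."""
    requisition = {"system": "http://example.org/requisitions", "value": requisition_value}
    return [
        {
            "resourceType": "ServiceRequest",
            "status": "active",
            "intent": "order",
            "subject": {"reference": patient_ref},
            "code": {"text": f"{gene} analysis"},
            "requisition": requisition,
        }
        for gene in genes
    ]


genes = ["ERBB2", "CDK4", "RB1"]
one_order = single_request("Patient/example", "Panel-T01", genes)
many_orders = separate_requests("Patient/example", genes, "REQ-0001")
```

With separate requests, each piece keeps its own `status`, so one gene can be cancelled or resulted without touching the others; with the single request, state lives in one place.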
Kevin Power (Aug 29 2018 at 17:32):
@Patrick Werner - OK that we break the servicerequest topic into a different topic since most of the current topic has been focused on "what was tested" conversation (realizing that we really did completely hijack your topic)
Patrick Werner (Aug 30 2018 at 08:09):
Yes of course, i didn't want to complain - just trying to keep track of the different topics
Kevin Power (Aug 30 2018 at 13:05):
Thanks Patrick.
So, any other thoughts on the 3 proposals? Currently have one vote for Option 1.
As a reminder, here are the three options:
1) Delay - Considering all the conversation around region targeted/studied, defining assays, CLIA approvals etc., perhaps we should not add to the IG right now and consider this more holistically in the future.
2) Gene/Mutation level profile - Create a new profile of Observation that allows describing what was actually studied that looks something like this:
1. Create a new Observation profile, 'Region-Studied', hanging off GeneticsDiagnosticReport. [0..*]
2. Definition for new profile: "The Region-Studied profile is used to assert actual regions studied for the performed test(s). Intended coverage areas may differ from actual coverage areas (e.g. due to technical limitations during test performance).".
3. valueCodeableConcept (LOINC TBD) - Result value to indicate if anything was found
Two kinds of value sets we could consider:
3.1 Simple statement of "Was something found" (Positive, Negative, Unknown)
3.2 Clinical Significance "Was something clinically significant found" (Pathogenic, Benign, Likely Path, Likely Benign, Unknown)
4. Components
component (gene studied) 48018-6 [0..*] (HGNC symbol or NCBI gene code)
component (gene mutations tested for) 36908-2 [0..*] (coded, preferably HGVS)
component (description of ranges of DNA sequences examined) 81293-3 [0..1] (narrative)
component (region scope) TBD [0..1] (Whole Genome | Whole Exome | Gene Panel | Specific Variants)
component (gene coverage) TBD [0..1] numeric (when sequencing, what % of the gene was sequenced)
3) Detailed coverage - Model the FHIR equivalent of the BED file that Bob D referenced earlier (that looks something like this):
20 0 60000 REF_N
20 60000 9999878 NO_COVERAGE
20 9999878 9999905 LOW_COVERAGE
20 9999905 9999920 CALLABLE
20 9999920 9999921 LOW_COVERAGE
20 9999921 9999925 CALLABLE
20 9999925 9999926 LOW_COVERAGE
20 9999926 10024329 CALLABLE
20 10024329 10024339 POOR_MAPPING_QUALITY
20 10024339 10024340 CALLABLE
20 10024340 10024345 POOR_MAPPING_QUALITY
20 10024345 10098323 CALLABLE
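To give a feel for what consuming option 3 would involve, the sketch below parses the example lines above and totals the bases per status label. It assumes whitespace-separated columns (chromosome, start, end, status) and the usual BED convention of 0-based, half-open intervals, so each interval's length is `end - start`.

```python
from collections import Counter

BED_TEXT = """\
20 0 60000 REF_N
20 60000 9999878 NO_COVERAGE
20 9999878 9999905 LOW_COVERAGE
20 9999905 9999920 CALLABLE
20 9999920 9999921 LOW_COVERAGE
20 9999921 9999925 CALLABLE
20 9999925 9999926 LOW_COVERAGE
20 9999926 10024329 CALLABLE
20 10024329 10024339 POOR_MAPPING_QUALITY
20 10024339 10024340 CALLABLE
20 10024340 10024345 POOR_MAPPING_QUALITY
20 10024345 10098323 CALLABLE
"""


def bases_by_status(bed_text):
    """Sum interval lengths per status label (fourth column)."""
    totals = Counter()
    for line in bed_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _chrom, start, end, status = line.split()[:4]
        totals[status] += int(end) - int(start)
    return totals


totals = bases_by_status(BED_TEXT)
covered = sum(totals.values())
print(f"CALLABLE: {totals['CALLABLE']} of {covered} bases "
      f"({totals['CALLABLE'] / covered:.2%})")
```

A summary like this could also feed the coarser "gene coverage" component from option 2, if a receiver only needs a percentage and not the intervals themselves.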
Joel Schneider (Aug 30 2018 at 21:44):
Straw poll, using [1] [2] [3] emojis?
Kevin Power (Aug 30 2018 at 21:46):
You bet @Joel Schneider !
Bob Dolin (Aug 30 2018 at 21:49):
I'd be okay with option #2 for now. Couple minor comments re: option #2:
- I'd leave out the observation value from the panel observation. This way, we avoid the debate around value set. You'd simply have to look for variants in regions studied in order to know whether or not something was found.
- May need to tweak the cardinalities a bit (or add clarifying text) - for instance, gene studied is 0..* whereas gene coverage is 0..1.
Kevin Power (Aug 30 2018 at 21:52):
@Bob Dolin - I would say if we do the straw poll and decide on #2, we can still do some tweaks to the exact proposal. Of course, if we conclude on #3 we have a lot to figure out, and #1 there is nothing to figure out (for now :slight_smile:)
Kevin Power (Aug 30 2018 at 22:04):
For those that would like to participate in our straw poll, visit this, and you can "react" with a 1, 2, or 3 emoji. You do that by clicking on that message, then click on the 1, 2, or 3 - easy!
Joel Schneider (Aug 30 2018 at 22:10):
Please subtract 1 from each of the totals -- I voted for all 3 (to make the icons visible).
Gideon Giacomelli (Aug 31 2018 at 08:01):
Interesting thoughts. Maybe another question, probably just for me: What would be the exact purpose of this Observation profile, 'Region-Studied'? Is this something the service provider (sequencing the panel) would have to provide? Why would that be essential information? What information would the Region-Studied include which is not already in the Genetics Diagnostic Reports and in the other Observations the Report is already using?
Kevin Power (Aug 31 2018 at 13:12):
Is this something the service provider (sequencing the panel) would have to provide? Why would that be essential information?
Yes, the service provider would include 1 or more of the 'region-studied' Observations to describe what was sequenced.
What information would the Region-Studied include which is not already in the Genetics Diagnostic Reports and in the other Observations the Report is already using?
It is really a way to codify what was tested. Everything else we have done to date has been to describe what was found.
Patrick Werner (Aug 31 2018 at 13:21):
@Gideon Giacomelli this is something a sequencing provider can provide. Our provider doesn't report ranges, only names of the Genes which were looked at.
Kevin Power (Aug 31 2018 at 14:01):
As a reminder, we are taking a quick straw poll here to capture which proposal we should move forward with. I will likely want to try and finalize this by mid next week, so please click on the [1], [2], or [3] emoji reflecting your preference.
Last updated: Apr 12 2022 at 19:14 UTC