Stream: data extraction services
Topic: confidence-range extension
Philipp Daumke (Nov 23 2018 at 21:16):
A confidence-range extension could look like this:
"extension": [ { "url": "http://fhir.de/StructureDefinition/confidence/0.5", "valueQuantity": { "value": 0.9 } } ]
Morten Ernebjerg (Nov 26 2018 at 11:46):
If I understand this correctly, the idea would be to represent confidence with a single number on a single, universal scale. My feeling is that this would be difficult without further context. Confidence in a prediction can be indicated in a wide range of ways e.g. p-values (for different test statistics and null hypotheses), likelihood ratios, the Bayesian or Akaike information criterion, or various types of confidence intervals (which again can be estimated in various ways). These differ e.g. in the range of possible values, whether bigger is better etc., and AFAIK there aren't standard ways of mapping them onto a fixed scale. Hence, it would be difficult to e.g. compare two different confidence-values, because they could describe very different forms of confidence mapped in non-standardized ways.
Based on that, I think it would be necessary to allow both a value (or range) and some sort of description of what that number represents, like value and unit go together in `Quantity`. It's not quite obvious to me how best to add this context. It may be possible to have a ValueSet for standard kinds of confidence measures. However, given that people might invent their own measures of confidence for a particular software or context, I suspect that one would also have to allow some sort of "free-text" description and/or custom code system to at least allow humans to get an idea of what is meant.
Adding this context would of course make it harder to automatically process the information, but I think that only reflects the fact that it would be difficult to confidently process the information without that context.
Simone Heckmann (Jan 07 2019 at 09:00):
@Morten Ernebjerg I agree, it will be difficult to find a universal measure. However, I would like to aim for at least a commonly agreed upon classification/categorization of confidence, e.g. a ValueSet with "high/medium/low" certainty. Just so that systems who do not care to further process or compare confidence values can at least visualize them for the user in a generic way.
It will be important for all systems to understand if a value is "uncertain" and make this obvious to the user, so I would like to see a mandatory field with a required binding to the ValueSet there...
If this minimal interoperable information is present, any additional "exact" values along with a description of the method that has been used could be optional and may have loose bindings or even only textual descriptions.
Simone Heckmann (Jan 07 2019 at 09:09):
Sort of like this:
<extension url="http://example.fhir.org/StructureDefinition/confidence"> <extension url="confidence-class"> <valueCode value="high" /> </extension> <extension url="confidence-value"> <valueDecimal value="0.01" /> </extension> <extension url="confidence-method"> <valueString value="p-value" /> </extension> </extension>
With the confidence-class being mandatory and the rest optional...
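The same nested structure can also be sketched in its JSON form. A minimal Python helper (slice names and the example.fhir.org URL taken from the XML sketch above; everything else is illustrative, not a real implementation):

```python
# Illustrative sketch only: builds the JSON form of the nested confidence
# extension sketched above (confidence-class mandatory, the rest optional).

CONFIDENCE_URL = "http://example.fhir.org/StructureDefinition/confidence"

def make_confidence_extension(confidence_class, value=None, method=None):
    """Build the nested extension dict; confidence_class is mandatory."""
    if confidence_class not in ("high", "medium", "low"):
        raise ValueError("confidence-class must be high | medium | low")
    nested = [{"url": "confidence-class", "valueCode": confidence_class}]
    if value is not None:
        nested.append({"url": "confidence-value", "valueDecimal": value})
    if method is not None:
        nested.append({"url": "confidence-method", "valueString": method})
    return {"url": CONFIDENCE_URL, "extension": nested}

ext = make_confidence_extension("high", value=0.01, method="p-value")
```

Omitting `value` and `method` would still yield a valid instance, which matches the idea that only the coarse class is mandatory.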
Morten Ernebjerg (Jan 07 2019 at 10:07):
@Simone Heckmann That sounds like a good approach to me (& Happy New Year BTW :) ). In this set-up, one could then also require that `confidence-value` cannot be used without `confidence-method`.
I guess a key piece here will be to find definitions of the various levels in the `confidence-class` ValueSet that are clear and can also be applied unambiguously and consistently across all use cases. I know that even for the old workhorse p-value, there are current discussions (triggered by slews of non-reproducible studies) over whether the traditional 0.05 threshold of significance is too lax and one should instead require e.g. 0.005 (see e.g. this comment, based on this paper). So even in that restricted context, it is contested what counts as "certain". But I guess it would still be useful to have a rough classification even if it contains a dash of subjectivity (I guess all clinical judgements do), and systems that are very particular about their standard can of course require more info and/or manual control.
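To make the subjectivity concrete: any mapping from a p-value to the proposed classes has to pick thresholds, and those thresholds are exactly what is contested. A sketch with explicitly non-standard, illustrative cut-offs (the 0.005/0.05 values come from the discussion above, but bucketing them like this is purely an assumption):

```python
# Illustrative only: the thresholds are NOT standardized. This just shows
# how a p-value might be bucketed into high/medium/low confidence classes
# under one chosen (and debatable) convention.

def classify_p_value(p, strict=0.005, lax=0.05):
    """Map a p-value to a confidence class; thresholds are assumptions."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must be in [0, 1]")
    if p <= strict:
        return "high"
    if p <= lax:
        return "medium"
    return "low"
```

Two systems using different `strict`/`lax` settings would assign different classes to the same p-value, which is why the exact value and method still need to travel alongside the class.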
Luca Toldo (Feb 21 2019 at 09:54):
I was wondering why not use the ISO GrAF structure as the Extension.
Its use in the CDA context is nicely shown in the following paper: https://www.sciencedirect.com/science/article/pii/S1532046411002085
Or perhaps use http://hl7.org/fhir/graphdefinition.html instead of GrAF?
Simone Heckmann (Nov 27 2019 at 10:35):
I have created an Extension based on the results of the discussion so far:
https://simplifier.net/semantischeanalyse/confidence
It has
- a `confidence-class` element of type `code` that allows for the values high | medium | low as a common ground for all systems
- a `confidence-value` element of type `decimal` for the calculated, exact value
- a `method` element of type `string` to indicate the method used for estimating/calculating confidence (e.g. p-value)
Comments?
Simone Heckmann (Nov 27 2019 at 10:37):
The Extension Context is Provenance.target, so if multiple target resources are created from one common source (e.g. a Document), the extensions are specific to each target.
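To illustrate the per-target idea, an abbreviated, hypothetical Provenance instance could look roughly like this (resource IDs, the agent, and the extension URL are all made up for the example):

```json
{
  "resourceType": "Provenance",
  "target": [
    {
      "reference": "Condition/example-1",
      "extension": [{
        "url": "http://example.fhir.org/StructureDefinition/confidence",
        "extension": [{ "url": "confidence-class", "valueCode": "high" }]
      }]
    },
    {
      "reference": "Condition/example-2",
      "extension": [{
        "url": "http://example.fhir.org/StructureDefinition/confidence",
        "extension": [{ "url": "confidence-class", "valueCode": "low" }]
      }]
    }
  ],
  "recorded": "2019-11-27T10:00:00Z",
  "agent": [{ "who": { "display": "NLP pipeline (example)" } }]
}
```

Each entry in `target` carries its own confidence, so one extraction run over a single document can report different certainty for each resource it produced.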
Morten Ernebjerg (Nov 28 2019 at 09:21):
Cool, looks good :tada: (with the side note that I am neither an AI nor NLP expert). Two comments:
- How about making `method` a code and binding it to an extensible ValueSet so standard measures - if they exist - can be identified across systems? (I'm afraid I myself don't have a good overview of what should be in this list)
- I think it would be worth considering adding the constraint that if `confidence-value` is used, then `method` must also be filled, as numbers without context would be hard to use
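The proposed constraint could be checked mechanically. A small sketch (slice names taken from the draft extension; the function itself is an assumption, not part of any FHIR validator):

```python
# Sketch of the proposed invariants on the JSON form of the extension:
# confidence-class is mandatory, and confidence-value requires method.

def check_confidence_constraints(ext):
    """Return a list of human-readable constraint violations."""
    parts = {e["url"] for e in ext.get("extension", [])}
    errors = []
    if "confidence-class" not in parts:
        errors.append("confidence-class is mandatory")
    if "confidence-value" in parts and "method" not in parts:
        errors.append("confidence-value requires method")
    return errors
```

In a real profile this would instead be expressed as cardinalities and a FHIRPath invariant on the extension, but the logic is the same.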
Simone Heckmann (Nov 28 2019 at 10:28):
> Cool, looks good :tada: (with the side note that I am neither an AI nor NLP expert). Two comments:
> - How about making `method` a code and binding it to an extensible ValueSet so standard measures - if they exist - can be identified across systems? (I'm afraid I myself don't have a good overview of what should be in this list)
> - I think it would be worth considering adding the constraint that if `confidence-value` is used, then `method` must also be filled, as numbers without context would be hard to use
Agree to both. If we can come up with a list of at least three plausible example values for the method list, I'll be happy to update the extension accordingly :)
Thing is: if you use coded datatypes but don't offer at least an example binding, the QA process will throw it back in your face :) (not 100% sure that's true for extensions too, though)
Simone Heckmann (Nov 28 2019 at 10:29):
Once we're happy with the draft, I'll add a CR in Jira to make it a core extension for Provenance and assign an hl7.org canonical URL.
Morten Ernebjerg (Dec 02 2019 at 08:56):
Alas, I'm out of my depth domain-wise when it comes to a sensible starter list :disappointed: - we'd need a real expert. The best I can offer is to check with my machine learning PhD colleague.
Morten Ernebjerg (Dec 05 2019 at 10:35):
So, I discussed the question of codes for a `method` field with my friendly machine learning PhD colleague, and his statement was that it's, well, difficult. To really be able to e.g. compare confidence values coming from different sources, you would typically need a very precise description of how that number was derived, well beyond just saying "p-value". You would de facto need a unique identifier for the specific algorithm, including parameters, sample sizes etc. That makes it sound like making confidence values interoperable is hard, except in cases where the parties exchanging data already share knowledge of how the value was derived and have a shared code for it. So maybe staying with `string` for `method` and accepting that it will only be human-readable is more realistic.
Confidence level statement: The previous paragraph is based on statements from a non-randomly selected group of experts (sample size = 1) :grinning_face_with_smiling_eyes:
René Spronk (Dec 05 2019 at 10:46):
In the HL7 v2 space, matching patients based on their demographics also uses a method/value pair. Different use case, I know, but it shows there's generally no agreement on a common methodology for calculating confidence levels.
Last updated: Apr 12 2022 at 19:14 UTC