FHIR Chat · Search for Decimals respecting Ranges · implementers


view this post on Zulip Alexander Kiel (Feb 13 2020 at 14:22):

According to the FHIR search specification searching for decimal values always involves ranges:

Searches are always performed on values that are implicitly or explicitly a range.

However under number in the table for the example [parameter]=gt100 it says:

Values that are greater than exactly 100

and further down:

When a comparison prefix in the set lgt, lt, ge, le, sa & eb is provided, the implicit precision of the number is ignored, and they are treated as if they have arbitrarily high precision

I have drawn a figure which shows how I interpret the spec:

gt.png

In the figure above, values are spread along the Y-axis, starting with low values at the top of the figure and increasing toward the bottom. The search value is red, and its thickness corresponds to its precision. For example, if the search value is 2.0, its lower boundary (LwBd) would be 1.95 and its upper boundary (UpBd) would be 2.05. In green is the "range above the value", which reaches from the exact search value 2.00000 up to infinity. On the right there is one possible matching target value in black. Its precision is lower, so it stretches further along the Y-axis.

So instead of saying

Values that are greater than exactly 100

I would say that the upper boundary of the target value has to be greater than or equal to the exact search value.

One example:

The search value is 2.2 and the target value is 2 which has a range of [1.5,2.5). According to my interpretation the target value 2 would match a search of gt2.2 but according to the simple rule from the table in the numbers section, it would not.
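
To make the two readings concrete, here is a minimal Python sketch. The helper names `implicit_range`, `gt_table_rule` and `gt_range_rule` are hypothetical and not part of any FHIR library. The bounds of half a unit of the last stated digit follow the [1.5,2.5) convention used above, and the strict comparison against the exclusive upper bound is an assumption.

```python
from decimal import Decimal

def implicit_range(text):
    """Return (low, high) for a decimal literal; high is exclusive.

    Half a unit of the last stated digit on either side:
    "2" -> [1.5, 2.5), "2.0" -> [1.95, 2.05).
    """
    value = Decimal(text)
    exponent = value.as_tuple().exponent  # 0 for "2", -1 for "2.0"
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit

def gt_table_rule(target, search):
    """Table reading: the stored number itself must exceed the search value."""
    return Decimal(target) > Decimal(search)

def gt_range_rule(target, search):
    """Range reading: part of the target's implicit range lies above the search value."""
    _, high = implicit_range(target)
    return high > Decimal(search)  # strict, since high is exclusive (assumption)

print(gt_table_rule("2", "2.2"))  # False
print(gt_range_rule("2", "2.2"))  # True
```

The two rules disagree exactly in the case described above: a target of 2 against a search of gt2.2.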

@Grahame Grieve @Lee Surprenant

view this post on Zulip Lee Surprenant (Feb 13 2020 at 14:46):

I had the same confusion at one point and I think the problem is this slightly ambiguous phrase: "Searches are always performed on values that are implicitly or explicitly a range."
But if you handle both the query value AND the target value as ranges, you get really odd results (like gt2.2 returning values of 2). Instead, our decision was to handle only the query value as an implicit range (and not the extracted numerical values). IMHO this is the interpretation that yields the most sane results.

view this post on Zulip Lee Surprenant (Feb 13 2020 at 14:49):

I remember asking @John Stairs a similar question at devdays last summer and I think they had a similar interpretation

view this post on Zulip Alexander Kiel (Feb 13 2020 at 15:06):

@Lee Surprenant If you view a value of 2 as the result of a measurement of something, all you can say is that the real value you measured is in the interval [1.5, 2.5). So it is quite possible that the real value is more like 2.3. If you then ask for all values greater than 2.2, you have to return the 2 because it is possible that it's really 2.3. Doing so would also satisfy the rule to return more results rather than fewer.

view this post on Zulip Grahame Grieve (Feb 13 2020 at 19:40):

Searches are always performed on values that are implicitly or explicitly a range.

This is a general statement that is always true: things are always a range.

view this post on Zulip Grahame Grieve (Feb 13 2020 at 19:42):

When a comparison prefix in the set lgt, lt, ge, le, sa & eb is provided, the implicit precision of the number is ignored, and they are treated as if they have arbitrarily high precision

We say this specifically to deal with the question you raise, and it's quite specific and meaningful. The values are all still ranges, note, but we specified how it is to be interpreted.

It was quite the argument, btw, with the mathematicians agreeing with you (and also my mathematical half), but the engineers said that 2 is not in > 2.2. It becomes particularly relevant with dates rather than numbers, btw. So we specifically fixed this rule in place.

view this post on Zulip Lee Surprenant (Feb 14 2020 at 15:49):

I still think there is some ambiguity in this part of the spec. I think it stems from the fact that there are two different "values" involved in each search: the value passed in the search query (what I call the query value for clarity) and the target value (from the resource being searched).

Searches are always performed on values that are implicitly or explicitly a range.

I [apparently correctly] understood this as a general statement and perhaps the reasoning behind the specified behavior where a query value of 2 should actually be interpreted as a search for a value in the range [1.5,2.5).
But originally I shared Alexander's interpretation that both the query value AND target value must always be handled as ranges. But then when would these search prefixes EVER have the non-range interpretations listed at https://www.hl7.org/fhir/search.html#prefix ?

When a comparison prefix in the set lgt, lt, ge, le, sa & eb is provided, the implicit precision of the number is ignored, and they are treated as if they have arbitrarily high precision

Is this a statement about the query value or the target value? My original interpretation was that this would only apply to the query value and so a search of gt2.2 would return a resource with a target value of 2--which has implicit range [1.5,2.5).
However, for some reason I changed my interpretation of this in the past (and I can't seem to find why now)...maybe it was the result of this previous conversation on the topic?

view this post on Zulip Grahame Grieve (Feb 15 2020 at 05:02):

possibly.

view this post on Zulip Alexander Kiel (Feb 17 2020 at 16:17):

@Grahame Grieve I opened an issue to further investigate this: https://jira.hl7.org/browse/FHIR-26311

view this post on Zulip Lee Surprenant (Feb 19 2020 at 17:56):

I tried an experiment on a couple publicly available test servers and I think it proves the point.
First, I posted a simple Observation that has a body temperature value of 100 F:

"valueQuantity": {
  "value": 100,
  "unit": "F",
  "system": "http://unitsofmeasure.org",
  "code": "[degF]"
}

On test.fhir.org, a search with value-quantity=gt100|http://unitsofmeasure.org|[degF] DOES return the resource.
On hapi.fhir.org, it does not.

I tried similar searches on vonk.fire.ly and wildfhir4.aegis.net but I couldn't get those ones to produce reasonable results at all.

view this post on Zulip Lee Surprenant (Feb 19 2020 at 17:58):

On test.fhir.org, gt100.49 also returns the resource, whereas gt100.5 does not...which is consistent with the range interpretation
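
The three observations fit a simple model in which the stored value 100 is read as the half-open range [99.5, 100.5) and `gt` matches when any part of that range lies above the exact search value. The sketch below only reproduces that arithmetic; the function name `gt_matches` is hypothetical, and it makes no claim about how test.fhir.org is actually implemented.

```python
from decimal import Decimal

# Stored value 100 read as the half-open range [99.5, 100.5) (assumed convention).
TARGET_LOW, TARGET_HIGH = Decimal("99.5"), Decimal("100.5")

def gt_matches(search_text):
    """True if some part of the target range lies above the exact search value."""
    return TARGET_HIGH > Decimal(search_text)

results = {s: gt_matches(s) for s in ("100", "100.49", "100.5")}
print(results)  # {'100': True, '100.49': True, '100.5': False}
```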

view this post on Zulip Paul Church (Feb 19 2020 at 18:54):

+1 to the section in J#26311 that we need conformance tests for this or the implementations will never be consistent!

(Google does not return the resource.)

view this post on Zulip Grahame Grieve (Feb 20 2020 at 00:15):

I went and re-read the specification carefully on this.

view this post on Zulip Grahame Grieve (Feb 20 2020 at 00:15):

the date search is quite specific about this, with examples

view this post on Zulip Grahame Grieve (Feb 20 2020 at 00:16):

[parameter]=lt2013-01-14T10:00
2013-01-14 matches, because it includes the part of 14-Jan 2013 before 10am
[parameter]=gt2013-01-14T10:00
2013-01-14 matches, because it includes the part of 14-Jan 2013 after 10am
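
The two date examples can be checked with a small sketch that treats a day-precision date as the half-open interval from midnight to the following midnight. The helper names are hypothetical, and time zones are ignored for simplicity.

```python
from datetime import datetime, timedelta

def day_range(day_text):
    """A day-precision date covers [midnight, next midnight)."""
    start = datetime.strptime(day_text, "%Y-%m-%d")
    return start, start + timedelta(days=1)

def lt_matches(target_day, search_instant):
    """Some part of the target day lies before the search instant."""
    low, _ = day_range(target_day)
    return low < search_instant

def gt_matches(target_day, search_instant):
    """Some part of the target day lies after the search instant."""
    _, high = day_range(target_day)
    return high > search_instant

search = datetime(2013, 1, 14, 10, 0)
print(lt_matches("2013-01-14", search))  # True
print(gt_matches("2013-01-14", search))  # True
print(lt_matches("2013-01-15", search))  # False
```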

view this post on Zulip Grahame Grieve (Feb 20 2020 at 00:16):

the equivalent with numbers is :

view this post on Zulip Grahame Grieve (Feb 20 2020 at 00:16):

so a search of gt2.2 would return a resource with a target value of 2--which has implicit range [1.5,2.5).

view this post on Zulip Grahame Grieve (Feb 20 2020 at 00:16):

glad my server behaves this way ;-)

view this post on Zulip Grahame Grieve (Feb 20 2020 at 00:18):

we do not have a test suite for servers - we've never figured out how to make tests that are both robust and useful (it seems to be an either/or thing)

view this post on Zulip Mirjam Baltus (Feb 20 2020 at 15:49):

@Grahame Grieve : For numbers, it also says: [parameter]=gt100 Values that are greater than exactly 100
I interpret that to mean that a search like Lee's would not include the resource where the value is 100. If that is not the case, imho the specs should include some examples there to make it more clear.
@Lee Surprenant : Can you explain what you mean by saying that the Vonk server does not produce reasonable results at all? I can get the resource out just fine when I change 'gt' to 'ge', or when I remove the prefix.

view this post on Zulip Lee Surprenant (Feb 20 2020 at 16:01):

sure, i sent some private messages to Christiaan in case something wasn't working quite right, but happy to share my experience here (or anywhere else as well)...is there a vonk-specific stream?

view this post on Zulip Mirjam Baltus (Feb 20 2020 at 16:07):

No, there's no Vonk stream. But you can send your results to vonk@fire.ly so they will reach a larger part of the team, if you want.

view this post on Zulip Lee Surprenant (Feb 20 2020 at 16:08):

for me, the create worked fine in Vonk. then gt100 didn't find it and I thought maybe it was just behaving like HAPI. But then I tried this and still 0 results

curl 'https://vonk.fire.ly:443/Observation?code=http://loinc.org|8310-5&value-quantity=gt99|http://unitsofmeasure.org|[degF]'

And, what is really odd to me, is that this one does return it:

curl 'https://vonk.fire.ly:443/Observation?code=http://loinc.org|8310-5&value-quantity=gt1|http://unitsofmeasure.org|[degF]'
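
The query values contain `|`, `[` and `]`, which some clients treat specially (curl, for example, applies URL globbing to brackets unless `-g` is given). One way to rule that out when repeating the experiment is to percent-encode the parameters. The sketch below only builds the URL and sends nothing; the function name `quantity_search_url` is hypothetical, and the base URL is taken from the commands above with the default port omitted.

```python
from urllib.parse import urlencode

BASE = "https://vonk.fire.ly/Observation"  # from the curl commands above

def quantity_search_url(prefix, number):
    """Build a percent-encoded value-quantity search for body temperature in degF."""
    params = {
        "code": "http://loinc.org|8310-5",
        "value-quantity": f"{prefix}{number}|http://unitsofmeasure.org|[degF]",
    }
    return f"{BASE}?{urlencode(params)}"

url = quantity_search_url("gt", "99")
print(url)
```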

view this post on Zulip Mirjam Baltus (Feb 20 2020 at 16:16):

Yes, those results are unexpected! I did not try different values, so it seemed to work correctly. We'll look into this, thank you for pointing this out!

view this post on Zulip Lee Surprenant (Feb 20 2020 at 16:34):

if it was working as intended, would it honor the range semantics outlined above by Grahame? I'm thinking of updating my implementation to follow the range interpretation as well...

view this post on Zulip Mirjam Baltus (Feb 20 2020 at 16:44):

At the moment I think it does not honor ranges - at least not for the gt100 search, since we probably looked at the table about prefixes on numbers. But we want to follow the specs, so will have a discussion about that in the team. As was mentioned by others, a test set would be really helpful here.

view this post on Zulip Grahame Grieve (Feb 20 2020 at 20:50):

@Mirjam Baltus

For numbers, it also says: [parameter]=gt100 Values that are greater than exactly 100

We could certainly clarify that, but the numbers are greater than exactly 100 if their range of possible values includes values greater than exactly 100

view this post on Zulip Craig McClendon (Apr 30 2021 at 00:08):

If I read this thread correctly, people seem to be indicating that the implicit ranges should apply to both the resource value and the search parameter.
So if the resource value is 2, a search of gt2.2 would return it.

But there's another side to that. A search of eq2.0 now will not return the resource with a value of '2'.

Implicit precision of resource value 2: [1.5,2.5)
Implicit precision of search param 2.0: [1.95,2.05)

The range of the search param value is narrower than the range of the resource value, and therefore "the range of the search value fully contains the range of the target value" is false.
Is that the intent, or am I misunderstanding something?

view this post on Zulip Lloyd McKenzie (Apr 30 2021 at 00:17):

@Grahame Grieve

view this post on Zulip Craig McClendon (May 02 2021 at 21:59):

I updated the Jira issue with some thoughts:
https://jira.hl7.org/browse/FHIR-26311
I will repeat them here in the interest of wider visibility and discussion:

So there are 3 possible interpretations for number searches, I believe:

1) apply implicit precision to both the resource value and the search parameter value.
This gives you incorrect evaluations such as:
2 = 2.0 -> false [resource: 2, searchparam eq2.0]

Implicit precision of resource value 2: [1.5,2.5)
Implicit precision of search param 2.0: [1.95,2.05)
The range of the search param does not fully contain the range of the target value, so 2.0 != 2.
That can't be right.

2) apply the implicit precision to the resource value only.
The precision is controlled then by the resource author and clients have no control when searching to narrow results.
With this you still have some funny cases - and this is hard to implement because you have to index 3 values - the actual value for lt/gt searches, and the implicit boundaries for eq searches.
With this and (1) you could not make compliant/performant FHIR APIs on top of existing data stores which would not have the pre-calculated implicit bounds stored.

3) apply the implicit precision to the search param only
This allows clients to control the precision. They can tighten the precision to narrow the results returned to a smaller set if desired. It's easier to implement and it allows compliant FHIR APIs on top of non-FHIR-native datastores.

It seems to me that (3) is the only interpretation that makes sense.

I'd be interested to hear other thoughts, especially if I've misunderstood something.
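
The three interpretations can be compared side by side for `eq` with a short sketch. All function names are hypothetical. For interpretations (2) and (3), where one side is an exact point, the sketch assumes "equal" means the point falls inside the other side's half-open range; the thread does not spell that out.

```python
from decimal import Decimal

def implicit_range(text):
    """(low, high) with half a unit of the last stated digit on each side."""
    value = Decimal(text)
    half_unit = Decimal(5).scaleb(value.as_tuple().exponent - 1)
    return value - half_unit, value + half_unit

def eq_both(target, search):
    """(1) Search range must fully contain the target range."""
    t_low, t_high = implicit_range(target)
    s_low, s_high = implicit_range(search)
    return s_low <= t_low and t_high <= s_high

def eq_target_only(target, search):
    """(2) Exact search value must fall inside the target range (assumption)."""
    t_low, t_high = implicit_range(target)
    return t_low <= Decimal(search) < t_high

def eq_search_only(target, search):
    """(3) Exact target value must fall inside the search range (assumption)."""
    s_low, s_high = implicit_range(search)
    return s_low <= Decimal(target) < s_high

for target, search in [("2", "2.0"), ("2.0", "2")]:
    print(target, search,
          eq_both(target, search),
          eq_target_only(target, search),
          eq_search_only(target, search))
# 2 2.0 False True True
# 2.0 2 True True True
```

Only interpretation (1) is sensitive to which side carries more decimal places in this pair of cases.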

view this post on Zulip Lloyd McKenzie (May 02 2021 at 22:30):

@Grahame Grieve

view this post on Zulip Grahame Grieve (May 04 2021 at 04:43):

well, firstly, it's better to find than to not find.

view this post on Zulip Grahame Grieve (May 04 2021 at 04:51):

The range of the search param value is narrower than the range of the resource value, and therefore "the range of the search value fully contains the range of the target value" is false.

Yes I think you are.

Implicit precision of resource value 2: [1.5,2.5)
Implicit precision of search param 2.0: [1.95,2.05)

So the resource value implicit range (target) fully contains the implicit range of the search value.

view this post on Zulip Craig McClendon (May 04 2021 at 13:26):

The search spec for eq says the range of the search value fully contains the range of the target value. I think you have it flipped. But even so the problem remains. Either you search for eq2.0 and don't find 2, or you search for eq2 and don't find 2.0, depending on which range you put on the outside. And this case works for any number: n.0 != n, n.00 != n.0, etc. If the goal is to find more stuff, applying implicit precision to both sides fails the test in a dangerous way. We're not just returning more data that may or may not be relevant to the searcher; we're holding back data that most definitely is.

And my other point that implicit precision cannot be applied on the resource side to non-native FHIR systems I think is valid as well. If I'm adding a FHIR API on top of an existing system, how would I apply implicit precision to the existing values? It's not practical to include a function in the search query as it would result in a table scan. You could not create a compliant FHIR API without first altering the existing data to pre-calculate the implicit precision.

edit to add: applying implicit precision to only the search parameter still accomplishes the goal of returning more stuff. You take the value the searcher supplied, and you widen it a bit with implicit precision to return things that are close to that value. It does not have any cases where it doesn't return something relevant because the outside range (the range of the search parameter) is the only range that is being widened. I think the spec needs to be clarified that this is how it should be done - as many systems have done it the other way.

view this post on Zulip Craig McClendon (May 05 2021 at 15:26):

I need a sanity check here.
To put it in a way that's easier (for me) to think about.
There are two ranges involved in a numeric equals search - the search parameter range and the target/resource value range.
The spec says: the range of the search value fully contains the range of the target value

In other words, the search parameter range boundaries must be outside of (or equal to) the target value range boundaries for the values to be considered equal.

So if the above is true, then:
1) If you widen the search param range (outside range), more values will potentially be within it and match.
2) If you widen the target ranges (inside range), fewer values will potentially match.
3) If you widen them both equally, it should cancel out

But if you widen both ranges with implicit precision, the number of decimal places in the values determine how much you widen each by. In cases where a target/resource value has fewer decimal places than the search parameter, the target range is widened more than the search range is and it then fails to match values which are clearly equal.

@Lee Surprenant , you were looking closely at this at one point. @Grahame Grieve ? Thoughts?
I'm failing to see a flaw in my logic.

view this post on Zulip ryan moehrke (May 05 2021 at 20:18):

That logic looks correct to me. I think the issue comes into play because we are packing both the search range and a precision requirement into the same medium: the search value's precision. You might want only values with as much precision as 2.0 (where eq2.0 not returning 2 is a good thing), or you might not care about the precision of the resource's values (say, your legacy data example) and only want values from 1.95 to 2.05, where you may want 2 to be returned on an eq2. I guess the argument is that you could always do a ge1.95&le2.05 search if you wanted to include 2 regardless of precision, but that seems rather unintuitive (I've already run into situations where I've called these prefixed searches unintuitive, but I haven't won those arguments either).
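
The ge1.95&le2.05 workaround can be illustrated with a small filter over some made-up stored values. This sketch assumes the server compares `ge` and `le` against the exact stored number, which, as the experiments earlier in the thread show, is not how every server behaves. The sample values and the name `between_exact` are hypothetical.

```python
from decimal import Decimal

# Made-up stored values with differing numbers of decimal places.
stored = ["1.9", "2", "2.0", "2.04", "2.1"]

def between_exact(text, low, high):
    """Emulates ge<low>&le<high> on the exact stored number, ignoring its precision."""
    return Decimal(low) <= Decimal(text) <= Decimal(high)

matches = [s for s in stored if between_exact(s, "1.95", "2.05")]
print(matches)  # ['2', '2.0', '2.04']
```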

view this post on Zulip Vassil Peytchev (May 05 2021 at 20:47):

You could only want values with as much precision as 2.0 (where eq2.0 not returning 2 is a good thing)

In my opinion, eq2.0 not returning 2 is never a good thing. I followed this thread, and previous threads on decimal comparisons, and I only see the potential of patient harm.

To me this is a clear violation of the 80/20 rule - you want eq2.0 to return 2 at least 99% of the time, as this is basic arithmetic. Applying set comparisons as the default behavior is mind boggling...

view this post on Zulip Lee Surprenant (May 06 2021 at 14:01):

@Lee Surprenant , you were looking closely at this at one point.

yes, although I don't have much to add beyond what I said above in this thread. originally we interpreted this section the way you are thinking.
our justification for this interpretation was that this interpretation yields the most intuitive results.

however, after the discussion and investigation mentioned above, we changed to the range interpretation in which we consider both the query value and the target value to be ranges. if you consider a measured value of 2 to be the range [1.5, 2.5), then this could indeed fall outside of the range [1.95, 2.05). For semantics of "any overlapping" range, you can use the ap prefix.
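
The "any overlapping" reading mentioned for `ap` can be sketched as a plain interval-overlap test on half-open ranges. The name `ranges_overlap` is hypothetical, and a server may widen the search range further for `ap`; this sketch performs only the overlap check.

```python
from decimal import Decimal

def ranges_overlap(a, b):
    """True if half-open ranges a = [a0, a1) and b = [b0, b1) share any point."""
    return a[0] < b[1] and b[0] < a[1]

target_2 = (Decimal("1.5"), Decimal("2.5"))     # measured value 2
search_2_0 = (Decimal("1.95"), Decimal("2.05")) # search value 2.0
target_3 = (Decimal("2.5"), Decimal("3.5"))     # measured value 3

print(ranges_overlap(target_2, search_2_0))  # True
print(ranges_overlap(target_3, search_2_0))  # False
```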


Last updated: Apr 12 2022 at 19:14 UTC