FHIR Chat · is search on token case sensitive · implementers

Stream: implementers

Topic: is search on token case sensitive


Grahame Grieve (Dec 15 2017 at 20:19):

I just updated the draft of R4 to say that token searches are case sensitive unless they are not.

Grahame Grieve (Dec 15 2017 at 20:20):

actually, it says, 'unless the underlying semantics say they are not'. That's... tough. really tough from the server side - you get a search token that matches your index, and the match may or may not be case sensitive, depending on the item.

Grahame Grieve (Dec 15 2017 at 20:23):

is it just me who thinks that this is a hard call for the server? @James Agnew @Ewout Kramer @Christiaan Knaap @Danielle Friend @Jenni Syed @Peter Jordan @Michael Lawley @Rik Smithies @nicola (RIO/SS) @John Gresh @Bryn Rhodes (semi-random selection of server maintainers - apologies if I missed anyone)

Jenni Syed (Dec 15 2017 at 20:24):

Given how broad a "token" is, yes :)

Rik Smithies (Dec 15 2017 at 20:25):

it's a hard call to understand what it means

Jenni Syed (Dec 15 2017 at 20:27):

Let's just take 1 fun example: email. Technically, domain is supposed to be case insensitive, but username is supposed to be sensitive. Your mileage may vary when you try to see how this behaves in real life.

Jenni Syed (Dec 15 2017 at 20:27):

and that's ignoring FHIR :)

Jenni Syed (Dec 15 2017 at 20:27):

(it's a token search in FHIR, which is why I bring that one up)
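The email rule described above can be sketched in a few lines of Python. This is only an illustration of the "domain insensitive, username sensitive" idea; the function names `normalize_email` and `emails_match` are made up here, and real mail systems often treat the username as case insensitive too.

```python
def normalize_email(address: str) -> str:
    """Lower-case only the domain part; leave the username (local part) as is."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.lower()}"


def emails_match(a: str, b: str) -> bool:
    """Compare two addresses: domain case-insensitively, username case-sensitively."""
    return normalize_email(a) == normalize_email(b)


print(emails_match("Jane@Example.ORG", "Jane@example.org"))  # differs only in domain case
print(emails_match("Jane@example.org", "jane@example.org"))  # differs in username case
```

Under this rule the first pair matches and the second does not, which is exactly the kind of mixed behaviour a single yes/no answer for all tokens cannot express.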

Rik Smithies (Dec 15 2017 at 20:28):

ah so I see. I might have said "inherent" semantics

Jenni Syed (Dec 15 2017 at 20:30):

I think in our server, most code searches that are token are case sensitive, but I agree that it would be really hard to say that sensitivity is Yes or No throughout

Grahame Grieve (Dec 15 2017 at 20:31):

I think it would be better to say that except for a few special parameters, token search parameters are always case insensitive

Jenni Syed (Dec 15 2017 at 20:31):

or, even with the email example, implemented consistently and correctly :)

Rik Smithies (Dec 15 2017 at 20:31):

how is the server to know the underlying rules? They would need to be brought into FHIR on a per field basis. In fact you cannot even know if a (future) code system is case sensitive.

Grahame Grieve (Dec 15 2017 at 20:33):

yes that's part of the challenge. And then a search like this:

GET Observation?code=Xxx

Jenni Syed (Dec 15 2017 at 20:33):

That's surprising for me (the insensitivity) b/c we usually say code is case sensitive

Jenni Syed (Dec 15 2017 at 20:33):

I would expect it to follow the underlying rules

Grahame Grieve (Dec 15 2017 at 20:33):

that's case sensitive or not depending on the system of the coding.
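A rough Python sketch of what "depending on the system of the coding" could mean for a server: the match for a search like `code=Xxx` consults a per-system table before comparing. The table contents, the `Coding` class, the `token_matches` function and the `http://example.org/...` system are all assumptions for illustration. In practice such a table might be fed from the `caseSensitive` flag on each CodeSystem resource, where one is available.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed lookup: code system URI -> are its codes case sensitive?
CASE_SENSITIVE_BY_SYSTEM = {
    "http://unitsofmeasure.org": True,             # UCUM
    "http://example.org/fhir/local-codes": False,  # hypothetical local system
}


@dataclass(frozen=True)
class Coding:
    system: str
    code: str


def token_matches(stored: Coding, query_code: str,
                  query_system: Optional[str] = None,
                  default_sensitive: bool = True) -> bool:
    """Match a token query against one indexed coding.

    The system, if given in the query, must match exactly. The code is then
    compared case-sensitively or not, depending on the stored coding's system.
    Unknown systems fall back to default_sensitive.
    """
    if query_system is not None and query_system != stored.system:
        return False
    sensitive = CASE_SENSITIVE_BY_SYSTEM.get(stored.system, default_sensitive)
    if sensitive:
        return stored.code == query_code
    return stored.code.casefold() == query_code.casefold()
```

Note that the decision is made per indexed value, not per search parameter, so one query can be case sensitive for some rows and insensitive for others.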

Grahame Grieve (Dec 15 2017 at 20:34):

Jenni, are v2 tables case sensitive?

Rik Smithies (Dec 15 2017 at 20:34):

codes are normally case sensitive, but things like country codes may not be

Jenni Syed (Dec 15 2017 at 20:34):

IANAV2E (I am not a V2 Expert)... not sure

Grahame Grieve (Dec 15 2017 at 20:35):

in fact, you rarely know whether codes really are case sensitive or not. Postel's law should apply here, and I'm trying to find a way to approach that

Grahame Grieve (Dec 15 2017 at 20:35):

.. the answer is.. not everyone agrees ... but the FHIR code systems claim that v2 tables are not case sensitive.

Jenni Syed (Dec 15 2017 at 20:35):

All FHIR defined codes are case sensitive, right?

Jenni Syed (Dec 15 2017 at 20:36):

eg: gender, status fields, etc

Grahame Grieve (Dec 15 2017 at 20:36):

yes, except that the build tool won't let you define codes that only differ by case

Peter Jordan (Dec 15 2017 at 21:06):

It's definitely a tough call! Perhaps it should depend on the underlying data type of the token; for example, enforcing case-sensitivity on a string search might be problematic and I've always been forgiving on the casing of "true" and "false" both in and out of FHIR.

James Agnew (Dec 15 2017 at 21:31):

This is an interesting question. HAPI JPA/Smile certainly always makes tokens case sensitive currently, and I've never really thought about changing that. But now that you bring it up, Jenni's email example makes sense.

For a general purpose server I don't know that it makes sense to leave this purely up to server implementors. A standard extension on SearchParameter seems like it might be useful to allow people to configure this at the SP level.
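To make the SearchParameter idea concrete, here is a minimal Python sketch of how a server might read such a setting. No such standard extension is defined in this thread: the extension URL and the helper `token_case_sensitive` are hypothetical, and the SearchParameter is shown as a plain dict with only the fields needed for the example.

```python
# Hypothetical extension URL, invented for this example only.
CASE_EXT_URL = "http://example.org/fhir/StructureDefinition/token-case-sensitive"


def token_case_sensitive(search_param: dict, default: bool = True) -> bool:
    """Return the case-sensitivity configured on a SearchParameter.

    Falls back to `default` (True here, i.e. always case sensitive)
    when the hypothetical extension is absent.
    """
    for ext in search_param.get("extension", []):
        if ext.get("url") == CASE_EXT_URL and "valueBoolean" in ext:
            return bool(ext["valueBoolean"])
    return default


# A trimmed-down SearchParameter that opts out of case-sensitive matching.
email_sp = {
    "resourceType": "SearchParameter",
    "code": "email",
    "type": "token",
    "extension": [{"url": CASE_EXT_URL, "valueBoolean": False}],
}

# A SearchParameter without the extension keeps the default.
code_sp = {"resourceType": "SearchParameter", "code": "code", "type": "token"}
```

With this shape, `token_case_sensitive(email_sp)` yields `False` and `token_case_sensitive(code_sp)` yields `True`, so the choice is made once per search parameter rather than per code system.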

John Moehrke (Dec 16 2017 at 16:52):

case insensitivity is evil. I would be happy with it being case sensitive always. which specific things would break, can we make a different search type for them (like go use string)?

Kevin Mayfield (Dec 16 2017 at 17:19):

UK (NHS) HAPI Restful server is case sensitive. (http://yellow.testlab.nhs.uk/careconnect-ri/)

Kevin Mayfield (Dec 16 2017 at 17:20):

because of the codes

Pascal Pfiffner (Dec 16 2017 at 21:06):

Actually, @John Moehrke , I believe case sensitivity is evil when it comes to strings. Humans think case insensitive, so there must always be strong justification as to why something is case sensitive. Base64 has a good justification, but coding system codes do not IMHO. Yes, they should be consistently cased, but if you can't have codes that only differ in case (which I believe to be the right thing), then you have conceded that the codes are not truly case sensitive. Unless there are legit uses for codes only differing in case (I have luckily not encountered such a system), is there a reason for search to be case sensitive? Domains/email and UUID strings are case insensitive for good reasons. I do not believe we'd follow Postel's law if code/token searches are case sensitive.

John Moehrke (Dec 16 2017 at 21:07):

I would agree if humans in all languages considered case insensitivity the same... they don't...
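Two well-known cases show why "just ignore case" is not a single operation across languages. The Python snippet below is only an illustration; the helper names are made up, and Python's string methods are locale independent, so they cannot apply Turkish rules (where the uppercase of "i" is "İ" and the lowercase of "I" is "ı").

```python
def equal_lower(a: str, b: str) -> bool:
    """Naive case-insensitive comparison using lower()."""
    return a.lower() == b.lower()


def equal_casefold(a: str, b: str) -> bool:
    """Comparison using Unicode case folding."""
    return a.casefold() == b.casefold()


# German sharp s: lower() keeps "ß", casefold() expands it to "ss".
print(equal_lower("STRASSE", "straße"))     # False
print(equal_casefold("STRASSE", "straße"))  # True

# Turkish dotted capital I (U+0130): lower() yields "i" followed by a
# combining dot above (two code points), not a plain "i".
print("İ".lower() == "i")  # False
print(len("İ".lower()))    # 2
```

So two servers that both claim to be case insensitive can still disagree, depending on which folding they apply.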

Pascal Pfiffner (Dec 16 2017 at 21:10):

Not even when it comes to the meaning of names (and I think of codes as names)?

Grahame Grieve (Dec 16 2017 at 22:39):

see email earlier in this thread, though that's a weird outlier. UCUM is case sensitive. But not a high priority for search. v2 tables have caused trouble in production in the past.

Richard Townley-O'Neill (Dec 18 2017 at 00:11):

Case sensitive search is a useful option for searching for names in strings, "White" as a name and "white" as a colour.

Grahame Grieve (Dec 18 2017 at 00:48):

do read the comments about case sensitivity for string parameters

Christiaan Knaap (Dec 18 2017 at 15:26):

I can imagine that a facade implementer knows all the codesystems used in his/her system and can therefore decide whether a token search should be case sensitive or not.
For a general purpose FHIR server this is much less the case. Unless it knows the case sensitivity of all used code systems. And then still:
In one valueset there can be case sensitive and case insensitive codesystems mixed.
What do you mean by doing a string search instead? Defining a String-type searchparameter on the element?
In Vonk we currently always match tokens case insensitive, by the concept that it is usually safer to match a bit broader than narrower, and the expectation that .
I agree with @James Agnew to allow people to configure this in a SearchParameter, or maybe as a modifier on the search url.
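The "modifier on the search url" option could look roughly like the Python sketch below. The modifier names `case-sensitive` and `case-insensitive` are invented for illustration and are not proposed anywhere in this thread; the default of case-insensitive matching mirrors the behaviour described for Vonk above. Any other modifier is simply ignored by this sketch.

```python
from urllib.parse import parse_qsl

# Hypothetical modifier names, invented for this example only.
CASE_MODIFIERS = {"case-sensitive": True, "case-insensitive": False}


def parse_token_params(query: str, default_sensitive: bool = False):
    """Split a query string into (name, value, case_sensitive) triples.

    A parameter such as "code:case-sensitive=Xxx" overrides the default;
    parameters without a recognised modifier use default_sensitive.
    """
    result = []
    for key, value in parse_qsl(query):
        name, _, modifier = key.partition(":")
        sensitive = CASE_MODIFIERS.get(modifier, default_sensitive)
        result.append((name, value, sensitive))
    return result


print(parse_token_params("code:case-sensitive=Xxx&status=final"))
```

This puts the decision in the client's hands per request, in contrast to a setting on the SearchParameter, which fixes it on the server.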

Elliot Silver (Dec 18 2017 at 23:26):

Are there cases where different repetitions of an element contain values from different code systems -- some of which are case-sensitive, others are insensitive?


Last updated: Apr 12 2022 at 19:14 UTC