Stream: terminology
Topic: matching shy of regex?
Josh Mandel (Mar 26 2022 at 16:11):
Has there been discussion about a filter operation that is more closely modeled on SQL LIKE semantics? I.e., something more powerful than = but less powerful than regex, that supports "contains substring" or "starts with" queries without requiring full regular-expression support?
This would be nice to have when searching designations, for servers that can't/won't support arbitrary regex evaluation at query time.
Michael Lawley (Mar 28 2022 at 21:23):
Yeah, I have thoughts like this every time I see someone propose "regex" as a matching algorithm.
One option today is to support only a subset of regex, e.g., only patterns that start/end with .*
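A minimal Python sketch of that restricted subset. It assumes the regex must match the whole value, as with Python's re.fullmatch, which is why a leading or trailing .* means "anything may precede/follow". It also assumes that whatever remains must be a plain literal with no metacharacters. The function name classify_regex and the kind labels are hypothetical. Anything the function cannot classify returns None, so the caller can reject it.

```python
from typing import Optional, Tuple

# Characters with special meaning in common regex syntaxes.
_META = set(".^$*+?{}[]\\|()")


def classify_regex(pattern: str) -> Optional[Tuple[str, str]]:
    """Classify a full-match regex as exact/prefix/suffix/contains, else None.

    Assumes full-match semantics, so ".*" at either end means
    "any characters may appear here". Only a plain literal may remain.
    """
    leading = pattern.startswith(".*")
    body = pattern[2:] if leading else pattern
    trailing = body.endswith(".*")
    if trailing:
        body = body[:-2]
    # Reject anything that still contains regex syntax (alternation, ?, escapes...).
    if any(ch in _META for ch in body):
        return None
    if leading and trailing:
        return ("contains", body)
    if leading:
        return ("suffix", body)
    if trailing:
        return ("prefix", body)
    return ("exact", body)
```

For example, classify_regex(".*heart.*") gives ("contains", "heart"). A pattern such as "heart|lung" gives None.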
Kai Kewley (Mar 29 2022 at 15:08):
One example to consider is the term filters in the SNOMED CT Expression Constraint Language (ECL), introduced in ECL version 1.5. They allow searching concept terms using multi-prefix any-order matching or wildcard matching. These two strategies seem to strike a good balance between covering many use cases and not being too complex to use and implement.
See "ECL Description Term Filter", from about the third paragraph ("By default, term filters match using...").
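The following is a rough Python sketch of the two ECL-style strategies described above. Multi-prefix any-order matching requires every search word to be the start of some word in the term, in any order. Wildcard matching treats * as any run of characters. Several choices here are simplifying assumptions and not a transcription of the ECL specification: splitting on \w+ runs, case folding, and letting one term word satisfy several search words. The function names are hypothetical.

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    # Simplifying assumption: words are runs of word characters, compared case-folded.
    return re.findall(r"\w+", text.casefold())


def match_words(search: str, term: str) -> bool:
    """Multi-prefix, any-order: each search word must begin some word of the term."""
    term_words = _words(term)
    return all(any(tw.startswith(sw) for tw in term_words) for sw in _words(search))


def match_wild(pattern: str, term: str) -> bool:
    """Wildcard: '*' matches any run of characters; everything else is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, term, flags=re.IGNORECASE | re.DOTALL) is not None
```

For example, match_words("att hear", "Heart attack") is true because word order is ignored. match_wild("heart*", "Acute heart failure") is false because the pattern must cover the whole term.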
Josh Mandel (Mar 29 2022 at 15:36):
Shall I propose a filter operator called "like"? Or are server-specific restrictions on regex "good enough"?
Michael Lawley (Mar 29 2022 at 21:00):
A conservative approach would be to provide advice to implementers on recognizing prefix/suffix matches in a regex and otherwise returning a TooCostly error.
However, for authoring ValueSets it would probably be nicer to give people comfort that there's a "safe" subset of regex that is more widely implemented. The question is what that subset should be.
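Prefix and suffix matches are the cheap cases. In a sorted list of terms, every term with a given prefix sits in one contiguous run that binary search can find. A second sorted list of reversed terms does the same for suffixes. Below is a minimal Python sketch. It assumes full-match regex semantics, so abc.* is a prefix query and .*abc is a suffix query. DesignationIndex and TooCostly are hypothetical names. The exception stands in for whatever too-costly error a server actually reports. Every other pattern is rejected, including substring patterns like .*abc.*.

```python
import bisect
from typing import List

# Regex metacharacters; a prefix/suffix literal must contain none of them.
_META = set(".^$*+?{}[]\\|()")


class TooCostly(Exception):
    """Hypothetical error for patterns that would need a full scan to evaluate."""


class DesignationIndex:
    """Hypothetical index: terms sorted forwards and reversed for cheap lookups."""

    def __init__(self, terms: List[str]) -> None:
        self.forward = sorted(terms)
        self.backward = sorted(t[::-1] for t in terms)

    @staticmethod
    def _run(keys: List[str], prefix: str) -> List[str]:
        # In sorted order, all keys starting with `prefix` form one contiguous run
        # beginning at the binary-search insertion point.
        start = bisect.bisect_left(keys, prefix)
        end = start
        while end < len(keys) and keys[end].startswith(prefix):
            end += 1
        return keys[start:end]

    def search(self, pattern: str) -> List[str]:
        leading, trailing = pattern.startswith(".*"), pattern.endswith(".*")
        if trailing and not leading:        # "abc.*" -> prefix query
            literal = pattern[:-2]
        elif leading and not trailing:      # ".*abc" -> suffix query
            literal = pattern[2:]
        else:                               # exact, substring, or anything else
            raise TooCostly(pattern)
        if any(ch in _META for ch in literal):
            raise TooCostly(pattern)
        if trailing:
            return self._run(self.forward, literal)
        # Suffix: search the reversed terms for the reversed literal, then flip back.
        return sorted(k[::-1] for k in self._run(self.backward, literal[::-1]))
```

A server could catch TooCostly and translate it into its normal error response instead of falling back to scanning every designation.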
Josh Mandel (Mar 29 2022 at 21:26):
"Contains substring" and "has prefix" (case-insensitive) are the two main capabilities I'd suggest here (both implementable on top of SQL LIKE).
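Both capabilities map onto SQL LIKE once the user's text is escaped so that % and _ are treated literally. Below is a minimal sketch using Python's built-in sqlite3 module. The designation table, its value column, and the function names are hypothetical. Wrapping both sides in LOWER() is one portable way to get case-insensitivity, though it is ASCII-only in SQLite. Backslash handling in SQL string literals differs between databases (MySQL, for instance, treats it as an escape by default), so the ESCAPE clause may need adjusting elsewhere.

```python
import sqlite3
from typing import List


def like_pattern(text: str, mode: str) -> str:
    """Turn user text into a LIKE pattern; '\\' escapes %, _ and itself."""
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    if mode == "prefix":
        return escaped + "%"
    if mode == "contains":
        return "%" + escaped + "%"
    raise ValueError(f"unsupported mode: {mode}")


def search_designations(conn: sqlite3.Connection, text: str, mode: str) -> List[str]:
    # LOWER() on both sides makes the comparison case-insensitive (ASCII only in SQLite).
    rows = conn.execute(
        "SELECT value FROM designation "
        "WHERE LOWER(value) LIKE LOWER(?) ESCAPE '\\' "
        "ORDER BY value",
        (like_pattern(text, mode),),
    )
    return [row[0] for row in rows]


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical table of designation strings, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE designation (value TEXT)")
    conn.executemany(
        "INSERT INTO designation (value) VALUES (?)",
        [("Heart attack",), ("Acute heart failure",), ("100% oxygen",), ("Asthma",)],
    )
    return conn
```

For example, search_designations(make_demo_db(), "heart", "contains") returns ["Acute heart failure", "Heart attack"]. A "100%" prefix search matches only "100% oxygen" because the % is escaped. A leading % generally prevents an ordinary B-tree index from being used. That is one reason prefix matching tends to be cheaper than substring matching.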
Last updated: Apr 12 2022 at 19:14 UTC