Stream: fhirpath
Topic: Matches question
Grahame Grieve (Mar 10 2022 at 00:07):
According to the spec, matches():
Returns true when the value matches the given regular expression. Regular expressions should function consistently, regardless of any culture- and locale-specific settings in the environment, should be case-sensitive, use 'single line' mode and allow Unicode characters.
Grahame Grieve (Mar 10 2022 at 00:08):
so, should this test pass:
Grahame Grieve (Mar 10 2022 at 00:08):
'http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1'.matches('Library')
Grahame Grieve (Mar 10 2022 at 00:12):
my take is that this should not pass. @Bryn Rhodes @Ewout Kramer
because it doesn't pass in Java or .NET: both of them treat matches as a full match, not a partial match
Paul Lynch (Mar 10 2022 at 00:31):
If you want it to match the full string, you could use the beginning/end of string markers:
'http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1'.matches('^Library$')
which would be false.
Without those, I think it should be true. Otherwise, how would you test whether an expression matches part of a string?
Chris Moesel (Mar 10 2022 at 02:46):
JavaScript String.match(regex) does not require a full-string match. It works like @Paul Lynch suggests (matches any region; use ^ / $ to force full-string matches).
Chris Moesel (Mar 10 2022 at 02:51):
That said, I think I recall discussing CQL matches behavior w/ @Bryn Rhodes and he indicated that it should be full-string matching for CQL. Assuming I remember that correctly, I expect that's also the intent for FHIRPath.
Brian Postlethwaite (Mar 10 2022 at 08:29):
I'm with Paul on this one. We've been explicit with the other parameters for the regex engine.
(not sure why we'd have said single line too, means can't process narratives, or longer text content)
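On the 'single line' remark: a minimal Java sketch, assuming the spec's 'single line' mode means what .NET calls RegexOptions.Singleline and Java calls Pattern.DOTALL, i.e. `.` also matches line terminators. Under that assumption, the mode does not restrict input to one line. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class SingleLineDemo {
    // Default Java behaviour: '.' does not match line terminators.
    static boolean findDefault(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // DOTALL ("single line" in .NET/Perl terms): '.' matches any character,
    // including '\n', so a pattern can span several lines of text.
    static boolean findDotAll(String input, String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).find();
    }

    public static void main(String[] args) {
        String narrative = "first line\nsecond line";
        System.out.println(findDefault(narrative, "first.*second")); // false
        System.out.println(findDotAll(narrative, "first.*second"));  // true
    }
}
```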
Grahame Grieve (Mar 10 2022 at 19:08):
Otherwise, how would you test whether an expression matches part of a string?
'http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1'.matches('.*Library.*')
Grahame Grieve (Mar 10 2022 at 19:10):
for me, a full string match is the intent. The spec says "when the value matches the given regular expression" not "when the value has content that matches the given regular expression"
This is a subtle difference. But this certainly needs clarity
Paul Lynch (Mar 10 2022 at 19:51):
When languages (e.g. JavaScript, Ruby, Perl) provide the ability to match a string against a regular expression, it is the regular expression that controls whether it is against the full string or not. I think it would be very unexpected to provide a "match" API that takes a regular expression but always requires it to match against the full string.
Grahame Grieve (Mar 10 2022 at 20:00):
given that's what Java does, it's not going to be that unexpected
Lloyd McKenzie (Mar 10 2022 at 20:03):
It's also what XML schema does
Paul Lynch (Mar 10 2022 at 20:19):
Grahame Grieve said:
given that's what Java does, it's not going to be that unexpected
I guess it has been too many years since I did Java development. I had to go test that, but you are right.
Paul Lynch (Mar 10 2022 at 20:26):
Java has both matches() and find() (in Matcher) and find() is the one that will look for a match in a substring of the string. It sounds like if FHIRPath matches() is supposed to be like Java matches(), then find() should be added as well.
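The difference between the two Java calls can be checked with a small sketch using the test string from this thread. The class and helper names are made up for illustration; only java.util.regex and String.matches are real APIs.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    static final String URL =
        "http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1";

    // Matcher.matches(): the whole input must match the pattern.
    static boolean fullMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Matcher.find(): any subsequence of the input may match the pattern.
    static boolean partialMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(URL, "Library"));    // false
        System.out.println(partialMatch(URL, "Library")); // true
        // String.matches() behaves like Matcher.matches().
        System.out.println(URL.matches("Library"));       // false
    }
}
```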
Paul Lynch (Mar 10 2022 at 20:27):
I don't understand why Java has both APIs though, when the regular expression can control that.
Gino Canessa (Mar 10 2022 at 20:29):
A tool I frequently use is https://regexr.com/ (though I tested against several just to be sure - feel free to use whichever tester you would like).
If you enter the text: http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1, you can see the difference in the evaluations:
RegEx | IsMatch | Notes |
---|---|---|
Library | true | matches the literal 'Library' in the string |
.*Library.* | true | matches the entire string |
^Library$ | false | asking for exact literal |
Given that every 'tester' I used is consistent, and assuming the intention is that FHIRPath is consistent with regex, I would apply those behaviors.
edit: actually, at https://regex101.com/ it can perform the evaluation in a lot of contexts (PHP versions, JS, Java, .Net). They are all consistent with the above.
Grahame Grieve (Mar 10 2022 at 20:46):
well, I use that too. But it's testing regex matching, not the behaviour of the match() function, which is where the actual question arises.
Grahame Grieve (Mar 10 2022 at 20:46):
either way around works - you just have to prefix/suffix for the other case (either ^ & $ or .*)
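A rough Java sketch of that prefix/suffix conversion in both directions. The helper names are hypothetical. The wrapping uses a non-capturing group so that a top-level alternation in the regex is not split, and uses \A and \z rather than ^ and $ because Java's $ also matches just before a trailing line terminator by default.

```java
import java.util.regex.Pattern;

class WrapDemo {
    // Emulate find() on top of a full-match API by padding with ".*".
    // (?s:.*) lets the padding cross newlines without changing how the
    // caller's own regex treats '.'.
    static boolean findViaMatches(String input, String regex) {
        String padded = "(?s:.*)(?:" + regex + ")(?s:.*)";
        return Pattern.compile(padded).matcher(input).matches();
    }

    // Emulate a full match on top of a find API by anchoring both ends.
    static boolean matchesViaFind(String input, String regex) {
        String anchored = "\\A(?:" + regex + ")\\z";
        return Pattern.compile(anchored).matcher(input).find();
    }

    public static void main(String[] args) {
        String url = "http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1";
        System.out.println(findViaMatches(url, "Library"));       // true
        System.out.println(matchesViaFind(url, "Library"));       // false
        System.out.println(matchesViaFind("Library", "Library")); // true
    }
}
```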
Gino Canessa (Mar 10 2022 at 20:49):
Sure. My point is that matching against Library, .*Library.*, and ^Library$ are different things in RegEx. FHIRPath can define any behavior it wants, but if it differs from the behavior of regex there, it will cause confusion down the road.
edit for clarity
Grahame Grieve (Mar 10 2022 at 20:49):
don't understand why Java has both APIs though, when the regular expression can control that.
a mature API. If you have contexts in which the regular expression is written for one approach and not the other, then massaging it from one to the other is yucky
Grahame Grieve (Mar 10 2022 at 20:50):
if FHIRPath matches() is supposed to be like Java matches(), then find() should be added as well
I think that's the right thing for us to do here
Grahame Grieve (Mar 10 2022 at 21:02):
it differs from the behavior of regex there
my point is that this is orthogonal. The question isn't regex, the question is the meaning of the match() function. And we're screwed one way or the other because java does it one way and javascript does it another
Gino Canessa (Mar 10 2022 at 21:15):
Yep, I get that we are discussing the behavior of a FHIRPath function. It could be defined to do literally anything and would be correct by definition =).
But this is also relevant for the sibling replaceMatches function, which I assume should use the same semantics as matches and depends on what parts of the string are actually matched.
Chris Moesel (Mar 10 2022 at 22:28):
Ha. That's a good point. replaceMatches wouldn't work very well if it only did full-string matching, now would it?
Bryn Rhodes (Mar 10 2022 at 22:29):
I agree this test should not pass, but also agree that the specification needs some clarity here. I have confirmed that the CQL engine behaves this way (fails the test), and that both the FHIRPath and CQL descriptions of the matches (and replaceMatches) functions are identical. Given that the engines are confirmed to behave this way I think this can be a technical correction to clarify the expected semantics here on FHIRPath and CQL.
Gino Canessa (Mar 10 2022 at 22:36):
Bryn, how can you replace just 'Library' in http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1 via replaceMatches if that test fails? (assuming normal regex processing, replacing .*Library.* would replace the whole string)
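This point can be illustrated with Java's String.replaceAll, which replaces each matching subsequence (find semantics). This is only a sketch of general regex behaviour, not of any FHIRPath engine; the class name and the 'Measure' substitution are illustrative.

```java
class ReplaceDemo {
    static final String URL =
        "http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1";

    public static void main(String[] args) {
        // The pattern matches only the 'Library' segment, so only that is replaced.
        System.out.println(URL.replaceAll("Library", "Measure"));
        // prints http://fhir.org/guides/cqf/common/Measure/FHIR-ModelInfo|4.0.1

        // Padding with .* makes the single match span the whole string,
        // so the whole string is replaced.
        System.out.println(URL.replaceAll(".*Library.*", "Measure"));
        // prints Measure
    }
}
```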
Bryn Rhodes (Mar 10 2022 at 22:54):
Hmm.... fair point, so basically the way it works now we can't hit that simple use case :(
Bryn Rhodes (Mar 10 2022 at 22:55):
So as Grahame is suggesting, we'd have to introduce a findMatches() and change replaceMatches to use find semantics.
Gino Canessa (Mar 10 2022 at 23:01):
I am not familiar enough with FHIRPath implementations to weigh in on that part. I do not see any examples of match in the spec, but I do see something under replaceMatches... which uses the replace function =).
Assuming that we swap out replace with replaceMatches, the example would indicate that the match semantics would need to be the 'typical regex' version to work.
I also discovered the note in that section that the implementation should generally align with PCRE, in which the original test matches successfully.
Bryn Rhodes (Mar 10 2022 at 23:05):
Oh what a tangled web we weave.
Bryn Rhodes (Mar 10 2022 at 23:06):
Are you saying that PCRE semantics implies the test should pass?
Gino Canessa (Mar 10 2022 at 23:10):
As far as I can tell, yes. https://regex101.com/ has the option for testing against PCRE and it passes there.
Chris Moesel (Mar 10 2022 at 23:18):
Note that FHIRPath's replaceMatches says:
Matches the input using the regular expression in regex and replaces each match with the substitution string.
I think the phrase "replaces each match" implies that the pattern can match on sequences within the input string (not just the whole string itself).
Chris Moesel (Mar 10 2022 at 23:21):
@Bryn Rhodes -- As you know, CQL has matches, replaceMatches, and splitOnMatches.
Whether matches is full-string or not is ambiguous. But replaceMatches contains a description similar to what I pointed out above (implying matches against substrings) and an example that demonstrates this:
define "ReplaceMatchesFound": ReplaceMatches('ABCDE', 'C', 'XYZ') // 'ABXYZDE'
And of course SplitOnMatches would just be silly if the pattern was required to match the whole string.
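For comparison, Java's String.split also works on matching subsequences. The sketch below is an analogy for the splitting behaviour described here, not the CQL implementation; the class name is illustrative.

```java
import java.util.Arrays;

class SplitDemo {
    public static void main(String[] args) {
        // The separator pattern 'C' matches inside the string, so the
        // input is cut around that match.
        String[] parts = "ABCDE".split("C");
        System.out.println(Arrays.toString(parts)); // [AB, DE]
    }
}
```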
Bryn Rhodes (Mar 10 2022 at 23:40):
but I do see something under replaceMatches... which uses the replace function =).
Ha, that one is definitely a bug.
Bryn Rhodes (Mar 10 2022 at 23:43):
@Chris Moesel , agreed, replaceMatches in both FHIRPath and CQL imply find semantics, and splitOnMatches would be silly without it.
Bryn Rhodes (Mar 10 2022 at 23:44):
So maybe define a .matchesFull() that has full-string matching semantics and clarify that .matches should use "find" semantics, consistent with PCRE?
Bryn Rhodes (Mar 10 2022 at 23:58):
And yes, confirmed that the replaceMatches in the CQL engine uses find semantics:
define TestReplaceMatches: ReplaceMatches('http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1', 'Library', 'Measure')
// returns http://fhir.org/guides/cqf/common/Measure/FHIR-ModelInfo|4.0.1
Ewout Kramer (Mar 14 2022 at 10:32):
'http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1'.matches('Library')
because it doesn't pass in Java or .NET: both of them treat matches as a full match, not a partial match
It doesn't? I am calling the underlying Regex.IsMatch .NET function which according to this: https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.ismatch?view=net-6.0 does a partial match.
I just tested that:
Assert.IsTrue(Regex.IsMatch("http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1","Library"))
passes...
Bryn Rhodes (Mar 14 2022 at 15:27):
Okay, based on feedback here, proposed disposition for this technical correction: https://jira.hl7.org/browse/FHIR-36257
Bryn Rhodes (Mar 14 2022 at 15:30):
And another one to add .matchesFull() https://jira.hl7.org/browse/FHIR-36270
Bryn Rhodes (Mar 14 2022 at 15:32):
And a technical correction for the replaceMatches example: https://jira.hl7.org/browse/FHIR-36271
Gino Canessa (Mar 14 2022 at 15:47):
Thanks Bryn, I added some notes to FHIR-36257 that could provide additional clarity.
I am not sure I understand the implementation of matchesFull(). Is it checking to see if an entire string matches a regular expression (e.g., all characters in input are part of a match group)? What is the use case that needs that vs. either matches or =?
Grahame Grieve (Mar 14 2022 at 22:10):
@Gino Canessa your weird and misplaced obsession with standard regex had please not be true.
Grahame Grieve (Mar 14 2022 at 22:10):
specifically the table you posted
Grahame Grieve (Mar 14 2022 at 22:11):
I don't want to have to scan and reverse engineer the regex in order to make $test^ work
Gino Canessa (Mar 14 2022 at 22:13):
What did I miss? Following are from online tools (all the ones I tested agree):
image.png
Gino Canessa (Mar 14 2022 at 22:21):
(this is consistent with the regex implementation in C#, and I believe the Pattern implementation in Java - though I have not tried it myself)
Grahame Grieve (Mar 14 2022 at 23:52):
the question is not, what do the underlying regex engines do, the question is, how does the function matches() work.
Gino Canessa (Mar 15 2022 at 00:23):
Sure @Grahame Grieve . But I believe the question came up, and several implementers have chimed in since, that other languages have a different behavior for their language's matches function. Java is the odd duck that has a separate matches and find - find is aligned with what everyone else uses for matches.
I do not think you would disagree if I say that programming languages have idiosyncrasies. In this case, there are production implementations with both behaviors. I am offering the view that FHIRPath should align with the general RegEx definition and behavior, instead of adopting the unique Java convention.
Development-wise, changing the Java implementation to use find under-the-hood makes it the same as other languages. To my knowledge, neither JS (client or node) nor C# have any function that replicates the behavior of Java matches, meaning other SDK developers would need to re-invent the behaviors to match.
Grahame Grieve (Mar 15 2022 at 01:33):
sigh. ok. I have switched it round, and added matchesFull to the java implementation
Grahame Grieve (Mar 17 2022 at 20:59):
@Ewout Kramer this breaks the widely used constraint sdf-0: name.matches('[A-Z]([A-Za-z0-9_]){0,254}')
I am internally treating this as name.matches('^[A-Z]([A-Za-z0-9_]){0,254}$') in the java validator
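To see why the unanchored sdf-0 pattern stops constraining anything under find semantics, here is a small Java sketch. The helpers `matches` and `matchesFull` are hypothetical stand-ins for the two FHIRPath functions as discussed in this thread, not the validator's actual code. The sample names are made up, and DOTALL is an assumption about the spec's 'single line' mode.

```java
import java.util.regex.Pattern;

class Sdf0Demo {
    static final String SDF0 = "[A-Z]([A-Za-z0-9_]){0,254}";

    // Hypothetical model of matches() with find semantics.
    static boolean matches(String input, String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).find();
    }

    // Hypothetical model of matchesFull(): the whole input must match.
    static boolean matchesFull(String input, String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String bad = "my Invalid-name";
        // Find semantics accept the name because the substring "Invalid" matches.
        System.out.println(matches(bad, SDF0));             // true
        // Anchoring with ^ and $ restores the intended constraint.
        System.out.println(matches(bad, "^" + SDF0 + "$")); // false
        System.out.println(matchesFull(bad, SDF0));         // false
        System.out.println(matchesFull("MyProfile", SDF0)); // true
    }
}
```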
Grahame Grieve (Mar 17 2022 at 23:27):
also eld-19 and eld-20 need wrapping with ^ and $
Ewout Kramer (Mar 21 2022 at 08:49):
Thanks, yes, I will admit that my mental regex parser does work like the Java matches(), so I never noticed these mistakes. We'll fix it in the .NET library too.
Last updated: Apr 12 2022 at 19:14 UTC