FHIR Chat · Matches question · fhirpath

Stream: fhirpath

Topic: Matches question


Grahame Grieve (Mar 10 2022 at 00:07):

According to the spec, matches():

Returns true when the value matches the given regular expression. Regular expressions should function consistently, regardless of any culture- and locale-specific settings in the environment, should be case-sensitive, use 'single line' mode and allow Unicode characters.
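One way to read those flags in Java terms, as a minimal sketch. It assumes that "single line" mode means what Perl and .NET call single-line mode (Java's Pattern.DOTALL, where '.' also matches line terminators), and it relies on java.util.regex being case-sensitive by default. The class name SingleLineMode is illustrative only.

```java
import java.util.regex.Pattern;

// Sketch: compile a FHIRPath-style regex with the flags the spec text implies.
// Assumption: "single line" mode means Pattern.DOTALL ('.' also matches '\n').
class SingleLineMode {
    static Pattern compile(String regex) {
        // No CASE_INSENSITIVE flag: java.util.regex is case-sensitive by default.
        return Pattern.compile(regex, Pattern.DOTALL);
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";

        // Default mode: '.' does not match the newline, so nothing spans both lines.
        System.out.println(Pattern.compile("line.second").matcher(text).find()); // false

        // Single-line (DOTALL) mode: '.' consumes the newline.
        System.out.println(compile("line.second").matcher(text).find()); // true

        // Case-sensitive: lowercase "library" does not match "Library".
        System.out.println(compile("library").matcher("Library").find()); // false

        // Non-ASCII characters (here U+00F6, o with diaeresis) match like any other.
        System.out.println(compile("\u00f6").matcher("K\u00f6rper").find()); // true
    }
}
```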

Grahame Grieve (Mar 10 2022 at 00:08):

so, should this test pass:

Grahame Grieve (Mar 10 2022 at 00:08):

'http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1'.matches('Library')

Grahame Grieve (Mar 10 2022 at 00:12):

my take is that this should not pass. @Bryn Rhodes @Ewout Kramer

because it doesn't pass in java or dotnet: both of them treat matches as a full match, not a partial match
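For reference, a small Java check of the Java half of that claim, using only standard java.util.regex calls: String.matches (documented as equivalent to Pattern.matches) requires the whole input to match, while Matcher.find searches for a matching substring. The class name is illustrative.

```java
import java.util.regex.Pattern;

class FullMatchDemo {
    static final String URL = "http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1";

    public static void main(String[] args) {
        // String.matches and Pattern.matches: the regex must cover the entire input.
        System.out.println(URL.matches("Library"));          // false
        System.out.println(Pattern.matches("Library", URL)); // false

        // Matcher.find: the regex may match any substring of the input.
        System.out.println(Pattern.compile("Library").matcher(URL).find()); // true
    }
}
```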

Paul Lynch (Mar 10 2022 at 00:31):

If you want it to match the full string, you could use the beginning/end of string markers:
'http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1'.matches('^Library$')
which would be false.
Without those, I think it should be true. Otherwise, how would you test whether an expression matches part of a string?

Chris Moesel (Mar 10 2022 at 02:46):

JavaScript String.match(regex) does not require a full-string match. It works like @Paul Lynch suggests (matches any region; use ^ and $ to force full-string matches).

Chris Moesel (Mar 10 2022 at 02:51):

That said, I think I recall discussing CQL matches behavior w/ @Bryn Rhodes and he indicated that it should be full-string matching for CQL. Assuming I remember that correctly, I expect that's also the intent for FHIRPath.

Brian Postlethwaite (Mar 10 2022 at 08:29):

I'm with Paul on this one. We've been explicit with the other parameters for the regex engine.
(not sure why we'd have said single line too; that means it can't process narratives or other longer text content)

Grahame Grieve (Mar 10 2022 at 19:08):

Paul Lynch said:

Otherwise, how would you test whether an expression matches part of a string?

'http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1'.matches('.*Library.*')
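Under full-match semantics the .* padding does the job, with one caveat that a short Java sketch makes visible: '.' does not match line terminators by default, so the padding fails on multi-line input unless DOTALL is on (here via the inline (?s) flag). Plain Java with an illustrative class name, not FHIRPath engine code.

```java
class PartialViaFullMatch {
    static final String URL = "http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1";

    public static void main(String[] args) {
        // Padding with .* turns a full-string match into a "contains" test.
        System.out.println(URL.matches(".*Library.*")); // true

        // On multi-line input the padding cannot cross '\n' by default...
        String multiLine = "header\nLibrary";
        System.out.println(multiLine.matches(".*Library.*")); // false

        // ...unless DOTALL is enabled, here with the inline (?s) flag.
        System.out.println(multiLine.matches("(?s).*Library.*")); // true
    }
}
```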

Grahame Grieve (Mar 10 2022 at 19:10):

for me, a full string match is the intent. The spec says "when the value matches the given regular expression" not "when the value has content that matches the given regular expression"

This is a subtle difference. But this certainly needs clarity

Paul Lynch (Mar 10 2022 at 19:51):

When languages (e.g. JavaScript, Ruby, Perl) provide the ability to match a string against a regular expression, it is the regular expression that controls whether it is against the full string or not. I think it would be very unexpected to provide a "match" API that takes a regular expression but always requires it to match against the full string.

Grahame Grieve (Mar 10 2022 at 20:00):

given that's what Java does, it's not going to be that unexpected

Lloyd McKenzie (Mar 10 2022 at 20:03):

It's also what XML schema does

Paul Lynch (Mar 10 2022 at 20:19):

Grahame Grieve said:

given that's what Java does, it's not going to be that unexpected

I guess it has been too many years since I did Java development. I had to go test that, but you are right.

Paul Lynch (Mar 10 2022 at 20:26):

Java has both matches() and find() (in Matcher) and find() is the one that will look for a match in a substring of the string. It sounds like if FHIRPath matches() is supposed to be like Java matches(), then find() should be added as well.
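The Matcher operations in java.util.regex differ only in how much of the input the pattern has to cover. A short illustration (class name illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherModes {
    static final String URL = "http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1";
    static final Pattern LIBRARY = Pattern.compile("Library");

    public static void main(String[] args) {
        // matches(): the whole input must match.
        System.out.println(LIBRARY.matcher(URL).matches());   // false
        // lookingAt(): a prefix of the input must match.
        System.out.println(LIBRARY.matcher(URL).lookingAt()); // false
        // find(): scans for the next matching subsequence anywhere in the input.
        Matcher m = LIBRARY.matcher(URL);
        if (m.find()) {
            System.out.println(m.group() + " found at index " + m.start());
        }
    }
}
```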

Paul Lynch (Mar 10 2022 at 20:27):

I don't understand why Java has both APIs though, when the regular expression can control that.

Gino Canessa (Mar 10 2022 at 20:29):

A tool I frequently use is https://regexr.com/ (though I tested against several just to be sure - feel free to use whichever tester you would like).

If you enter the text: http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1, you can see the difference in the evaluations:

RegEx          IsMatch   Notes
Library        true      matches the literal 'Library' in the string
.*Library.*    true      matches the entire string
^Library$      false     asking for exact literal

Given that every 'tester' I used is consistent, and assuming the intention is that FHIRPath is consistent with regex, I would apply those behaviors.

edit: actually, at https://regex101.com/ it can perform the evaluation in a lot of contexts (PHP versions, JS, Java, .Net). They are all consistent with the above.
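The tester results correspond to find semantics. A sketch that evaluates the same three patterns both ways in Java, find() versus matches(), shows that the two readings disagree only on the bare Library pattern (class and method names illustrative):

```java
import java.util.regex.Pattern;

class RegexTable {
    static final String INPUT = "http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1";
    static final String[] PATTERNS = {"Library", ".*Library.*", "^Library$"};

    // Substring search, as the online testers do.
    static boolean find(String regex) {
        return Pattern.compile(regex).matcher(INPUT).find();
    }

    // Whole-input match, as Java's String.matches does.
    static boolean full(String regex) {
        return Pattern.compile(regex).matcher(INPUT).matches();
    }

    public static void main(String[] args) {
        System.out.printf("%-14s %-6s %s%n", "RegEx", "find", "full");
        for (String p : PATTERNS) {
            System.out.printf("%-14s %-6b %b%n", p, find(p), full(p));
        }
    }
}
```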

Grahame Grieve (Mar 10 2022 at 20:46):

well, I use that too. But it's testing regex matching, not the behaviour of the matches() function, which is where the actual question arises.

Grahame Grieve (Mar 10 2022 at 20:46):

either way around works - you just have to prefix/suffix for the other case (either ^ & $ or .*)
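Going from one convention to the other is mostly mechanical, but the wrapping needs a little care. The helpers below are hypothetical, not part of any FHIRPath library. They wrap the pattern in a non-capturing group so an alternation such as cat|dog is anchored as a whole, use \A and \z rather than ^ and $ (in Java, $ may also match just before a final line terminator), and pad with [\s\S]* so the padding can cross line terminators.

```java
import java.util.regex.Pattern;

class RegexWrapping {
    // Hypothetical helper: make a find()-based engine behave like a full-string match.
    static String asFullMatch(String regex) {
        // \A = start of input, \z = absolute end of input; (?:...) keeps group numbering.
        return "\\A(?:" + regex + ")\\z";
    }

    // Hypothetical helper: make a full-match engine behave like find().
    static String asFind(String regex) {
        // [\s\S]* matches any character, including line terminators,
        // without turning DOTALL on for the caller's own pattern.
        return "[\\s\\S]*(?:" + regex + ")[\\s\\S]*";
    }

    public static void main(String[] args) {
        // Naive anchoring binds ^ to "cat" and $ to "dog" only, so "hotdog" is found.
        System.out.println(Pattern.compile("^cat|dog$").matcher("hotdog").find()); // true
        // Grouped anchoring requires the whole input to be "cat" or "dog".
        System.out.println(Pattern.compile(asFullMatch("cat|dog")).matcher("hotdog").find()); // false
        // Padding makes a full-match call succeed on a substring, even across lines.
        System.out.println("header\nLibrary".matches(asFind("Library"))); // true
    }
}
```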

Gino Canessa (Mar 10 2022 at 20:49):

Sure. My point is that matching against Library, .*Library.*, and ^Library$ are different things in RegEx. FHIRPath can define any behavior it wants, but if it differs from the behavior of regex there, it will cause confusion down the road.

edit for clarity

Grahame Grieve (Mar 10 2022 at 20:49):

Paul Lynch said:

I don't understand why Java has both APIs though, when the regular expression can control that.

It's a mature API. If you have contexts in which the regular expression is written for one approach and not the other, then massaging it from one to the other is yucky

Grahame Grieve (Mar 10 2022 at 20:50):

Paul Lynch said:

if FHIRPath matches() is supposed to be like Java matches(), then find() should be added as well

I think that's the right thing for us to do here

Grahame Grieve (Mar 10 2022 at 21:02):

Gino Canessa said:

it differs from the behavior of regex there

my point is that this is orthogonal. The question isn't regex, the question is the meaning of the matches() function. And we're screwed one way or the other because java does it one way and javascript does it another

Gino Canessa (Mar 10 2022 at 21:15):

Yep, I get that we are discussing the behavior of a FHIRPath function. It could be defined to do literally anything and would be correct by definition =).

But this is also relevant for the sibling replaceMatches function, which I assume should use the same semantics as matches and depends on what parts of the string are actually matched.

Chris Moesel (Mar 10 2022 at 22:28):

Ha. That's a good point. replaceMatches wouldn't work very well if it only did full-string matching, now would it?

Bryn Rhodes (Mar 10 2022 at 22:29):

I agree this test should not pass, but also agree that the specification needs some clarity here. I have confirmed that the CQL engine behaves this way (fails the test), and that both the FHIRPath and CQL descriptions of the matches (and replaceMatches) functions are identical. Given that the engines are confirmed to behave this way I think this can be a technical correction to clarify the expected semantics here on FHIRPath and CQL.

Gino Canessa (Mar 10 2022 at 22:36):

Bryn, how can you replace just 'Library' in http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1 via replaceMatches if that test fails? (Assuming normal regex processing, replacing .*Library.* would replace the whole string.)
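Gino's point can be seen with Java's String.replaceAll, which is documented as replacing each substring that matches the regex (find semantics). A sketch in plain Java, not FHIRPath engine code:

```java
class ReplaceSemantics {
    static final String URL = "http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1";

    public static void main(String[] args) {
        // Each occurrence of the pattern is replaced in place.
        System.out.println(URL.replaceAll("Library", "Measure"));
        // http://fhir.org/guides/cqf/common/Measure/FHIR-ModelInfo|4.0.1

        // A pattern padded to cover the whole input replaces the whole input.
        System.out.println(URL.replaceAll(".*Library.*", "Measure"));
        // Measure
    }
}
```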

Bryn Rhodes (Mar 10 2022 at 22:54):

Hmm.... fair point, so basically the way it works now we can't hit that simple use case :(

Bryn Rhodes (Mar 10 2022 at 22:55):

So as Grahame is suggesting, we'd have to introduce a findMatches() and change replaceMatches to use find semantics.

Gino Canessa (Mar 10 2022 at 23:01):

I am not familiar enough with FHIRpath implementations to weigh in on that part. I do not see any examples of match in the spec, but I do see something under replaceMatches... which uses the replace function =).

Assuming that we swap out replace with replaceMatches, the example would indicate that the match semantics would need to be the 'typical regex' version to work.

I also discovered the note in that section that the implementation should generally align with PCRE, in which the original test matches successfully.

Bryn Rhodes (Mar 10 2022 at 23:05):

Oh what a tangled web we weave.

Bryn Rhodes (Mar 10 2022 at 23:06):

Are you saying that PCRE semantics implies the test should pass?

Gino Canessa (Mar 10 2022 at 23:10):

As far as I can tell, yes. https://regex101.com/ has the option for testing against PCRE and it passes there.

Chris Moesel (Mar 10 2022 at 23:18):

Note that FHIRPath's replaceMatches says:

Matches the input using the regular expression in regex and replaces each match with the substitution string.

I think the phrase "replaces each match" implies that the pattern can match on sequences within the input string (not just the whole string itself).

Chris Moesel (Mar 10 2022 at 23:21):

@Bryn Rhodes -- As you know, CQL has matches, replaceMatches, and splitOnMatches.

Whether matches is full-string or not is ambiguous. But replaceMatches contains a description similar to what I pointed out above (implying matches against substrings) and an example that demonstrates this:

define "ReplaceMatchesFound": ReplaceMatches('ABCDE', 'C', 'XYZ') // 'ABXYZDE'

And of course SplitOnMatches would just be silly if the pattern was required to match the whole string.
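Java's built-ins show the same find-based behavior for replacing and splitting. This treats CQL's ReplaceMatches and SplitOnMatches as analogous to String.replaceAll and String.split, which is an assumption for illustration rather than CQL engine code:

```java
import java.util.Arrays;

class CqlAnalogues {
    public static void main(String[] args) {
        // Analogue of ReplaceMatches('ABCDE', 'C', 'XYZ'): the match is inside the string.
        System.out.println("ABCDE".replaceAll("C", "XYZ")); // ABXYZDE

        // Analogue of SplitOnMatches: split wherever the pattern is found.
        System.out.println(Arrays.toString("A1B22C".split("[0-9]+"))); // [A, B, C]
    }
}
```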

Bryn Rhodes (Mar 10 2022 at 23:40):

Gino Canessa said:

but I do see something under replaceMatches... which uses the replace function =).

Ha, that one is definitely a bug.

Bryn Rhodes (Mar 10 2022 at 23:43):

@Chris Moesel , agreed, replaceMatches in both FHIRPath and CQL imply find semantics, and splitOnMatches would be silly without it.

Bryn Rhodes (Mar 10 2022 at 23:44):

So maybe define a .matchesFull() that has full-string matching semantics and clarify that .matches should use "find" semantics, consistent with PCRE?
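A minimal Java sketch of that split, assuming find semantics for matches(), whole-input semantics for matchesFull(), and reading "single line" as DOTALL. The class and method names mirror the FHIRPath functions for readability only; this is not code from any FHIRPath engine, and it ignores FHIRPath's collection and empty-input rules.

```java
import java.util.regex.Pattern;

class FhirPathRegexSketch {
    // Assumed flags: case-sensitive (Java default) plus DOTALL for "single line" mode.
    private static Pattern compile(String regex) {
        return Pattern.compile(regex, Pattern.DOTALL);
    }

    // Proposed matches(): true if the regex matches anywhere in the input.
    static boolean matches(String input, String regex) {
        return compile(regex).matcher(input).find();
    }

    // Proposed matchesFull(): true only if the regex matches the entire input.
    static boolean matchesFull(String input, String regex) {
        return compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String url = "http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1";
        System.out.println(matches(url, "Library"));         // true
        System.out.println(matchesFull(url, "Library"));     // false
        System.out.println(matchesFull(url, ".*Library.*")); // true
    }
}
```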

Bryn Rhodes (Mar 10 2022 at 23:58):

And yes, confirmed that the replaceMatches in the CQL engine uses find semantics:

define TestReplaceMatches: ReplaceMatches('http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1', 'Library', 'Measure')
// returns http://fhir.org/guides/cqf/common/Measure/FHIR-ModelInfo|4.0.1

Ewout Kramer (Mar 14 2022 at 10:32):

Grahame Grieve said:

'http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1'.matches('Library')
because it doesn't pass in java or dotnet: both of them treat matches as a full match, not a partial match

It doesn't? I am calling the underlying Regex.IsMatch .NET function which according to this: https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.ismatch?view=net-6.0 does a partial match.

I just tested that:

Assert.IsTrue(Regex.IsMatch("http://fhir.org/guides/cqf/common/Library/FHIR-ModelInfo|4.0.1","Library")) passes...

Bryn Rhodes (Mar 14 2022 at 15:27):

Okay, based on feedback here, proposed disposition for this technical correction: https://jira.hl7.org/browse/FHIR-36257

Bryn Rhodes (Mar 14 2022 at 15:30):

And another one to add .matchesFull() https://jira.hl7.org/browse/FHIR-36270

Bryn Rhodes (Mar 14 2022 at 15:32):

And a technical correction for the replaceMatches example: https://jira.hl7.org/browse/FHIR-36271

Gino Canessa (Mar 14 2022 at 15:47):

Thanks Bryn, I added some notes to FHIR-36257 that could provide additional clarity.

I am not sure I understand the implementation of matchesFull(). Is it checking to see if an entire string matches a regular expression (e.g., all characters in input are part of a match group)? What is the use case that needs that vs. either matches or =?

Grahame Grieve (Mar 14 2022 at 22:10):

@Gino Canessa your weird and misplaced obsession with standard regex had better not be true.

Grahame Grieve (Mar 14 2022 at 22:10):

specifically the table you posted

Grahame Grieve (Mar 14 2022 at 22:11):

I don't want to have to scan and reverse engineer the regex in order to make ^test$ work

Gino Canessa (Mar 14 2022 at 22:13):

What did I miss? Following are from online tools (all the ones I tested agree):
[screenshot of online regex tester results]

Gino Canessa (Mar 14 2022 at 22:21):

(this is consistent with the regex implementation in C#, and I believe the Pattern implementation in Java - though I have not tried it myself)

Grahame Grieve (Mar 14 2022 at 23:52):

the question is not, what do the underlying regex engines do, the question is, how does the function matches() work.

Gino Canessa (Mar 15 2022 at 00:23):

Sure @Grahame Grieve. But I believe the question came up, and several implementers have since chimed in, noting that other languages give their matches function a different behavior. Java is the odd duck that has a separate matches and find - find is aligned with what everyone else uses for matches.

I do not think you would disagree if I say that programming languages have idiosyncrasies. In this case, there are production implementations with both behaviors. I am offering the view that FHIRPath should align with the general RegEx definition and behavior, instead of adopting the unique Java convention.

Development-wise, changing the Java implementation to use find under the hood makes it the same as other languages. To my knowledge, neither JS (client or node) nor C# has any function that replicates the behavior of Java matches, meaning other SDK developers would need to re-invent the behavior to match.

Grahame Grieve (Mar 15 2022 at 01:33):

sigh. ok. I have switched it round, and added matchesFull to the java implementation

Grahame Grieve (Mar 17 2022 at 20:59):

@Ewout Kramer this breaks the widely used constraint sdf-0: name.matches('[A-Z]([A-Za-z0-9_]){0,254}')

I am internally treating this as name.matches('^[A-Z]([A-Za-z0-9_]){0,254}$') in the java validator
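Why the anchors matter for sdf-0 once matches() uses find semantics: without them, any name that merely contains a capital letter passes. A quick Java check (class and method names illustrative):

```java
import java.util.regex.Pattern;

class Sdf0Check {
    static final String SDF0 = "[A-Z]([A-Za-z0-9_]){0,254}";

    // Find semantics on the unanchored regex.
    static boolean unanchored(String name) {
        return Pattern.compile(SDF0).matcher(name).find();
    }

    // Find semantics with ^ and $ added, as described for the Java validator.
    static boolean anchored(String name) {
        return Pattern.compile("^" + SDF0 + "$").matcher(name).find();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"MyProfile", "my-profile-Name", "lowercase"}) {
            System.out.printf("%-16s unanchored=%b anchored=%b%n",
                    name, unanchored(name), anchored(name));
        }
    }
}
```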

Grahame Grieve (Mar 17 2022 at 23:27):

also eld-19 and eld-20 need wrapping with ^ and $

Ewout Kramer (Mar 21 2022 at 08:49):

Thanks, yes, I will admit that my mental regex parser does work like the Java matches(), so I never noticed these mistakes. We'll fix it in the .NET library too.


Last updated: Apr 12 2022 at 19:14 UTC