FHIR Chat · algorithms · cql

Stream: cql

Topic: algorithms


view this post on Zulip Frank Adrian (Oct 02 2019 at 23:11):

The problem I am working on at the moment seems to condense to a couple of very simple algorithms whose solutions in CQL, nevertheless, have remained elusive. The first and simplest of the two concerns combining two lists:

Define the function fold({a0, a1, ..., ak}, {b0, b1, ..., bk}) -> {{a1, b1}, {a2, b2}, ..., {ak, bk}}

The second concerns finding clusters of dates. Define an n-cluster of dates as being groups of dates that are separated by no more than n days. For example the dates 2019-01-05, 2019-01-15, and 2019-01-24 are in a 10-cluster, but not in a 9-cluster. From the definition, it's clear that n-clusters are separated by gaps of more than n days. There can also be singleton clusters, that contain only one date which is separated from both it's predecessor and antecedent by gaps of more than n-days. My problem condenses to this:

Given D, a List<DateTime> {d0, d1, d2, ..., dk}, d0 <= d1 <= d2 <= dk and an integer n, return a list of all n-clusters in D. For example for the ordered list, {d0, d1, d2, [31-day gap] d3, [70-day gap], d4, d5, [50-day gap] d6, d7, d8, d9} and n = 30, return the list {{d0, d1, d2}, {d3}, {d4, d5}, {d6, d7, d8, d9}}. Alternatively, returning the endpoints of each cluster is also acceptable: {{d0, d2}, {d3, d3}, {d4, d5}, {d6, d9}}.

Any ideas as to how these two functions would be defined in CQL?

view this post on Zulip Chris Moesel (Oct 03 2019 at 00:29):

Oooh. This feels like a coding interview question! I only have time for one, but here's a solution for folding in CQL. I'll admit, it's a little weird, but I tested it and it works.

library CQLFold version '1.0'

define function Fold(L1 List<String>, L2 List<String>):
 (expand { Interval[0, Length(L1)-1] } per 1 '1') I
   return { L1[start of I], L2[start of I] }

// Test it
define L1: {'A', 'B', 'C'}
define L2: {'X', 'Y', 'Z'}
define Answer: Fold(L1, L2)

So we basically want to iterate over a list of indexes 0 to N and then create pairs of { L1[i], L2[i] }. Here's a breakdown of the CQL since it is a little funky. For this example, consider L1: {'A', 'B', 'C'} and L2: {'X', 'Y', 'Z'}.

  • First create an interval from 0 to Length(L1) - 1
    • CQL: { Interval[0, Length(L1)-1] }
    • Result: => Interval[0,2]
  • Then expand it into intervals of size 1, giving us Length(L1)-1 unit intervals
    • CQL: (expand { Interval[0, Length(L1)-1] } per 1 '1')
    • Result: { Interval[0,0], Interval[1,1], Interval[2,2] }
  • Now query over that returning { L1[start of I], L2[start of I] } pairs
    • CQL: I return { L1[start of I], L2[start of I] }
    • Result: { {'A','X'}, {'B','Y'}, {'C','Z'} }

view this post on Zulip Frank Adrian (Oct 03 2019 at 01:18):

Thanks, Chris. Because of this, I was able to solve my n-cluster problem. Here's the solution:

define function leftEndPoints(l List<DateTime>, n Integer): l L without l L2 such that L2 < L and difference in days between L2 and L < 30
define function rightEndPoints(l List<DateTime>, n Integer): l L without l L2 such that L2 > L and difference in days between L and L2 < 30

define LHTest: leftEndPoints({DateTime(2019, 1, 1), DateTime(2019, 1, 2), DateTime(2019, 1, 3), DateTime(2019, 3, 1), DateTime(2019, 5, 1), DateTime(2019, 5, 2), DateTime(2019, 7, 1), DateTime(2019, 9, 1), DateTime(2019, 9, 2), DateTime(2019, 9, 3), DateTime(2019, 9, 4)},
30)
define RHTest: rightEndPoints({DateTime(2019, 1, 1), DateTime(2019, 1, 2), DateTime(2019, 1, 3), DateTime(2019, 3, 1), DateTime(2019, 5, 1), DateTime(2019, 5, 2), DateTime(2019, 7, 1), DateTime(2019, 9, 1), DateTime(2019, 9, 2), DateTime(2019, 9, 3), DateTime(2019, 9, 4)},
30)

define function nClusters(l List<DateTime>, n Integer): fold(leftEndPoints(l, n), rightEndPoints(l, n))

define nClusterTest: nClusters({DateTime(2019, 1, 1), DateTime(2019, 1, 2), DateTime(2019, 1, 3), DateTime(2019, 3, 1), DateTime(2019, 5, 1), DateTime(2019, 5, 2), DateTime(2019, 7, 1), DateTime(2019, 9, 1), DateTime(2019, 9, 2), DateTime(2019, 9, 3), DateTime(2019, 9, 4)},
30)

view this post on Zulip Frank Adrian (Oct 03 2019 at 01:23):

OK, replace the 30's in the leftEndPoints and rightEndPoints functions with n and it will be correct for n other than 30 :slight_smile: .

view this post on Zulip Chris Moesel (Oct 03 2019 at 01:25):

That's awesome. I couldn't help myself so I was also working on this -- but from a different angle. I felt like it should be a fairly simple application of the collapse operator. I came up with this (returning the endpoints only):

library CQLClusters

define function Clusters(Dates List<Date>, N Quantity):
  collapse (Dates D return Interval[D,D]) per N

define Dates: {
  @2000-02-10,
  @2000-02-28,
  @2000-03-01,
  // 31-day gap
  @2000-04-01,
  // 70-day gap
  @2000-06-10,
  @2000-06-15,
  // 50-day gap
  @2000-08-04,
  @2000-09-01,
  @2000-10-01,
  @2000-10-31
 }

define Answer: Clusters(Dates, 30 days)

But... it doesn't work in the Java-based CQL execution engine or the JavaScript-based CQL execution engine, so either they're wrong or I'm wrong. I code-reviewed the JavaScript engine's collapse implementation before it was merged, so I like to think that it's right, but then that means my CQL is wrong. ;-) As for the Java-based implementation, it seems definitely way wrong because it returns [Interval[2000-02-10, 2000-11-07]] which goes entirely past the input intervals (@Bryn Rhodes ).

Anyway, I'm glad you found a way, and I'm glad my first solution could help lead you there!

view this post on Zulip Chris Moesel (Oct 03 2019 at 01:32):

Although, I have to admit, I popped your solution into CQL Runner and I got the wrong answers:

>> Fold [2:1] Definition successfully validated
>> leftEndPoints [6:1] Definition successfully validated
>> rightEndPoints [7:1] Definition successfully validated
>> LHTest [9:1] [2019-01-01, 2019-01-02, 2019-01-03, 2019-03-01, 2019-05-01, 2019-05-02, 2019-07-01, 2019-09-01, 2019-09-02, 2019-09-03, 2019-09-04]
>> RHTest [11:1] [2019-01-01, 2019-01-02, 2019-01-03, 2019-03-01, 2019-05-01, 2019-05-02, 2019-07-01, 2019-09-01, 2019-09-02, 2019-09-03, 2019-09-04]
>> nClusters [14:1] Definition successfully validated
>> nClusterTest [16:1] [[2019-01-01, 2019-01-01], [2019-01-02, 2019-01-02], [2019-01-03, 2019-01-03], [2019-03-01, 2019-03-01], [2019-05-01, 2019-05-01], [2019-05-02, 2019-05-02], [2019-07-01, 2019-07-01], [2019-09-01, 2019-09-01], [2019-09-02, 2019-09-02], [2019-09-03, 2019-09-03], [2019-09-04, 2019-09-04]]

If it worked for you, however, perhaps it is an issue with CQL Runner.

view this post on Zulip Frank Adrian (Oct 03 2019 at 03:10):

I can't comment on CQL Runner, but my implementation gives
LHTest: [2019-01-01, 2019-03-01, 2019-05-01, 2019-07-01, 2019-09-01]
RHTest: [2019-01-03, 2019-03-01, 2019-05-02, 2019-07-01, 2019-09-04]

nClusterTest:[[2019-01-01, 2019-01-03],[2019-03-01, 2019-03-01], [2019-05-01, 2019-05-02], [2019-07-01, 2019-07-01], [2019-09-01, 2019-09-04]]

view this post on Zulip Bryn Rhodes (Oct 03 2019 at 04:15):

Clever use of expand to create a sequence @Chris Moesel, very nice :)

view this post on Zulip Bryn Rhodes (Oct 03 2019 at 04:15):

I agree these should be working, I will test these on the runner and get to the bottom of why they're not giving the results expected here.

view this post on Zulip Chris Moesel (Oct 03 2019 at 12:50):

@Bryn Rhodes, it did get me to thinking though... should we consider some new features for CQL?

  1. Convert an Interval<Integer> to a list. Ruby supports something like this. It might look like convert Interval[0,15] to List<Integer>. That said, this is a pretty narrow use case and I wouldn't want to have to support it for other types, so maybe not. The expression for doing this ((expand { Interval[0, N] } per 1 '1') IVL return start of IVL) isn't too bad.

  2. Support an indexer argument on queries. Since the primary use case for getting a sequence of integers might be something similar to the fold operation, we could borrow a page from ES6 (sort of) and have an optional indexer argument on a query. Then that fold function would be simplified to:

define function Fold(L1 List<String>, L2 List<String>):
 L1 Item, Index return { Item, L2[Index] }

That said, we already use , to separate sources in a multi-source query, so we'd want to be careful not to confuse matters too much (although, technically, I think the parser could disambiguate). Still if we wanted to make it clearer, we could introduce new keywords (e.g., L1 Item at Index, L1 Item at index Index, L1 Item using index Index, etc).

Anyway, just a thought. Do you think it's worth me filing an official STU comment?

view this post on Zulip Frank Adrian (Oct 03 2019 at 13:05):

It might be. Another choice in addition to an indexer, could be lag(n) and/or lead(n) functions like are found in some SQL implementations, which return records n elements before or after the current one. Sometimes relative indexing works better than absolute.

view this post on Zulip Chris Moesel (Oct 03 2019 at 14:58):

re: lag/lead, couldn't you accomplish the same w/ an index? E.g. Items.lag(4) == Items[Index-4]?

view this post on Zulip Chris Moesel (Oct 03 2019 at 14:58):

Although I guess in a complex query, you don't have a simple variable that equates to all the results... so maybe not.

view this post on Zulip Bryn Rhodes (Oct 07 2019 at 18:39):

@Chris Moesel , apologies for the delay here. Yes, I think it would be worth submitting those as STU comments. Another specific item I've wished for is some answer for recursive queries.

view this post on Zulip Bryn Rhodes (Oct 07 2019 at 18:40):

For the indexer argument on queries, am I misunderstanding the request, or would the $this iteration accessor work?

view this post on Zulip Bryn Rhodes (Oct 07 2019 at 18:40):

Sorry, $index, not $this

view this post on Zulip Chris Moesel (Oct 07 2019 at 19:33):

Hey @Bryn Rhodes -- is $index defined as part of the core CQL spec? I'm having trouble finding any documentation about it in the CQL spec. Even in the FHIRPath 1.0 spec, only $this is called out as a keyword.

view this post on Zulip Chris Moesel (Oct 07 2019 at 19:34):

OK, I did just find it in the proposed FHIRPath 2.0 spec, but still having trouble finding it in the CQL spec.

view this post on Zulip Bryn Rhodes (Oct 07 2019 at 19:42):

True enough, I had thought we documented those here: https://cql.hl7.org/03-developersguide.html#using-fhirpath

view this post on Zulip Bryn Rhodes (Oct 07 2019 at 19:42):

Sounds like another STU comment.

view this post on Zulip Bryn Rhodes (Oct 07 2019 at 19:44):

Since the cql.g4 includes the fhirpath.g4, it is part of the grammar, but yes, the specification should be clear about the behavior in CQL.

view this post on Zulip Chris Moesel (Oct 07 2019 at 20:31):

I don't think we need $index and the second index arg on queries -- so if have $index that ought to be good enough.

view this post on Zulip Chris Moesel (Oct 07 2019 at 20:33):

But is $this and $index considered part of the core spec? E.g., can we expect standard CQL engines (for CDS/eCQM use cases) to support this? If it's in the "Using FHIRPath" section, it sounds ancillary.

view this post on Zulip Bryn Rhodes (Oct 07 2019 at 20:54):

Well CQL is defined as a superset of FHIRPath, so any valid FHIRPath expression should be valid in CQL.

view this post on Zulip Bryn Rhodes (Oct 07 2019 at 20:55):

There are some specific options in the translator for disabling some of the FHIRPath functionality, but in general, yes, any CQL engine should be able to run any FHIRPath expression.

view this post on Zulip Chris Moesel (Oct 07 2019 at 23:29):

I understand that -- but I thought there were general recommendations regarding what features should generally be used or not used in CQL for quality purposes (e.g., for CDS/eCQM). For example, as you know (since you probably wrote it), the CPG IG defines recommended compiler settings here: http://hl7.org/fhir/uv/cpg/2019SEP/documentation-libraries.html.

I guess I just have a feeling (right or not) that quality engines may ignore most of what is in the FHIRPath part of CQL (especially since it is mainly relegated to the Appendix -- and Appendix Letter I no less!). If we want to encourage use of $index in CDS/eCQM and quality engines, we probably ought to document it in the actual chapters of the spec rather than just the appendix. As someone who built a reference implementation myself, I have to admit that I did not add support for $this or $index (yet)!

view this post on Zulip Saoji Adhe (Oct 24 2019 at 11:29):

Hi @Chris Moesel ,
i want to test cqf-ruler-0.1.13-SNAPSHOT.jar with hapi-fhir-3.8 , to test $evaluate-measure
but it's not working with example given by hspc in there documentation
anybody know about where i will get example related to $evaluate-measure, $apply, cds-hooks etc...
Thank you in advance.

view this post on Zulip Chris Moesel (Oct 24 2019 at 14:29):

Hi @Saoji Adhe -- I don't work on (or use) CQF Ruler, so I'm afraid I can't help you much. You might want to take up @Alexander Kiel on his offer to try Blaze, or wait for a response from @Bryn Rhodes who is more familiar with CQF Ruler.

view this post on Zulip Bryn Rhodes (Oct 24 2019 at 18:52):

@Saoji Adhe , there is a CQF-Ruler running here that we use as a sandbox for the connectathons. That has the content from the latest connectathon loaded. We are still addressing issues uncovered by that connectathon, so no guarantees that everything there is 100% functional, but it should be enough to play around with.

view this post on Zulip Frank Adrian (Dec 31 2019 at 18:02):

Another CQL query question.

I have a list of MedicationDispenses (MD[i]), each with a whenHandedOver date and daysSupply quantity. For each MedicationDispense, I need to compute the consumptionInterval(CI) for the dispense, defined as the earliest interval when this drug could be consumed (i.e., starting at the later of when all prior dispenses have been consumed or when this dispense was handed over and ending when this dispense runs out):
start of CI[i] = maximum(CI[i-1] + 1 day, MD[i].whenHandedOver),
end of CI[i] = start of CI[i] + daysSupply

The problem is that CI[i] depends on the value of CI[i-1], which you can't access from the computation of CI[i]. How does one go about doing something like this in CQL? The best solution I've found so far involves unrolling the evaluation loop, making each calculation of a given CI[i] its own definition, but not only is thisdefinitely not the CQL way to solve this, it also doesn't work for more dispenses than the loop has been unrolled with.

If you want to get CS'y about this, it boils down to computing the earliest start and end times for tasks using batch queueing on a uniprocessor (the when handed over date is the arrival time, the days supply is the task length, and the consumption interval is the run interval of the task, and the patient is the uniprocessor). Calculating this is essentially a (very simple) dynamic programming problem. The trouble is that CQL doesn't seem very amenable to dynamic programming problems. If I had a reduce function, this would be fairly simple, but CQL doesn't have that, either. Any ideas?

view this post on Zulip Frank Adrian (Dec 31 2019 at 18:44):

I also forgot to say that by collapsing the CI[i], you compute the overall set of intervals during which the patient was supplied with drugs.

view this post on Zulip Lloyd McKenzie (Jan 03 2020 at 14:59):

@Bryn Rhodes

view this post on Zulip Bryn Rhodes (Jan 03 2020 at 18:20):

I haven't had time to give this the attention it deserves, but I wonder if the .aggregate() operator would provide a solution here?

view this post on Zulip Frank Adrian (Jan 06 2020 at 18:51):

Thanks for the suggestion, Bryn. I'll see if this function is implemented and if I can come up with something that works using the function.

view this post on Zulip Keith Deutsch (Jan 06 2020 at 22:30):

@Bryn Rhodes , thanks for the aggregate suggestion, but I'm not sure it applies. What I'd like to try to do is reduce @Frank Adrian 's description of the overall problem to a simple, explicit example of what I think is the core problem here. Basically, we can reduce the various MedicationDispense events to a list of intervals like this:

{[1,6], [2,3], [7,10], [15,17]}

And what I think we need to do is to determine a kind of extending overlap such that if the first two overlap, they are added together recursively until we hit a non- overlapping member. The desired sequence of reductions is then:

{[1,8], [7,10], [15,17]}

and finally: {[1,12], [15,17]}

I can achieve this relatively easily in a general purpose language by iteratively replacing the first and second intervals in the list with a single interval that starts when the first interval starts, and has a length that is the sum of the lengths of the first and second intervals, iterating as long as the first two intervals overlap, and then starting the process again with the now second element, until I reach the end of the list. I haven't been able to figure out a way to do this in CQL.

view this post on Zulip Frank Adrian (Jan 09 2020 at 16:30):

I found an algorithm that (sort of) works, but I basically had to manually unroll the iteration as follows:

//Dispense objects represent what happens to a given dispense - it is received on the who (whenHandedOver) date, has ds days supply, with the start of consumption being st, and the end of consumption being nd.
define function Dispense(who DateTime, ds Integer): Tuple{who:date from who, ds: ds, st: date from who, nd: date from who + ds}
define function Dispense(who DateTime, ds Integer, st DateTime, nd DateTime): Tuple{who:date from who, len: ds, st: date from st, nd: date from nd}

//For testing...
define "All Dispenses": ({Dispense(DateTime(2018, 1, 1), 30), Dispense(DateTime(2018, 1, 1), 30), Dispense(DateTime(2018, 2, 1), 30), Dispense(DateTime(2018, 3, 1), 30)}) T sort by date from who

define function Index(l List<Any>): (expand {Interval[0, Count(l) - 1]} per 1 '1') I return Tuple{idx: start of I, obj: l[start of I]}

define "All Dispenses Indexed": Index("All Dispenses")
define function GetItem(dispenseListSoFar List<Tuple{idx Integer, obj Any}>, i Integer): First(dispenseListSoFar T where T.idx = i return all T.obj)

//Work around random issue in the execution engine
define function MyMaximum(dtl DateTime, dtr DateTime): if dtl <= dtr then dtr else dtl

//Calculate the start and end consumption dates for the i'th dispense of a given drug. The start of consumption is the maximum of the whenHandedOver date of the i'th dispense
//and the end of consumption of the i-1'th dispense plus one day. The end of consumption for the i'th dispense is the start of consumption of the i'th dispense plus daysSupply.
//Note that the start and end consumption dates for the i'th dispense depends on the end consumption date for the i-1'th dispense (and thus depends on all previous consumption
//dates, which must be computed before the i'th is computed).
define function ComputeStartAndEndDate(i Integer, dispListSoFar List<Tuple{idx Integer, obj Tuple{who DateTime, ds Integer, st DateTime, nd DateTime}}>):
    First(({0}) X //This bit of cruft needed because CQL does not let you define a let outside the confines of a query.
        let dsp: GetItem(dispListSoFar, i) as Tuple{who DateTime, ds Integer, st DateTime, nd DateTime},
            prv: GetItem(dispListSoFar, i-1) as Tuple{who DateTime, ds Integer, st DateTime, nd DateTime},
            //ccStart: Maximum(prv.nd + 1 day, dsp.who),
            ccStart: MyMaximum(date from prv.nd + 1 day, date from dsp.who),  //work around compiler issue above
            ccEnd: ccStart + dsp.ds - 1 day,
            lim: Count("All Dispenses Indexed") - 1
        return if i <= 0 or i >= lim
               then dispListSoFar
               else
                  (dispListSoFar except {dsp}) union {Tuple{idx: i, obj: Dispense(dsp.who, dsp.ds, ccStart, ccEnd)}})


define Test0: "All Dispenses Indexed" as List<Tuple{idx Integer, obj Tuple{who DateTime, ds Integer, st DateTime, nd DateTime}}>
define Test1: ComputeStartAndEndDate(1, Test0 as List<Tuple{idx Integer, obj Tuple{who DateTime, ds Integer, st DateTime, nd DateTime}}>)
define Test2: ComputeStartAndEndDate(2, Test1 as List<Tuple{idx Integer, obj Tuple{who DateTime, ds Integer, st DateTime, nd DateTime}}>)
define Test3: ComputeStartAndEndDate(3, Test2 as List<Tuple{idx Integer, obj Tuple{who DateTime, ds Integer, st DateTime, nd DateTime}}>)
//...up to TestN where N is Count("All Dispenses Indexed") - 1.
//Ideally, there would be a construct in the language that would allow us to iterate (or at least to do the unrolling in a single line of code).
//Here, we're unrolling the loop manually. We'll set a limit here of 50 dispenses or so, with an error escape in case the number of dispenses exceed that amount.

As far as I can tell, CQL's inability to modify data that has been defined previously requires a new definition to be created for each new piece of state that needs to be iterated over. I'd still like a better solution, if anyone can come up with one.

view this post on Zulip Bryn Rhodes (Feb 04 2020 at 12:39):

So, I'm not 100% sure this will work, but here's what I think is a solution using the aggregate() method:

define MedicationRequestIntervals:
  [MedicationRequest] MD
    return Interval[MD.whenHandedOver, MD.whenHandedOver + (MD.daysSupply * 1 day)]

define RolledOutIntervals:
  MedicationRequestIntervals.aggregate($total union
    ($this) X
      let S: Max({ end of Last($total) + 1 day, start of X }),
      E: S + duration in days of X
      return Interval[S, E]
  )

view this post on Zulip Bryn Rhodes (Feb 04 2020 at 12:41):

Basically, aggregate sets the value of $total to the result of evaluating the function after each iteration, and returns the overall $total. So each iteration will union the current $total with the calculation of an Interval based on the Last interval in $total.

view this post on Zulip Bryn Rhodes (Feb 04 2020 at 12:41):

I think there's probably a bug in that Last will return null if $total is empty (on the first iteration) so there's probably a Coalesce needed there, but that's the gist.

view this post on Zulip Bryn Rhodes (Feb 04 2020 at 12:42):

Again, I'm not sure it would work, and even if it did, I'd like to see something more first-class that could handle this.

view this post on Zulip Bryn Rhodes (Feb 11 2020 at 15:01):

At the Sydney WGM we worked on this problem and are proposing an aggregate clause that would support expression of this type of query. In particular: pasted image


Last updated: Apr 12 2022 at 19:14 UTC