FHIR Chat · Data quality · smart/scheduling-links

Stream: smart/scheduling-links

Topic: Data quality

Josh Mandel (Apr 18 2021 at 01:56):

Just to highlight the challenge of limiting to coarse-grained data: as long as there's better, more accurate data available from scraping than from a published API, folks will scrape. https://www.vaccinespotter.org/WI has excellent content, but it's fragile!

Rob Brackett (Apr 20 2021 at 21:17):

As a general strawman for aggregation, I put together http://getmyvax.org/smart-scheduling/$bulk-publish from data USDR is starting to gather across states (it lists all states, but only a few have data — I suggest looking at NJ).

It exhibits a lot of the issues around coarse data that I think are reasonably well known, but maybe worth listing here:

We pull in data from a variety of sources that might not have an authoritative URI for identifiers, so if we don’t have one, we list the identifier’s system as https://fhir.usdigitalresponse.org/identifiers/<SIMPLE_NAME>:

"identifier": [
  {
    // CVS Store number
    "system": "https://fhir.usdigitalresponse.org/identifiers/cvs",
    "value": "01929"
  },
  {
    // New Jersey Immunization Information System
    "system": "https://fhir.usdigitalresponse.org/identifiers/nj_iis",
    "value": "NJ96820"
  }
]

(Similar, but slightly different from Nick Robison's approach with http://usds.gov/vaccine/source-identifier)
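A minimal sketch of the fallback described above, assuming a lookup table of sources that do have an authoritative system URI. The function name `make_identifier` and the table contents are hypothetical; only the `https://fhir.usdigitalresponse.org/identifiers/` prefix comes from this thread.

```python
# Prefix used in this thread when a source has no authoritative identifier system.
FALLBACK_BASE = "https://fhir.usdigitalresponse.org/identifiers/"

# Hypothetical table of sources that DO have an authoritative system URI.
# The NPI system below is the standard FHIR one; the table itself is illustrative.
AUTHORITATIVE_SYSTEMS = {
    "npi": "http://hl7.org/fhir/sid/us-npi",
}


def make_identifier(source_name: str, value: str) -> dict:
    """Build a FHIR Identifier, falling back to a <SIMPLE_NAME>-based system URI."""
    system = AUTHORITATIVE_SYSTEMS.get(source_name, FALLBACK_BASE + source_name)
    return {"system": system, "value": value}


cvs_identifier = make_identifier("cvs", "01929")
nj_iis_identifier = make_identifier("nj_iis", "NJ96820")
```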

In many cases, we have "unknown" availability for a location. We handle that by having a slot with 0 capacity, but it’s a bit hacky. On the other hand, an original slot publisher should probably never need this kind of feature, so it might be aggregator-specific, and maybe not worth supporting directly in the spec.
In many cases, we only know a location has (no) available appointments at some time in the future, so we can’t define a good slot time or capacity. For slot time, we currently just set start and end to cover an 8-hour period in the future, but it might be nice if those could just be omitted to indicate there’s capacity, but at no particular date.
In many cases, we only know a location has > 0 available slots, so we list a capacity of 1. CVS is doing something similar, but it might be nice if we had a way to say that explicitly (maybe a different extension than capacity).

^ A lot of these issues are not cases we should expect a first-party slot publisher to encounter, or features we would not want to encourage them to use, so I also understand if we don’t think the spec should account for them. But I do think aggregators that use any non-SMART-Scheduling-Links sources will have trouble totally complying with the standard without something to address these.
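The three workarounds listed above can be sketched roughly as follows. This is an illustration under assumptions, not the aggregator's actual code: the capacity extension URL, the `status` values chosen, and the one-hour offset before the 8-hour window are all assumed, and `placeholder_slot` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Assumed URL for the capacity extension; check the spec before relying on it.
CAPACITY_URL = "http://fhir-registry.smarthealthit.org/StructureDefinition/slot-capacity"


def placeholder_slot(slot_id: str, schedule_id: str, availability: str, now: datetime) -> dict:
    """Build a coarse Slot when only 'unknown', 'none', or 'some' availability is known."""
    # "some" becomes capacity 1 (really meaning > 0); "unknown" and "none" both become 0,
    # which is the ambiguity the thread calls hacky.
    capacity = 1 if availability == "some" else 0
    start = now + timedelta(hours=1)   # arbitrary offset into the future (assumption)
    end = start + timedelta(hours=8)   # the 8-hour placeholder window from the thread
    return {
        "resourceType": "Slot",
        "id": slot_id,
        "schedule": {"reference": f"Schedule/{schedule_id}"},
        "status": "free" if capacity > 0 else "busy",  # assumed mapping
        "start": start.isoformat(),
        "end": end.isoformat(),
        "extension": [{"url": CAPACITY_URL, "valueInteger": capacity}],
    }


example = placeholder_slot("slot-1", "sched-1", "some", datetime.now(timezone.utc))
```

A consumer reading these slots cannot tell a placeholder from a real estimate, which is the compliance problem raised above.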

Josh Mandel (Apr 20 2021 at 23:45):

Thanks for this report @Rob Brackett! A few quick thoughts and questions:

> We pull in data from a variety of sources that might not have an authoritative URI for identifiers, so if we don’t have one, we list the identifier’s system

Great! In the case where you're aggregating data that originate through SMART Scheduling Links, you might also consider populating Meta.source with a URL pointing back to the base URL of the system where these data originated (e.g., "https://github.com/jmandel/wba-appointment-fetch/blob/gh-pages" for my sample data).

> In many cases, we have "unknown" availability for a location.

I want to understand the intended semantics here. Is this for things like "source API is down right now, so we don't have any current data"? If so, it might be worth annotating the schedule rather than creating a placeholder slot.

> In many cases, we only know a location has (no) available appointments at some time in the future

If there's no timing information known, we might again think about annotating the Schedule with this kind of information (e.g., a some-availability extension or a no-availability extension), since there's not much Slot-like to say.

> In many cases, we only know a location has > 0 available slots,

From an aggregator perspective, a workaround might be to populate a custom extension like capacity-unknown to indicate >0; would want to be clear that original publishers shouldn't annotate slots this way, though. Alternatively, we could switch our estimates from valueInteger to valueRange (so there could be a lower and upper bound), or capacity-min and capacity-max extensions. I'm worried about providing too much sophistication here though.
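To make the valueRange alternative concrete, here is a hedged sketch of how a capacity estimate with a lower and optional upper bound might be encoded. The extension URL is assumed and `capacity_extension` is a hypothetical helper; nothing here is adopted spec language.

```python
from typing import Optional

# Assumed URL for the capacity extension; not confirmed by this thread.
CAPACITY_URL = "http://fhir-registry.smarthealthit.org/StructureDefinition/slot-capacity"


def capacity_extension(low: int, high: Optional[int] = None) -> dict:
    """Encode capacity as valueInteger when exact, else as a FHIR Range."""
    if high is not None and low == high:
        # Exact estimate: keep the simple integer form.
        return {"url": CAPACITY_URL, "valueInteger": low}
    # Open-ended or bounded estimate: Range.low / Range.high are quantities with a value.
    value_range = {"low": {"value": low}}
    if high is not None:
        value_range["high"] = {"value": high}
    return {"url": CAPACITY_URL, "valueRange": value_range}


at_least_one = capacity_extension(1)        # "> 0, upper bound unknown"
between_5_and_20 = capacity_extension(5, 20)
```

Every consumer would then have to handle two value types for one extension, which is the kind of extra sophistication the message above is wary of.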

Rob Brackett (Apr 21 2021 at 02:18):

Oh, thanks for the Meta.source pointer. I remember seeing that earlier, and then totally forgot about it.

> In many cases, we have "unknown" availability for a location.

> I want to understand the intended semantics here. Is this for things like "source API is down right now, so we don't have any current data"? If so, it might be worth annotating the schedule rather than creating a placeholder slot.

We actually have 2 flavors of unknown to represent some nuance here:

We have no source (e.g. Dept. of Health knows about this location, but we don’t have a way to get detailed availability).
Our data source broke in some way (API is down, message format changed so we can’t parse, etc.).

Functionally, these are pretty similar for end users (there might be appointments, so you should check unless another location nearby has definite available slots). I think annotating the schedule would be totally reasonable in both cases (especially since either of these flavors of unknown might lead to us showing different, more generic schedules than we otherwise would — i.e. without product/dose info).

There’s actually a related case this reminded me of that’s not handled well: walk-in clinics. They don’t really have any slots (or there might be walk-ins allowed in addition to the appointment slots), so we probably need something modeled on the schedule level. (In our other APIs, we’re only talking about COVID, so we can annotate this data on the location itself, but that’s probably not reasonable here.)

> In many cases, we only know a location has (no) available appointments at some time in the future

> If there's no timing information known, we might again think about annotating the Schedule with this kind of information (e.g., a some-availability extension or a no-availability extension), since there's not much Slot-like to say.

:thumbs_up:

> From an aggregator perspective, a workaround might be to populate a custom extension like capacity-unknown to indicate >0; would want to be clear that original publishers shouldn't annotate slots this way, though. Alternatively, we could switch our estimates from valueInteger to valueRange (so there could be a lower and upper bound), or capacity-min and capacity-max extensions. I'm worried about providing too much sophistication here though.

The range is pretty neat, but I think you’re right; that’s getting unnecessarily complex.

Rob Brackett (Apr 28 2021 at 01:56):

@Josh Mandel I begged off drafting spec language for these cases today, but I think I might have time to write them up tomorrow. Just to make sure we’re roughly on the same page:

1. Is it worth adding guidance about Meta.source to the spec? It’s already part of FHIR and is really only useful to aggregators, so it also seems fair if not.
2. An extension for slots with a url like "http://fhir-registry.smarthealthit.org/StructureDefinition/slot-capacity-unknown". It doesn’t need a value[whatever] field because the extension’s presence implies enough.
3. Thinking about it more, I’m not sure some-availability/no-availability is especially useful. While the timeframe wouldn’t be completely honest, an aggregator could publish a coarse-timeframe slot (e.g. covering the next 14 days) with the above slot-capacity-unknown extension. That would avoid too many special cases in the spec and I think be just as clear. (Downside: there’s some data savings in putting this info on the schedule, since we don’t need to publish any slots for these schedules.)

Since we want to be extra careful to discourage first-party publishers from using these features, would it be better to add an appendix or separate section for aggregators and put these there, rather than directly in with the definitions for Locations, Schedules, and Slots? (Should we be so strict as to say first-party slot publishers SHALL NOT use these?)
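A rough sketch of the coarse-timeframe slot with a presence-only slot-capacity-unknown extension, plus the check a consumer might run. The extension URL is the one proposed in this message; the function names, the `status` value, and starting the window at the publish time are assumptions.

```python
from datetime import datetime, timedelta, timezone

# URL as proposed in this thread (a proposal, not necessarily adopted).
CAPACITY_UNKNOWN_URL = (
    "http://fhir-registry.smarthealthit.org/StructureDefinition/slot-capacity-unknown"
)


def coarse_unknown_capacity_slot(slot_id: str, schedule_id: str, now: datetime, days: int = 14) -> dict:
    """Publish one coarse Slot covering the next `days` days, capacity unknown."""
    return {
        "resourceType": "Slot",
        "id": slot_id,
        "schedule": {"reference": f"Schedule/{schedule_id}"},
        "status": "free",  # assumed: some availability is believed to exist
        "start": now.isoformat(),
        "end": (now + timedelta(days=days)).isoformat(),
        # Presence-only marker: no value[x] field, as proposed above.
        "extension": [{"url": CAPACITY_UNKNOWN_URL}],
    }


def has_unknown_capacity(slot: dict) -> bool:
    """Consumer-side check for the marker extension."""
    return any(ext.get("url") == CAPACITY_UNKNOWN_URL for ext in slot.get("extension", []))


slot = coarse_unknown_capacity_slot("slot-1", "sched-1", datetime.now(timezone.utc))
```

One caveat worth checking: FHIR's ext-1 invariant expects an extension to carry either a value[x] or nested extensions, so a strictly validating consumer may reject the value-less form.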

Josh Mandel (Apr 28 2021 at 02:06):

Thanks @Rob Brackett! For (2) I just added https://github.com/smart-on-fhir/smart-scheduling-links/pull/35 before I saw this. I'd rather use a valueBoolean to give you the option of explicitly saying "no" (but omitting it is of course fine if the aim is to say "we don't know")
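The three-state reading of a valueBoolean (yes, no, or omitted for "we don't know") might look like the sketch below. The extension name has-availability appears later in this thread, but the full URL, its placement on Schedule, and both helper names are assumptions; the PR itself is the authority on the actual definition.

```python
from typing import Optional

# Hypothetical URL; only the name "has-availability" is mentioned in this thread.
HAS_AVAILABILITY_URL = "http://fhir-registry.smarthealthit.org/StructureDefinition/has-availability"


def set_has_availability(schedule: dict, available: Optional[bool]) -> dict:
    """Return a copy of the Schedule with the extension set, or removed when None."""
    extensions = [e for e in schedule.get("extension", []) if e.get("url") != HAS_AVAILABILITY_URL]
    if available is not None:
        extensions.append({"url": HAS_AVAILABILITY_URL, "valueBoolean": available})
    result = {**schedule}
    if extensions:
        result["extension"] = extensions
    else:
        result.pop("extension", None)  # omit entirely to say "we don't know"
    return result


def read_has_availability(schedule: dict) -> Optional[bool]:
    """True or False when stated explicitly, None when the extension is absent."""
    for ext in schedule.get("extension", []):
        if ext.get("url") == HAS_AVAILABILITY_URL:
            return ext.get("valueBoolean")
    return None


explicit_no = set_has_availability({"resourceType": "Schedule", "id": "sched-1"}, False)
```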

Rob Brackett (Apr 28 2021 at 02:09):

That’s covering (3) more than (2), right?

Josh Mandel (Apr 28 2021 at 02:09):

For (1) I think you can just populate Meta.source as you see fit. Happy to take a PR if you want to put this together, as an optional field for aggregators. There's a question of whether to populate it with a server URL (e.g., https://fhir.example.com) or a specific resource URL (e.g., https://fhir.example.com/Slot/123). I think the full resource URL makes it easier to trace/debug, if the data are coming from a SMART Scheduling Links implementation.
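The two options for Meta.source can be sketched as a single helper. `with_source` and its `full_resource_url` flag are hypothetical names; `meta.source` itself is a standard FHIR R4 element, and the URLs are the example ones from the message above.

```python
def with_source(resource: dict, base_url: str, full_resource_url: bool = True) -> dict:
    """Return a copy of the resource with meta.source pointing at its origin."""
    base = base_url.rstrip("/")
    if full_resource_url:
        # Specific resource URL, e.g. https://fhir.example.com/Slot/123
        source = f"{base}/{resource['resourceType']}/{resource['id']}"
    else:
        # Server base URL only, e.g. https://fhir.example.com
        source = base
    meta = {**resource.get("meta", {}), "source": source}
    return {**resource, "meta": meta}


slot = {"resourceType": "Slot", "id": "123"}
traced = with_source(slot, "https://fhir.example.com")
coarse = with_source(slot, "https://fhir.example.com", full_resource_url=False)
```

The helper copies rather than mutates, so the aggregator's original record stays untouched while the published copy carries the provenance.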

Josh Mandel (Apr 28 2021 at 02:10):

Oh yes, my PR was indeed about (3) :-)

Josh Mandel (Apr 28 2021 at 02:11):

I like the idea of putting this into an "Appendix for Aggregators", for sure. We could move the has-availability extension there too if you wind up putting a PR together for (1) or (2).

Rob Brackett (Apr 28 2021 at 02:13):

I think I got a bit mixed up with (2). Thinking back, did we wind up at: it probably isn’t a good idea to directly support (2) in order to discourage publishers from doing it, and consumers should not worry about distinguishing between "capacity=1 really means >0" and something that’s a real estimate?

Rob Brackett (Apr 28 2021 at 02:13):

At least that’s where we were somewhere in the middle of the discussion. That was probably when Keith popped in. :wink:

Josh Mandel (Apr 28 2021 at 02:16):

Yeah, that was where we got on the call -- I wouldn't mind having aggregator-specific guidance for this but don't think we'd want to trouble source publishers with it.

Rob Brackett (Apr 28 2021 at 02:17):

OK, so nothing to add for (2). For https://github.com/smart-on-fhir/smart-scheduling-links/pull/35/files, would you like me to update it by moving it to an aggregator’s appendix, or do that as a separate PR?

Josh Mandel (Apr 28 2021 at 02:19):

Whatever is easiest for you! (The commit history won't be too complicated here.)

Josh Mandel (Apr 28 2021 at 02:19):

Thanks for your help with this.

Rob Brackett (Apr 28 2021 at 02:21):

No problem. :)

Rob Brackett (Apr 28 2021 at 02:21):

Actually, since I don’t have write access, it’s probably better if I do the appendix as a follow-on PR.

Josh Mandel (Apr 29 2021 at 23:59):

Thanks for the PR -- I still owe you a response (on my TODO list)

Last updated: Apr 12 2022 at 19:14 UTC
