Stream: ontology
Topic: common API
Erich Schulz (Jun 02 2016 at 11:33):
I'm starting to think about reusable software components (see also the other thread in the implementers stream) and it seems that most of the interesting common operations will require injection of some kind of ontology service. This has me wondering if any thought has gone into defining a standard API for such a service?
Grahame Grieve (Jun 02 2016 at 11:34):
what do you think it would do, and why would FHIR define such a thing? Surely that's a core ontology W3C thing?
Erich Schulz (Jun 02 2016 at 11:34):
(I'm thinking to identify simple, common operations with no external dependencies to explore initially - but it should also be possible to work with an injected service)
Erich Schulz (Jun 02 2016 at 11:36):
well, a common operation would be to apply a mapping, e.g. given a problem list with mainly SNOMED codes, generate a list of ICD-10 codes
Michael van der Zel (Jun 02 2016 at 11:36):
SPARQL-like?
Erich Schulz (Jun 02 2016 at 11:36):
also classify the problems by body system...
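For illustration, a rough sketch of the kind of injectable "ontology service" being described here, with the two use cases above (map to ICD-10, classify by body system) made concrete. All of the names are hypothetical, not taken from any spec:

```typescript
// Hypothetical shape of an injectable ontology/terminology service.
// None of these names come from FHIR; they only make the two use cases concrete.
interface Coding {
  system: string; // e.g. "http://snomed.info/sct"
  code: string;
  display?: string;
}

interface OntologyService {
  // e.g. map a SNOMED CT problem code to ICD-10
  translate(coding: Coding, targetSystem: string): Promise<Coding[]>;
  // e.g. "is this problem a disorder of a given body system?"
  subsumes(ancestor: Coding, descendant: Coding): Promise<boolean>;
}

// A consumer library can then classify a problem list by body system without
// knowing anything about the terminology server behind the service.
async function classifyBySystem(
  svc: OntologyService,
  problems: Coding[],
  bodySystems: Coding[]
): Promise<Map<string, Coding[]>> {
  const buckets = new Map<string, Coding[]>();
  for (const problem of problems) {
    for (const sys of bodySystems) {
      if (await svc.subsumes(sys, problem)) {
        if (!buckets.has(sys.code)) buckets.set(sys.code, []);
        buckets.get(sys.code)!.push(problem);
      }
    }
  }
  return buckets;
}
```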
Grahame Grieve (Jun 02 2016 at 11:37):
Erich, you should start by reading the terminology service
Erich Schulz (Jun 02 2016 at 11:37):
maybe @Michael van der Zel ...
Erich Schulz (Jun 02 2016 at 11:38):
is that a FHIR thing @Grahame Grieve?
Erich Schulz (Jun 02 2016 at 11:38):
(sorry, appreciate this is a bit of a noob question)
Grahame Grieve (Jun 02 2016 at 11:38):
yes http://hl7-fhir.github.io/terminology-service.html
Erich Schulz (Jun 02 2016 at 11:39):
bingo! thanks @Grahame Grieve
Erich Schulz (Jun 02 2016 at 11:41):
I'd looked quickly at https://www.hl7.org/fhir/valueset.html and related resources but had missed this page
Erich Schulz (Jun 02 2016 at 11:43):
is there a list of implementations?
Grahame Grieve (Jun 02 2016 at 11:45):
My server, Ontoserver, Apelon, IMO, NLM. We're starting to prepare for certification of the services
Erich Schulz (Jun 02 2016 at 11:49):
wow
Erich Schulz (Jun 02 2016 at 11:53):
OK, so then (thinking in JavaScript, sorry) it should be possible to define an API as a lightweight wrapper around this (is it REST?) service, and then that service could be an injectable dependency for a library that performs "common simple operations" on FHIR data...?
Grahame Grieve (Jun 02 2016 at 11:53):
yep
Erich Schulz (Jun 02 2016 at 11:53):
sweeet!
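A minimal sketch of such a wrapper, assuming a FHIR terminology endpoint at a made-up base URL and showing only ValueSet/$expand; error handling and the other operations are elided:

```typescript
// Minimal sketch of a lightweight wrapper around a FHIR terminology service.
// The base URL and value set URL in the usage note are made up; $expand is a
// standard terminology service operation, but this is not a complete client.
class TerminologyClient {
  constructor(private baseUrl: string) {}

  // Expand a value set (optionally filtered by a search string) and
  // return the codings in the expansion.
  async expand(valueSetUrl: string, filter?: string) {
    const params = new URLSearchParams({ url: valueSetUrl });
    if (filter) params.set("filter", filter);
    const res = await fetch(`${this.baseUrl}/ValueSet/$expand?${params}`, {
      headers: { Accept: "application/fhir+json" },
    });
    if (!res.ok) throw new Error(`$expand failed: ${res.status}`);
    const vs = await res.json();
    return (vs.expansion?.contains ?? []) as { system?: string; code?: string; display?: string }[];
  }
}

// Usage (URLs are illustrative only):
// const tx = new TerminologyClient("https://tx.example.org/fhir");
// const problems = await tx.expand("http://example.org/fhir/ValueSet/problem-list", "asthma");
```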
Peter Jordan (Jun 02 2016 at 21:23):
I look forward to seeing the certification criteria for FHIR-based Terminology Services. Will the Connectathon Tests form the basis of these requirements?
Grahame Grieve (Jun 02 2016 at 21:24):
well, that's the process that will lead towards the certification tests
Erich Schulz (Jun 03 2016 at 10:33):
I've had a read of http://hl7-fhir.github.io/terminology-service.html now
Erich Schulz (Jun 03 2016 at 10:34):
it gets a bit woolly around the "closure" table
Erich Schulz (Jun 03 2016 at 10:34):
(we used to call them "ancestor tables")
Erich Schulz (Jun 03 2016 at 10:36):
I guess I should have a hard poke at Ontoserver - then I may be up for a few patches
Grahame Grieve (Jun 03 2016 at 10:36):
closure we haven't tested yet. Getting there
Erich Schulz (Jun 03 2016 at 10:38):
the idea is for the client to incrementally build its own table holding all possible is-a links between a set of "codes of interest"?
Grahame Grieve (Jun 03 2016 at 10:44):
yes, client knows the table, but not the grounds on which it is built - that's what the terminology server knows. My server supports closure, but it's the only one at this time
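For reference, the $closure exchange described on that page looks roughly like this on the wire (payloads shown as TypeScript object literals; the closure name and SNOMED code are illustrative):

```typescript
// Sketch of the $closure exchange: the client first initialises a named
// closure table, then registers new "codes of interest" as it meets them,
// and the server replies with a ConceptMap containing only the newly
// discovered subsumption links. Name and code values are illustrative.

// 1. Initialise the closure table: POST [base]/$closure
const initClosure = {
  resourceType: "Parameters",
  parameter: [{ name: "name", valueString: "patient-123-problems" }],
};

// 2. Register a code the client has started tracking: POST [base]/$closure
const addConcept = {
  resourceType: "Parameters",
  parameter: [
    { name: "name", valueString: "patient-123-problems" },
    {
      name: "concept",
      valueCoding: { system: "http://snomed.info/sct", code: "22298006" }, // e.g. myocardial infarction
    },
  ],
};

// The response to each call is a ConceptMap whose element/target entries are
// the new is-a links for the client to add to its local closure table.
```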
Erich Schulz (Jun 03 2016 at 10:45):
ah
Erich Schulz (Jun 03 2016 at 10:45):
so this depends on the server keeping track of which links it has let the client know about?
Grahame Grieve (Jun 03 2016 at 10:46):
y
Erich Schulz (Jun 03 2016 at 10:46):
that could get expensive with a lot of clients...
Grahame Grieve (Jun 03 2016 at 10:47):
well, up to the server to decide how to manage that. I'll let users fill up my database, and then I'll just wipe it ;-)
Erich Schulz (Jun 03 2016 at 10:47):
heh
Grahame Grieve (Jun 03 2016 at 10:48):
of course if a user runs my server locally, they can have whatever policy they want
Erich Schulz (Jun 03 2016 at 10:50):
btw are we sure "closure" is the correct term?
Erich Schulz (Jun 03 2016 at 10:51):
https://en.wikipedia.org/wiki/Closure_(computer_programming)
Grahame Grieve (Jun 03 2016 at 10:51):
yes. see http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html
Erich Schulz (Jun 03 2016 at 10:54):
yeah I read that link
Erich Schulz (Jun 03 2016 at 10:54):
actually this seems to be the derivation
Erich Schulz (Jun 03 2016 at 10:54):
https://en.wikipedia.org/wiki/Transitive_closure
Grahame Grieve (Jun 03 2016 at 10:54):
y
Peter Jordan (Jun 03 2016 at 10:58):
At the Montreal Connectathon, Caroline Macumber from Apelon suggested that they may have implemented the full closure operation, but was going to check with the relevant developer. I started on it a while back, but wasn't comfortable that I understood the complete use case - notably around notifying and updating clients when the server rebuilds a transitive closure table after a new version of the code system is implemented. A UML sequence diagram might be useful; I re-read the spec and Grahame's blog entry several times, but something seemed to be missing.
Erich Schulz (Jun 03 2016 at 10:59):
so it seems that "closure" actually just means the property of generating a finite set (in mathematics)
Erich Schulz (Jun 03 2016 at 11:00):
the key element is that this is based on the transitive is-a links...
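A toy sketch of that derivation: starting from the direct is-a (parent) links, the transitive closure is every ancestor/descendant pair, which is exactly what the closure table holds:

```typescript
// Toy derivation of a transitive closure table from direct is-a edges.
// Each resulting (code, ancestor) pair is one row of the closure table.
type IsALink = { child: string; parent: string };

function transitiveClosure(links: IsALink[]): Map<string, Set<string>> {
  const parents = new Map<string, Set<string>>();
  for (const { child, parent } of links) {
    if (!parents.has(child)) parents.set(child, new Set());
    parents.get(child)!.add(parent);
  }
  const ancestors = new Map<string, Set<string>>();
  const visit = (code: string): Set<string> => {
    const cached = ancestors.get(code);
    if (cached) return cached;
    const result = new Set<string>();
    ancestors.set(code, result); // memoise (assumes the hierarchy is a DAG)
    for (const p of parents.get(code) ?? []) {
      result.add(p);
      for (const a of visit(p)) result.add(a);
    }
    return result;
  };
  for (const code of parents.keys()) visit(code);
  return ancestors;
}

// Subsumption then becomes a single lookup:
//   transitiveClosure(links).get(descendant)?.has(ancestor)
```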
Erich Schulz (Jun 03 2016 at 11:02):
to be honest I'm thinking that supporting incremental creation of these "is-a closure tables" (?? IsACT ??) via client-server operation is "in the 80%"
Erich Schulz (Jun 03 2016 at 11:03):
but I can certainly see the utility in the base operation of "give me the subset of is-a links where both members are in a given set"
Grahame Grieve (Jun 03 2016 at 11:03):
? anyone who wants to use a terminology server to handle all their terminology logic will be led kicking and screaming to maintaining a closure table
Erich Schulz (Jun 03 2016 at 11:04):
take hierarchy table => explode
Grahame Grieve (Jun 03 2016 at 11:04):
sorry, anyone who maintains a relational database who....
Erich Schulz (Jun 03 2016 at 11:06):
it's a simple enough operation...
Peter Jordan (Jun 03 2016 at 11:09):
I persist SNOMED CT in SQL Server and have a transitive closure table; but, to date, profiling and execution plans show no discernible performance difference between the table-valued function that I used for subsumption queries using the core tables, and the one that uses the closure table. However, I suspect that might change when the server is under a heavy load.
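The trade-off Peter describes, in miniature (TypeScript rather than T-SQL, and only a sketch): walking the raw relationship rows on every query versus a single lookup in a pre-computed closure table:

```typescript
// The two subsumption strategies in miniature. `parentsOf` stands in for the
// raw relationship table (one entry per direct is-a link); `closure` stands
// in for a pre-computed transitive closure table keyed by descendant.

// "Core tables" style: walk the is-a links on every call.
function subsumesByWalking(
  parentsOf: Map<string, string[]>,
  ancestor: string,
  code: string
): boolean {
  const stack = [...(parentsOf.get(code) ?? [])];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const next = stack.pop()!;
    if (next === ancestor) return true;
    if (seen.has(next)) continue;
    seen.add(next);
    stack.push(...(parentsOf.get(next) ?? []));
  }
  return false;
}

// Closure table style: one membership test per call.
function subsumesByClosure(
  closure: Map<string, Set<string>>,
  ancestor: string,
  code: string
): boolean {
  return closure.get(code)?.has(ancestor) ?? false;
}
```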
Grahame Grieve (Jun 03 2016 at 11:12):
the key difference is that the closure table built this way is capable of dealing with post-coordination etc
Rob Hausam (Jun 03 2016 at 11:12):
@Peter Jordan can you describe your "table-valued function that I used for subsumption queries using the core tables" vs. the "closure table"?
I'm not quite sure what the former one is - it sort of sounds like a "closure table" under the hood (or bonnet) :)
Peter Jordan (Jun 03 2016 at 11:13):
That makes sense, but I don't support post-coordination...yet.
Erich Schulz (Jun 03 2016 at 11:13):
@Grahame Grieve - ok that isn't as simple :-)
Grahame Grieve (Jun 03 2016 at 11:14):
it is for the client, that's the key - it's just a code. The client isn't interested in the internal details: code, closure table, whatever...
Erich Schulz (Jun 03 2016 at 11:14):
how common is post-coordination out there in the real world?
Grahame Grieve (Jun 03 2016 at 11:15):
uncommon. mostly because of the closure table problem.
Grahame Grieve (Jun 03 2016 at 11:15):
because everyone pre-generates their closure tables, if they have them, and then they can't deal with post-coordination
Rob Hausam (Jun 03 2016 at 11:16):
we especially haven't explored the post-coordination aspects yet, as far as I know
and I was going to say, I think that presumes that the post-coordinated expression will be assigned an identifier (on the fly)? - which is the SNOMED CT idea of the "expression library"
Grahame Grieve (Jun 03 2016 at 11:16):
not necessary for the client. Or the API. The server might decide to do that for itself, but that's its business
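To make that concrete: a post-coordinated SNOMED CT expression travels through the same concept parameter as any pre-coordinated code - to the client it is "just a code" (the SCTIDs in the expression below are illustrative only):

```typescript
// A post-coordinated expression is "just a code" to the client and the API:
// it goes into the same $closure concept parameter shown earlier.
// The SCTIDs in this compositional-grammar expression are illustrative only.
const postCoordinatedConcept = {
  name: "concept",
  valueCoding: {
    system: "http://snomed.info/sct",
    code: "64572001:363698007=80891009", // roughly: disease, with finding site = heart structure
  },
};
```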
Erich Schulz (Jun 03 2016 at 11:17):
does Ontoserver do a basic closure (I still think a better name is needed) operation?
Grahame Grieve (Jun 03 2016 at 11:18):
don't know whether Michael and the team have done that
Erich Schulz (Jun 03 2016 at 11:19):
I'm getting curious about how long it would take my PC (2 years old with 16 GB RAM) to generate an ancestor table on SNOMED...
Erich Schulz (Jun 03 2016 at 11:19):
and then to get Postgres to load it...
Grahame Grieve (Jun 03 2016 at 11:20):
depends. Generating the closure table for SCT-US purely in RAM takes me 45 seconds
Peter Jordan (Jun 03 2016 at 11:20):
@Rob Hausam, I'm not sure if you're familiar with table-valued functions in SQL Server - they're a special type of stored procedure that returns a table. The relevant logic uses self-joins on the relationship table; I'd be quite happy to send it to you. Because it uses a larger table than the 2-column transitive closure one, I suspect that it (and/or the relevant indexes) might take up more memory, and so might be more likely to slow down when SQL Server is nearing its memory peak.
Erich Schulz (Jun 03 2016 at 11:23):
gad I'm not getting any work done...
Rob Hausam (Jun 03 2016 at 11:23):
yes, I am somewhat familiar with them in general, Peter - but I've tended to do most of my db work in Oracle
I would like to have a look at it
Erich Schulz (Jun 03 2016 at 11:24):
this is too interesting
Erich Schulz (Jun 03 2016 at 11:24):
we used to use Oracle in the '90s because it had CONNECT BY
Erich Schulz (Jun 03 2016 at 11:25):
which was way too slow compared with a "transitive closure" table
Peter Jordan (Jun 03 2016 at 11:26):
Using the Perl script supplied by IHTSDO, it takes less than a minute to generate the closure table for the International Edition of SCT - on my 64-bit server with lots of RAM. It takes at least 5 minutes on my 32-bit test machine.
Rob Hausam (Jun 03 2016 at 11:26):
yes, I think "connect by" is a great feature - I've made extensive use of it for this sort of thing
like building a "closure table" (although I didn't call it that at first) on the fly ("just in time")
Erich Schulz (Jun 03 2016 at 11:27):
but looking at this... http://hl7-fhir.github.io/terminology-service.html @Grahame Grieve I am thinking the "incremental build" component of the "closure" operation just seems, erm, non-core?
Erich Schulz (Jun 03 2016 at 11:28):
I certainly see the generation of the initial table as a core feature though...
Rob Hausam (Jun 03 2016 at 11:30):
@Erich Schulz yes, the pre-computed full closure table is significantly faster than doing a hierarchical query each time - that's why we adopted the caching approach which builds the closure table as needed, rather than pre-computing all of it, most of which will never be used
just a different approach
Erich Schulz (Jun 03 2016 at 11:30):
yes I can see the rationale for serving a subset
Erich Schulz (Jun 03 2016 at 11:33):
it was the incremental building of subsets I was questioning...
Erich Schulz (Jun 03 2016 at 11:35):
(at least as part of core API )
Erich Schulz (Jun 03 2016 at 11:36):
mainly because it imposes a burden on the server to track its previous messages to clients in a pattern that may not scale awfully well and introduces a bunch of complexities around "time to live"
Erich Schulz (Jun 03 2016 at 11:53):
I'm having a look at ontoserver @Grahame Grieve - it looks way more cut-down than http://hl7-fhir.github.io/terminology-service.html
Peter Jordan (Jun 03 2016 at 11:53):
The transitive closure table generated from the 20160131 snapshot version of the SCT International Edition has 5,470,090 rows!
Erich Schulz (Jun 03 2016 at 11:54):
so if it was stored as pairs of 64-bit integers...
Erich Schulz (Jun 03 2016 at 11:54):
that is ~25M
Grahame Grieve (Jun 03 2016 at 12:00):
I don't think Ontoserver is much cut down. It does everything that we've worked through so far
Peter Jordan (Jun 03 2016 at 12:01):
Storage is cheap, it's all about memory. The table has 2 columns of SCTIDs - IHTSDO recommend that these be persisted as 64-bit integers, but as they aren't true numbers I store them as chars, and I know others who do likewise.
Grahame Grieve (Jun 03 2016 at 12:01):
I think that the incremental closure table is worth having. It is more work for the server, but we've always said that's a good deal for the client
Grahame Grieve (Jun 03 2016 at 12:02):
I store the closure table as an array of pairs of 4-byte unsigned ints, where each 4-byte value is a lookup into an array of string values that represent the codes
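A guess at the shape of that structure (not the actual implementation): intern each code string once, then keep the closure itself as a flat array of 4-byte unsigned integer index pairs:

```typescript
// Sketch of the packed storage scheme described above (a guess at the shape,
// not the actual implementation): codes are interned into a string array,
// and the closure is a flat Uint32Array of (descendant index, ancestor index) pairs.
class PackedClosureTable {
  private codes: string[] = [];               // index -> code string
  private index = new Map<string, number>();  // code string -> index
  private pairs = new Uint32Array(16);        // [desc0, anc0, desc1, anc1, ...]
  private count = 0;                          // number of pairs stored

  private intern(code: string): number {
    let i = this.index.get(code);
    if (i === undefined) {
      i = this.codes.push(code) - 1;
      this.index.set(code, i);
    }
    return i;
  }

  add(descendant: string, ancestor: string): void {
    if (this.count * 2 === this.pairs.length) {
      const bigger = new Uint32Array(this.pairs.length * 2); // grow the buffer
      bigger.set(this.pairs);
      this.pairs = bigger;
    }
    this.pairs[this.count * 2] = this.intern(descendant);
    this.pairs[this.count * 2 + 1] = this.intern(ancestor);
    this.count++;
  }
}
```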
Erich Schulz (Jun 03 2016 at 12:02):
are many systems doing this currently?
Grahame Grieve (Jun 03 2016 at 12:02):
but I don't use a database for this.
Grahame Grieve (Jun 03 2016 at 12:03):
not many, but all the terminology services have had to do something about this problem in order to support integrated search across terminologies and other things
Peter Jordan (Jun 03 2016 at 12:03):
...but what's the trigger for clients to request updates when the server refreshes the closure table?
Erich Schulz (Jun 03 2016 at 12:04):
given it is a 45-second operation to rebuild the entire thing, and releases come out once a month (max), I'm struggling to see a high ROI...
Grahame Grieve (Jun 03 2016 at 12:05):
because you can't build it in advance
Erich Schulz (Jun 03 2016 at 12:05):
(just to emphasise, I am talking about incremental builds... not subsetting... subsetting is gold)
Grahame Grieve (Jun 03 2016 at 12:05):
unless you prohibit post-coordination. which everyone does, but it cripples the tx
Erich Schulz (Jun 03 2016 at 12:06):
do you have a link for the Ontoserver subsumption test?
Grahame Grieve (Jun 03 2016 at 12:07):
the $validate option... what link do you want?
Peter Jordan (Jun 03 2016 at 12:08):
Things will be trickier if/when IHTSDO move to more frequent SCT releases.
Erich Schulz (Jun 03 2016 at 12:08):
mmm
Grahame Grieve (Jun 03 2016 at 12:09):
it's a solvable problem. I've got to write a client to help the ontoserver guys test their closure table
Erich Schulz (Jun 03 2016 at 12:09):
I guess I'm just flagging an 80:20 situation...
Erich Schulz (Jun 03 2016 at 12:09):
perhaps an opportunity to have an initial simple API definition, then a second wave?
Grahame Grieve (Jun 03 2016 at 12:10):
then you don't follow what the 80:20 is about.
Erich Schulz (Jun 03 2016 at 12:13):
?
Erich Schulz (Jun 03 2016 at 12:13):
http://www.clipular.com/c/4745096286699520.png?k=oAARjjxSNY7fDU3cgP6HeFBTC2E
Grahame Grieve (Jun 03 2016 at 12:14):
all the terminology servers implement some kind of feature for managing a closure table
Erich Schulz (Jun 03 2016 at 12:14):
sure
Erich Schulz (Jun 03 2016 at 12:14):
I can only repeat...
Erich Schulz (Jun 03 2016 at 12:15):
transitive closure table = gold
Erich Schulz (Jun 03 2016 at 12:16):
incremental client-server building of the TCT is on a different plane
Erich Schulz (Jun 03 2016 at 12:16):
not saying its bad...
Grahame Grieve (Jun 03 2016 at 12:16):
then you don't need to consume it. nor worry about it.
Erich Schulz (Jun 03 2016 at 12:17):
just saying that today it's only implemented by a single server...
Erich Schulz (Jun 03 2016 at 12:18):
sorry not trying to be difficult
Erich Schulz (Jun 03 2016 at 12:19):
the concern is I go looking for the resource on the ontoserver and it isn't there
Erich Schulz (Jun 03 2016 at 12:19):
even the base resource as far as I can see
Grahame Grieve (Jun 03 2016 at 12:19):
which base resource?
Erich Schulz (Jun 03 2016 at 12:19):
for a transitive closure table
Grahame Grieve (Jun 03 2016 at 12:20):
you mean ConceptMap?
Erich Schulz (Jun 03 2016 at 12:21):
they have the transitive closure table expressed with ConceptMap??
Grahame Grieve (Jun 03 2016 at 12:21):
ConceptMap is part of it, but I wasn't sure what you meant. But they haven't implemented the closure API yet
Erich Schulz (Jun 03 2016 at 12:21):
k
Erich Schulz (Jun 03 2016 at 12:22):
so what I'm thinking is if the base spec is simple then that implementation can occur more rapidly
Erich Schulz (Jun 03 2016 at 12:22):
because I can write a script to make my own
Grahame Grieve (Jun 03 2016 at 12:22):
well, simple problems are simple. yes. And a client can just do subsumption testing directly, no problems
Peter Jordan (Jun 03 2016 at 12:26):
...and store the results each time it requests a new test...and (possibly) be aware (somehow) when the server loads a new version of the code system that might make its cached results redundant. Which begs the question, why does the server need to maintain a record of client closures?
Erich Schulz (Jun 03 2016 at 12:29):
I'm thinking it could be useful to have a table of the current servers and the services they provide...
Grahame Grieve (Jun 03 2016 at 12:29):
if the server doesn't know what subset the client is dealing with, it must return the closure for everything, but everything is not finite
Erich Schulz (Jun 03 2016 at 12:30):
yes that is true
Erich Schulz (Jun 03 2016 at 12:30):
so somehow the client needs to identify the subset it is interested in
Erich Schulz (Jun 03 2016 at 12:30):
I agree that is core functionality
Grahame Grieve (Jun 03 2016 at 12:31):
well, it can build it gradually, or it can accelerate the process
Erich Schulz (Jun 03 2016 at 12:31):
what I'm suggesting may not be core is incremental expansion of the TCT...
Erich Schulz (Jun 03 2016 at 12:31):
the key words being "incremental expansion"
Peter Jordan (Jun 03 2016 at 12:40):
I'm still missing something here. Although I fully understand why a client needs to persist the results of individual subsumption queries, I can't see the need for a server to (effectively) maintain a record of those results - what value does that add, e.g. does it make it any easier for the client to be made aware when those results may be superseded by a code system update? What's the value proposition for this additional complexity?
Erich Schulz (Jun 03 2016 at 12:41):
it would save some bandwidth...
Rob Hausam (Jun 03 2016 at 12:49):
the key point is how the client becomes aware that it needs to rebuild its stored transitive closure subset, because the code system has been updated
Erich Schulz (Jun 03 2016 at 12:49):
an expiry date? or a "last-updated" service?
Erich Schulz (Jun 03 2016 at 12:59):
there are also tools like rsync and git that eat incremental updating for breakfast
Erich Schulz (Jun 03 2016 at 13:00):
building this functionality into the core services seems to violate the "do one thing well" principle
Peter Jordan (Jun 03 2016 at 21:20):
@Rob Hausam that key point applies to all use cases where a client persists the results of SNOMED CT queries. @Erich Schulz - SNOMED CT updates can either add or inactivate concepts and relationships - versioning is based on the active/inactive status at the specified release date (effective time). Concepts can be activated, deactivated and reactivated - but never deleted.
From a client perspective, I'd persist the relevant CodeSystem version and periodically check it against the one returned by the terminology server. When there's a new version, I'd refresh all the persisted query results - subsumptions or otherwise. Therefore, from an EHR/EMR service perspective, I still don't see a use case for a separate, and distinct, process for closure subsets. However, it would be informative to understand the requirements of other categories of terminology service client.
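A sketch of that refresh policy, assuming a server that exposes the code system's version via a CodeSystem search (resource and search parameter names may differ between FHIR versions):

```typescript
// Sketch of the refresh policy described above: remember the code system
// version the cached results were built against, poll the terminology server,
// and invalidate everything when the version changes. The URL shape assumes a
// server exposing CodeSystem resources; details vary by FHIR version.
async function refreshIfNewVersion(
  baseUrl: string,
  codeSystemUrl: string,
  cachedVersion: string | undefined,
  invalidateCache: () => void
): Promise<string | undefined> {
  const res = await fetch(
    `${baseUrl}/CodeSystem?url=${encodeURIComponent(codeSystemUrl)}&_summary=true`,
    { headers: { Accept: "application/fhir+json" } }
  );
  if (!res.ok) return cachedVersion; // leave the cache alone if the check fails
  const bundle = await res.json();
  const current: string | undefined = bundle.entry?.[0]?.resource?.version;
  if (current && current !== cachedVersion) {
    invalidateCache(); // subsumption results, closure subsets, etc.
    return current;
  }
  return cachedVersion;
}
```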
Rob Hausam (Jun 03 2016 at 22:03):
I agree, Peter. I think your suggestion is reasonable, and similar to what I've implemented before (although with an "internal" terminology server, rather than an external API). I think you may be right about the need, and the means of achieving it. Yes, we need to consider all (known) client perspectives. I've been intending to spend some time looking at $closure in greater depth, and this discussion is good impetus for doing that.