FHIR Chat · LOINC · terminology

Stream: terminology

Topic: LOINC


view this post on Zulip Gabriel Kleinoscheg (Mar 16 2022 at 09:28):

Is it possible to extract the whole of LOINC from tx.fhir.org via REST as a FHIR CodeSystem resource?

view this post on Zulip Lloyd McKenzie (Mar 16 2022 at 13:48):

@Grahame Grieve @Rob Hausam

view this post on Zulip Rob Hausam (Mar 16 2022 at 13:55):

@Gabriel Kleinoscheg LOINC isn't managed as a CodeSystem resource instance on tx.fhir.org. Theoretically that could be done, but I doubt that anyone is doing it as it's going to be rather impractically large for handling as a resource instance. We don't generally expect the CodeSystem resource to be used for managing the contents for the larger code systems (e.g., SNOMED CT, LOINC, etc.).

view this post on Zulip Peter Jordan (Mar 16 2022 at 21:16):

Terminz serves CodeSystem resources for LOINC, SNOMED CT, etc. but with the content set to 'not-present' and, internally, these are created from classes which also contain methods to respond to operation requests. I would expect other TS providers to do something similar, particularly those who store the content in databases, rather than in-memory like tx.fhir.org. If a terminology server doesn't provide CodeSystem resource instances, then I'm guessing it has to provide information such as version, properties and filters in a TerminologyCapabilities resource?
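
A minimal sketch of the kind of "shell" CodeSystem described above, assuming a plain dict stands in for the resource. The helper name `make_shell_codesystem` and the version string are illustrative placeholders, not taken from Terminz or any other server.

```python
import json


def make_shell_codesystem(url: str, version: str, name: str) -> dict:
    """Build a CodeSystem that describes a code system without carrying its concepts."""
    return {
        "resourceType": "CodeSystem",
        "url": url,
        "version": version,  # placeholder value, supplied by the caller
        "name": name,
        "status": "active",
        # "not-present" signals that the concepts are not in this resource;
        # clients are expected to use operations such as $lookup instead.
        "content": "not-present",
    }


shell = make_shell_codesystem("http://loinc.org", "2.72", "LOINC")
print(json.dumps(shell, indent=2))
```

The resource stays tiny regardless of how many concepts the code system has, which is the point of the 'not-present' content mode.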

view this post on Zulip Josh Mandel (Mar 16 2022 at 23:39):

FWIW this is the kind of thing @nicola (RIO/SS) manages with a concept table and I think being able to distribute the details in a standard fashion would be great.

view this post on Zulip Josh Mandel (Mar 16 2022 at 23:42):

Packing an entire large terminology into a CodeSystem is likely to present challenges (owing to size, perhaps memory constraints on parsing). But a Concept resource could help, or even just a long list of "CodeSystem Partial" resources (each conveying somewhere between one concept and 10MB of concepts belonging to a single CodeSystem) as lines in an ndjson file would be pretty easy to parse.

view this post on Zulip Michael Lawley (Mar 17 2022 at 01:03):

We support some pretty big CodeSystem resources (~0.5G JSON) but if represented as ndjson would likely be much larger due to the need to repeatedly state things like the system URI for every Concept, along with the basic Resource boilerplate.
I think an extension that allowed for CodeSystem fragments to be stitched together might work as a kind of "CodeSystem Partial".
Also, the Concept resource would need to address code system versions

view this post on Zulip Josh Mandel (Mar 17 2022 at 01:43):

I don't think you'd want to repeat stuff in the context of this (half baked) "CodeSystem Partial" serialization scheme (though if you did, the repeating stretches compress super well, so an .ndjson.gz file would do).

view this post on Zulip Michael Lawley (Mar 17 2022 at 01:59):

Sorry, re-reading my message I realise I left out key context. I meant if representing as a giant ndjson of hypothetical Concept resources, then it would be much larger than just a raw CodeSystem resource.

view this post on Zulip Josh Mandel (Mar 17 2022 at 02:09):

I'd focus on compressed size FWIW

view this post on Zulip Josh Mandel (Mar 17 2022 at 02:10):

size of decompressed file may be less important than the total memory requirements when reading it (i.e. how easy is it to stream)

view this post on Zulip Michael Lawley (Mar 17 2022 at 02:36):

Yep! 380k concept CodeSystem is 380MB of unformatted JSON but only 13MB when gzipped. Not that that helps when parsed and the validator kicks in :)
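
The compression point can be checked with a small experiment. This is a sketch under stated assumptions: the per-concept "Concept" record below is hypothetical (no such FHIR resource exists), the system URI is made up, and synthetic data like this will not reproduce the exact ratio quoted above.

```python
import gzip
import json


def concept_resource_line(code: str, display: str) -> str:
    """One hypothetical per-concept record; the system URI repeats on every line."""
    return json.dumps({
        "resourceType": "Concept",  # hypothetical resource, not part of FHIR
        "system": "http://example.org/big-code-system",  # made-up URI
        "version": "1.0",
        "code": code,
        "display": display,
    })


lines = [concept_resource_line(f"C{i:06d}", f"Concept number {i}") for i in range(10_000)]
raw = ("\n".join(lines) + "\n").encode("utf-8")
packed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")
```

The repeated boilerplate dominates the raw size but nearly vanishes after gzip, which is why compressed size and parse-time memory are the more useful measures.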

view this post on Zulip Grahame Grieve (Mar 30 2022 at 01:43):

you could distribute them as a set of fragments

view this post on Zulip Grahame Grieve (Mar 30 2022 at 01:44):

vocab voted the Concept resource down. I rate that as one of our mistakes, but it was a big meeting so it'll be pretty hard to overturn

view this post on Zulip Grahame Grieve (Mar 30 2022 at 01:45):

but @Josh Mandel this whole area... be careful of the hammer / nail thing. Sure, it will help some people if you have code system resources for a few big code systems, but not most people since they have their own more efficient distribution based on their exact internal process/model

view this post on Zulip Josh Mandel (Mar 30 2022 at 01:59):

That is a very fair and wise cautionary note! To be clear I have no interest in changing or challenging the efficient distribution processes that already exist; I'd just love to have some well defined efficient distribution processes for greenfield/newcomers/my-own-self.

view this post on Zulip Grahame Grieve (Mar 30 2022 at 02:00):

well, my strong advice for such newcomers is: use a terminology server and figure out how to incorporate the services it provides into your framework. They do all sorts of genuinely hard things, and the API exists to serve you
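
As an illustration of leaning on the API, here is a hedged sketch that builds a request URL for the standard `CodeSystem/$lookup` operation. The base URL `https://tx.example.org/r4` is a placeholder, and `lookup_url` is a hypothetical helper; no request is actually sent.

```python
from urllib.parse import urlencode


def lookup_url(base: str, system: str, code: str) -> str:
    """Return the GET URL for a $lookup of one code on a FHIR terminology server."""
    query = urlencode({"system": system, "code": code})
    return f"{base.rstrip('/')}/CodeSystem/$lookup?{query}"


# Placeholder server base; substitute the terminology server you actually use.
url = lookup_url("https://tx.example.org/r4", "http://loinc.org", "8867-4")
print(url)
```

The server answers with a Parameters resource carrying the display, designations, and properties, so the client never needs the full code system content.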

view this post on Zulip Josh Mandel (Mar 30 2022 at 02:08):

I think using one could mean building one (for many deployment scenarios), so commoditizing these inputs makes good sense.

view this post on Zulip Michael Lawley (Mar 30 2022 at 07:34):

Why build when you can buy :-)

view this post on Zulip Craig McClendon (Mar 30 2022 at 17:38):

I can stand up any number of FHIR servers that have varying terminology capabilities. Actually finding, downloading, manipulating, and loading a bunch of different code systems' data into any one is difficult work. Not to mention maintaining them afterwards.

Having a standard, generic format for distribution/import/export of codesystems seems like it would be a good thing for all involved.

view this post on Zulip Josh Mandel (Mar 30 2022 at 18:38):

Yes, Craig said what I was trying to -- and much more eloquently.

view this post on Zulip Grahame Grieve (Mar 30 2022 at 20:29):

Craig McClendon said:

Having a standard, generic format for distribution/import/export of codesystems seems like it would be a good thing for all involved.

well, we do have one, and we use it as much as we can. There's just a few very big code systems that are widely supported where we don't.

view this post on Zulip Craig McClendon (Mar 30 2022 at 21:11):

What about a hybrid ndjson structure where the first row is a CodeSystem (sans codes) and subsequent rows are CodeSystem.concept BackboneElement JSONs? You would need to use CodeSystem.concept.property to denote hierarchical relationships and not CodeSystem.concept.concept for this to work.
I could see adding custom operations to a FHIR server that could export/import such structures.
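
A rough sketch of the hybrid layout proposed above, assuming the resource is held as a plain dict. The function names and the use of a property called `parent` are illustrative choices; a real export would also need to declare that property in `CodeSystem.property` of the header row.

```python
import io
import json
from typing import Iterator, Optional


def flatten(concepts: list, parent: Optional[str] = None) -> Iterator[dict]:
    """Yield each concept without nested children, recording its parent as a property."""
    for concept in concepts:
        flat = {k: v for k, v in concept.items() if k != "concept"}
        if parent is not None:
            # Copy any existing properties so the input structure is not mutated.
            flat["property"] = list(concept.get("property", [])) + [
                {"code": "parent", "valueCode": parent}
            ]
        yield flat
        yield from flatten(concept.get("concept", []), concept["code"])


def write_hybrid_ndjson(code_system: dict, out) -> None:
    """First row: the CodeSystem without concepts. Following rows: one concept each."""
    header = {k: v for k, v in code_system.items() if k != "concept"}
    out.write(json.dumps(header) + "\n")
    for flat in flatten(code_system.get("concept", [])):
        out.write(json.dumps(flat) + "\n")


example = {
    "resourceType": "CodeSystem",
    "url": "http://example.org/cs",  # made-up URI
    "content": "complete",
    "concept": [
        {"code": "A", "display": "Parent", "concept": [{"code": "A1", "display": "Child"}]}
    ],
}
buffer = io.StringIO()
write_hybrid_ndjson(example, buffer)
rows = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(rows)
```

Moving the hierarchy into properties is what lets every row stand alone; nested `concept.concept` elements would otherwise force a reader to hold whole subtrees in memory.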

view this post on Zulip Grahame Grieve (Mar 30 2022 at 21:19):

but why would that be better? Sounds way bigger and less efficient to me

view this post on Zulip Michael Lawley (Mar 30 2022 at 21:21):

So, we're looking for a standard rendering, JSON-based with a large set of concept objects and some metadata for the code system itself, and then we're going to transform all the random other formats into it and load into a terminology server - that looks a hell of a lot like a FHIR CodeSystem to me. Furthermore, it is exactly what we do for Ontoserver, in production, and it scales up to really large things like dm+d.

Wrt an ndjson form, I don't really understand the advantage of stripping off two []s and moving a } which is pretty much what it would amount to.

view this post on Zulip Craig McClendon (Mar 30 2022 at 21:25):

The advantage I was thinking of is that ndjson is more amenable to streaming rather than transmitting the whole thing in one go.

view this post on Zulip Brian Postlethwaite (Mar 30 2022 at 21:25):

Isn't this issue just a matter of using streaming serializers on the JSON (or XML) rather than document-based processing?
(to resolve the issue of the space in memory required)
the Firely SDK has streaming options on the XML (haven't checked the json, but know that it's possible)
they just don't validate while they're going - which I don't think any of the suggestions here is proposing either.

view this post on Zulip Josh Mandel (Mar 30 2022 at 22:37):

As I describe at https://github.com/jmandel/fhir-concept-publication-demo#fhir-concept-publication-demo, I think supporting a standard FHIR CodeSystem is a great idea. There is also some value in having a simple "shell" and a pointer to a pile of concepts, so the "shell" can be posted to a server in a lightweight way, and the concepts fetched in the background. In any case these details are a bit beside the point (the point is to have the content ready and available for all the code systems people care about).

(Re: streaming processing, see https://github.com/jmandel/fhir-concept-publication-demo#fhir-concept-publication-demo: it's possible to do with arbitrary JSON but trivial with newlines delimiting objects.)
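
To make the "trivial with newlines" remark concrete, here is a small sketch that reads a gzipped ndjson file one object at a time. The file layout (a CodeSystem header on the first line, one concept per line after it) and the names `stream_ndjson` and `count_concepts` are assumptions for illustration, not the format used in the linked demo.

```python
import gzip
import json
import os
import tempfile
from typing import Iterator, Tuple


def stream_ndjson(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line, never holding the whole file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def count_concepts(path: str) -> Tuple[dict, int]:
    """Read the header row, then count the remaining rows without storing them."""
    rows = stream_ndjson(path)
    header = next(rows)
    return header, sum(1 for _ in rows)


# Build a small demo file so the sketch runs end to end.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.ndjson.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        handle.write(json.dumps({"resourceType": "CodeSystem", "url": "http://example.org/cs"}) + "\n")
        for i in range(1000):
            handle.write(json.dumps({"code": f"C{i}"}) + "\n")
    header, total = count_concepts(demo_path)

print(header["url"], total)
```

Memory use is bounded by the longest single line, which is the property that matters when the decompressed file runs to hundreds of megabytes.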

view this post on Zulip Grahame Grieve (Mar 30 2022 at 22:40):

I'm with Michael in failing to understand why moving some [] makes such a big difference, and what happens to nesting?

view this post on Zulip Grahame Grieve (Mar 30 2022 at 22:41):

Josh Mandel said:

There is also some value in having a simple "shell" and a pointer to a pile of concepts, so the "shell" can be posted to a server in a lightweight way, and the concepts fetched in the background.

sounds like you're describing fragments

view this post on Zulip Josh Mandel (Mar 30 2022 at 22:44):

Again, let's leave the ndjson question aside. As I said, there should be a standard resource in any case.

view this post on Zulip Josh Mandel (Mar 30 2022 at 22:45):

I don't know what "fragments" means; can you clarify?

view this post on Zulip Josh Mandel (Mar 30 2022 at 22:45):

(a Google search turns up https://build.fhir.org/ig/HL7/UTG/external_terminologies_csf.html but that didn't help me.)

view this post on Zulip Josh Mandel (Mar 30 2022 at 22:47):

(to me, "fragments" sounds less than complete; I'm hoping for something complete.)

view this post on Zulip Grahame Grieve (Mar 30 2022 at 22:53):

http://build.fhir.org/codesystem-codesystem-content-mode.html#codesystem-content-mode-fragment

Josh Mandel said:

"fragments" sounds less than complete; I'm hoping for something complete.

and here was me thinking you were asking for something incomplete because complete is too big. I just don't see a general case here. Seems like you're just 'fixing' RxNorm.
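
For readers unfamiliar with the `fragment` content mode linked above, a hedged sketch of splitting one large CodeSystem into several fragment resources and recombining them. The helper names are hypothetical, and how a server should stitch fragments together is not settled in this thread; the merge step simply assumes every fragment is present.

```python
from typing import Iterator, List


def split_into_fragments(code_system: dict, chunk_size: int) -> Iterator[dict]:
    """Yield CodeSystem resources that share metadata but carry a slice of the concepts."""
    concepts = code_system.get("concept", [])
    shell = {k: v for k, v in code_system.items() if k != "concept"}
    for start in range(0, len(concepts), chunk_size):
        fragment = dict(shell)
        fragment["content"] = "fragment"  # each piece is knowingly incomplete
        fragment["concept"] = concepts[start:start + chunk_size]
        yield fragment


def merge_fragments(fragments: List[dict]) -> dict:
    """Recombine fragments; assumes the caller supplies all of them, in order."""
    merged = {k: v for k, v in fragments[0].items() if k != "concept"}
    merged["content"] = "complete"
    merged["concept"] = [c for f in fragments for c in f.get("concept", [])]
    return merged


full = {
    "resourceType": "CodeSystem",
    "url": "http://example.org/cs",  # made-up URI
    "version": "1.0",
    "content": "complete",
    "concept": [{"code": f"C{i}"} for i in range(5)],
}
fragments = list(split_into_fragments(full, chunk_size=2))
merged = merge_fragments(fragments)
print(len(fragments), merged["content"])
```

Each fragment is a valid resource on its own, which is what distinguishes this approach from a custom line-oriented format.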

view this post on Zulip Josh Mandel (Mar 30 2022 at 23:11):

I'm not following

view this post on Zulip Josh Mandel (Mar 30 2022 at 23:12):

I'm trying to avoid the repeated work of everyone having to wrangle source formatted data into FHIR, by having a good, consistent pile of ready-made FHIR content that expresses concept designations, properties, and relationships.

view this post on Zulip Josh Mandel (Mar 30 2022 at 23:13):

This is not about providing a "curated subset" of concepts.

view this post on Zulip Grahame Grieve (Mar 30 2022 at 23:40):

the general case is fhir core, hl7.terminology, and fhir.tx.support - anything open should already be in one of those, and contributions are welcome.

What's left appears to be the 3 you talked about, which are too big when represented as a code system, or UCUM and several IETF terminologies, which are too complex to represent as code systems in a useful way

view this post on Zulip Grahame Grieve (Mar 30 2022 at 23:40):

and proprietary terminologies which are someone else's problem.

view this post on Zulip Grahame Grieve (Mar 30 2022 at 23:42):

As for

I'm trying to avoid the repeated work of everyone having to wrangle source formatted data into FHIR

I feel that you're still in the hammer/nail space and you don't yet fully understand the complexities you will shipwreck on in the future, since you haven't really listened to me about API instead of JSON. I didn't say that for nothing

view this post on Zulip Michael Lawley (Mar 31 2022 at 00:23):

Our strategy at CSIRO for this has been to engage with the owners of the code systems where possible and convince them to take ownership of publishing their content in FHIR CodeSystem form (which is in their interest for consistency and uptake), and otherwise to develop appropriate transforms that do the conversion -- see https://github.com/aehrc/fhir-tx-transforms https://github.com/aehrc/fhir-owl https://github.com/aehrc/fhir-hgnc as examples

Note that a FHIR CodeSystem for SNOMED CT will only ever be of limited value because a SNOMED release also includes data for (in FHIR parlance) ValueSets and ConceptMaps, and then there's post coordination and ECL to deal with.

view this post on Zulip Josh Mandel (Mar 31 2022 at 00:29):

I feel like my perspective is getting lost in translation here. Is it helpful if I re-frame as "what's the redundant work that every [terminology] server developer winds up repeating? What does it look like to shrink this?"

Of course terminology releases might require CodeSystem and ValueSet resources to properly represent them; that's still representable as a standardized pile of files. Having upstream owners publish in a standardized way is ideal; closing the gaps with openly maintained community scripts/mappings in the meantime may be a good way to address the long tail.

view this post on Zulip Michael Lawley (Mar 31 2022 at 00:37):

I think the position that @Grahame Grieve and I are coming from is that there doesn't need to be a large number of "terminology server developers"; you just need to pick a sufficiently good existing implementation and use its API.
Rephrasing to answer your question, the redundant work is implementing terminology server semantics rather than using the existing APIs to support the terminology semantics required by _search. Ultimately, this means that developers need to understand how to optimise these interactions, especially with respect to :in. Fortunately that mostly resolves down to cache maintenance.
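
A minimal sketch of the "cache maintenance" idea, assuming membership checks go through the standard `ValueSet/$validate-code` operation. The class name, the placeholder base URL, and the injected `fetch` callable are illustrative; a real `fetch` would perform the HTTP GET and read the `result` parameter from the returned Parameters resource.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlencode


class CachedValidator:
    """Memoise value set membership answers obtained from a terminology server."""

    def __init__(self, base: str, fetch: Callable[[str], bool]):
        self._base = base.rstrip("/")
        self._fetch = fetch  # injected so the sketch runs without a network
        self._cache: Dict[Tuple[str, str, str], bool] = {}

    def url_for(self, value_set: str, system: str, code: str) -> str:
        query = urlencode({"url": value_set, "system": system, "code": code})
        return f"{self._base}/ValueSet/$validate-code?{query}"

    def is_member(self, value_set: str, system: str, code: str) -> bool:
        key = (value_set, system, code)
        if key not in self._cache:
            self._cache[key] = self._fetch(self.url_for(*key))
        return self._cache[key]

    def invalidate(self) -> None:
        """Drop cached answers, e.g. when a new code system version is loaded."""
        self._cache.clear()


calls = []


def fake_fetch(url: str) -> bool:
    # Stand-in for an HTTP call; records the URL and always answers True.
    calls.append(url)
    return True


validator = CachedValidator("https://tx.example.org/r4", fake_fetch)
first = validator.is_member("http://example.org/vs/vitals", "http://loinc.org", "8867-4")
second = validator.is_member("http://example.org/vs/vitals", "http://loinc.org", "8867-4")
print(first, second, len(calls))
```

The second call is served from the cache, so the server is contacted only once per distinct (value set, system, code) triple until `invalidate` is called.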

view this post on Zulip Michael Lawley (Mar 31 2022 at 00:39):

I do think there's value and a sweet spot in defining a simple ECL-alike for FHIR ValueSets so that it can be easily packed into an implicit ValueSet URI, but it's a bonus, not a prerequisite for the above.
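
SNOMED CT already has a convention of this shape: an implicit value set URI of the form `http://snomed.info/sct?fhir_vs=ecl/[ecl]`. The sketch below only shows how an expression might be packed into such a URI; the helper name is hypothetical, and a generic ECL-alike for other code systems is not defined here.

```python
from urllib.parse import quote


def snomed_ecl_value_set(ecl: str) -> str:
    """Pack an ECL expression into a SNOMED CT implicit value set URI."""
    # safe="" forces every reserved character, including "/", to be percent-encoded.
    return "http://snomed.info/sct?fhir_vs=ecl/" + quote(ecl, safe="")


# "< 73211009" selects the descendants of a single concept.
uri = snomed_ecl_value_set("< 73211009")
print(uri)
```

Because the whole definition lives in the URI, a client can pass it to `$expand` or `$validate-code` without first creating a ValueSet resource.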

view this post on Zulip Grahame Grieve (Mar 31 2022 at 00:49):

Josh Mandel said:

"what's the redundant work that every [terminology] server developer winds up repeating? What does it look like to shrink this?"

sure. you can do that. but the way I implement this is to have an internal API and then multiple providers that provide code system services based on whatever native form is available. And the process of loading the content from whatever form is available is a small fraction of the work I do, and standardising on an inefficient format wouldn't help me with that. What I really need to do is the code system specific logic. Which is what my point about shipwrecks above is
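
One way to picture the "internal API and then multiple providers" design is sketched below. This is not the tx.fhir.org implementation; the class and method names are invented for illustration, and a real provider would implement far more than display lookup.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class CodeSystemProvider(ABC):
    """Internal API that each code system implements over its own native storage."""

    @abstractmethod
    def system_url(self) -> str:
        ...

    @abstractmethod
    def display(self, code: str) -> Optional[str]:
        ...


class InMemoryProvider(CodeSystemProvider):
    """Simplest possible provider: a dict of code to display text."""

    def __init__(self, url: str, displays: Dict[str, str]):
        self._url = url
        self._displays = displays

    def system_url(self) -> str:
        return self._url

    def display(self, code: str) -> Optional[str]:
        return self._displays.get(code)


class TerminologyService:
    """Routes requests to whichever provider owns the requested system."""

    def __init__(self) -> None:
        self._providers: Dict[str, CodeSystemProvider] = {}

    def register(self, provider: CodeSystemProvider) -> None:
        self._providers[provider.system_url()] = provider

    def lookup(self, system: str, code: str) -> Optional[str]:
        provider = self._providers.get(system)
        return provider.display(code) if provider else None


service = TerminologyService()
service.register(InMemoryProvider("http://example.org/cs", {"A": "Alpha"}))
print(service.lookup("http://example.org/cs", "A"))
```

Under this design the loading format is a detail hidden inside each provider, which is why a shared interchange format removes only a small part of the work.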

view this post on Zulip Grahame Grieve (Mar 31 2022 at 00:51):

rxnorm is somewhat unique for me because I simply load it and map it directly. For all the other code systems, I do much more work than that.

view this post on Zulip Josh Mandel (Mar 31 2022 at 00:57):

Are there examples of the kind of work you do for LOINC?

view this post on Zulip Grahame Grieve (Mar 31 2022 at 01:11):

looking through my code... mostly cross-indexing, gathering all the files into a single set of tables and sorting the language stuff out

view this post on Zulip Grahame Grieve (Mar 31 2022 at 01:11):

less logic than I thought

view this post on Zulip Craig McClendon (Mar 31 2022 at 17:26):

Interesting. I guess I've found wrangling source codesystems into a FHIR server to be more painful than some of you.
I'm not talking about even developing a terminology server, just loading them into an existing server.

Here is a list of the sizes (concept count) of various larger terminologies I have (many pulled from Athena, some loaded in other ways) which arguably don't fit well into a single CodeSystem document.

RxNorm Extension :: 2095063
NDC :: 1086608
SNOMED :: 1035027
RxNorm :: 300300
LOINC :: 258836
OSM :: 203339
ICD10PCS :: 194981
Gene Variant :: 121446
OMOP Genomic :: 120991
ICD10CM :: 97114
ICDO3 :: 64471
ICD9CM :: 17564
ICD10 :: 16519
Gene :: 13591
HCPCS :: 10793
ClinVar :: 8072
ATC :: 6740
Variant by Impact :: 4846
ICD9Proc :: 4657
Gene by Impact :: 2466
CDM :: 1045
UCUM :: 1008

If I could "zap" all these out to file with a utility or custom operation in a common format, and zap them into another server - I would find that very useful - for new deployments, test environments, etc.
More so if they were published from the source in a common format so I could add new ones and update existing ones with minimal effort. I don't quite understand the hesitancy here.

view this post on Zulip Rob Hausam (Mar 31 2022 at 18:02):

Several of these, like ICD-10-CM, ICD-10-PCS and ATC (and likely some others), we do have as FHIR CodeSystem resource instances (on tx.fhir.org).

view this post on Zulip Grahame Grieve (Mar 31 2022 at 20:01):

you're welcome to contribute any of these to fhir.tx.support that you can, as CodeSystem resources. But you've only counted concepts, and not factored in properties and alternate designations, which also matter, so fhir.tx.support doesn't include the top few CodeSystems.

Which existing server are you talking about?

view this post on Zulip Michael Lawley (Apr 01 2022 at 04:01):

Something strange going on here with dates and versions?
image.png

view this post on Zulip Grahame Grieve (Apr 01 2022 at 04:18):

that's pretty strange

view this post on Zulip Grahame Grieve (Apr 01 2022 at 04:19):

but I can tell you what happened

view this post on Zulip Grahame Grieve (Apr 01 2022 at 04:20):

the date given is the date that the package was uploaded. And back in January we did a round of work on the packages (see #tooling, I think) and packages that hadn't been uploaded got uploaded

view this post on Zulip Michael Lawley (Apr 01 2022 at 04:23):

I guessed it would be something like that - technical date vs "business date"

view this post on Zulip Robert McClure (Apr 01 2022 at 14:25):

Catching up: A couple of interesting threads in this.

Michael Lawley said:

I think the position that Grahame Grieve and I are coming from is that there doesn't need to be a large number of "terminology server developers"; you just need to pick a sufficiently good existing implementation and use its API.
Rephrasing to answer your question, the redundant work is implementing terminology server semantics rather than using the existing APIs to support the terminology semantics required by _search. Ultimately, this means that developers need to understand how to optimise these interactions, especially with respect to :in. Fortunately that mostly resolves down to cache maintenance.

I agree we need to have robust common-access FHIR terminology servers that can be used by all so that users of all types can view, directly reference (i.e., include in IGs), access at run-time, and I'd assume utilize in production systems, any content they have IP rights to (so we need to have a way to confirm this). This scenario needs to be one of our goals.

Michael Lawley confirms our alignment on this goal by pointing out one of the unmet requirements to support this when he said:

I do think there's value and a sweet spot in defining a simple ECL-alike for FHIR ValueSets so that it can be easily packed into an implicit ValueSet URI, but it's a bonus, not a prerequisite for the above.

We need to keep in mind that FHIR code system representation is not optimized for complexity and as I see it the API-orientation will always be focused on making terminology content available for use, not for internal storage, sharing, or complete representation.

BUT I agree with other sentiments that practically developers, IG creators, and many others will need to stand up their own terminology servers, purchased or not. That does mean that improvements in how we package and consume seem like a worthwhile activity. But will volunteers make it happen?

view this post on Zulip Grahame Grieve (Apr 01 2022 at 20:08):

well, anyone can stand up a clone of tx.fhir.org. That's something that volunteers made happen


Last updated: Apr 12 2022 at 19:14 UTC