FHIR Chat · Relational Data representation & aggregation · implementers

Stream: implementers

Topic: Relational Data representation & aggregation


dsh (Nov 18 2019 at 19:52):

If I store Observations like vitals in a relational database, i.e. a Vitals table with columns
like systolic, diastolic, o2_saturation, head_circumference, respiratory_rate etc., with
timestamp columns (created_at, updated_at) and foreign keys (patient_id, hospital_id)

And I receive one Observation resource per vital sign, i.e. one reading (systolic only)
in one Observation resource, one reading (oxygen saturation only) in another Observation, and so on.

Then I am storing possibly one attribute per record: say systolic in one record,
diastolic in another record, respiratory_rate in another record, and
oxygen saturation in another record. So I will have a total of 4 records, like this:

Diastolic  Systolic  O2_Saturation  Respiratory_Rate  Created_At
90         NULL      NULL           NULL              2019-11-18 17:56:25.210201
NULL       132       NULL           NULL              2019-11-18 17:56:55.220904
NULL       NULL      NULL           39                2019-11-18 17:57:05.230107
NULL       NULL      71             NULL              2019-11-18 17:58:45.240203

What is the best way then to aggregate these 4 records into "1 record" showing
the latest vitals, like this:

Diastolic  Systolic  O2_Saturation  Respiratory_Rate  Created_At
90         132       71             39                2019-11-18 17:58:45.240203

dsh (Nov 18 2019 at 19:52):

The above ^ is obviously simplified for the purposes of asking this question, but in the real world there will be more columns and a bigger spread of these fragments of data that would need aggregation.
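For the schema as shown, one possible approach is to take the most recent non-null value per column. A minimal sketch, assuming PostgreSQL, the wide Vitals table described above (named vitals here), and 42 as a stand-in patient id; the replies below argue for a different schema instead:

-- Latest non-null value per column, plus the newest timestamp overall.
SELECT
  (SELECT diastolic FROM vitals
    WHERE patient_id = 42 AND diastolic IS NOT NULL
    ORDER BY created_at DESC LIMIT 1) AS diastolic,
  (SELECT systolic FROM vitals
    WHERE patient_id = 42 AND systolic IS NOT NULL
    ORDER BY created_at DESC LIMIT 1) AS systolic,
  (SELECT o2_saturation FROM vitals
    WHERE patient_id = 42 AND o2_saturation IS NOT NULL
    ORDER BY created_at DESC LIMIT 1) AS o2_saturation,
  (SELECT respiratory_rate FROM vitals
    WHERE patient_id = 42 AND respiratory_rate IS NOT NULL
    ORDER BY created_at DESC LIMIT 1) AS respiratory_rate,
  (SELECT MAX(created_at) FROM vitals WHERE patient_id = 42) AS created_at;

Each scalar subquery scans for the latest row where its column is populated, which works but multiplies lookups as the number of columns grows.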

dsh (Nov 18 2019 at 19:54):

Or is there a better way to store and retrieve data in the Observation resource?

Grahame Grieve (Nov 18 2019 at 19:54):

well, that depends. why are you aggregating, and what are you doing about the differing time stamps (and other metadata?)

dsh (Nov 18 2019 at 19:55):

well, that depends. why are you aggregating, and what are you doing about the differing time stamps (and other metadata?)

I am aggregating the data because the end users (clinicians) want to view the latest Vitals.

dsh (Nov 18 2019 at 19:55):

The different timestamps per field/column are another concern to represent in the view.

Grahame Grieve (Nov 18 2019 at 19:57):

it seems to me to be bad database management to collapse these things into a sparsely populated or loosely tracked table with time uncertainties. I'd have a single table with code/value... mostly that's what we all do (but by no means all of us)

dsh (Nov 18 2019 at 20:02):

@Grahame Grieve You are right about the bad schema here. So in your opinion it would be better to represent the Vitals table as:

Code     Value  Created_At
8383-53  132    2019-11-18 17:58:45.240203
8112-21  90     2019-11-18 17:56:45.240203

dsh (Nov 18 2019 at 20:12):

I'd have a single table with code/value... mostly that's what we all do (but by no means all of us)

@Grahame Grieve How would you convert these code/value columns into a human-readable format like:

Diastolic  Systolic  O2_Saturation  Respiratory_Rate
90         132       71             39
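One common SQL answer to that transposition is conditional aggregation (a manual pivot). A minimal sketch, assuming PostgreSQL, the code/value shape proposed above (again named vitals), and the example codes 8383-53 (systolic) and 8112-21 (diastolic):

-- Pivot code/value rows into one wide, human-readable row. MAX() here is
-- only a stand-in aggregate (it returns the largest value ever recorded,
-- not the most recent one); restricting the input to the latest reading
-- per code is discussed further down the thread.
SELECT
  MAX(CASE WHEN code = '8112-21' THEN value END) AS diastolic,
  MAX(CASE WHEN code = '8383-53' THEN value END) AS systolic
FROM vitals
WHERE patient_id = 42;   -- 42 = stand-in patient id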

dsh (Nov 18 2019 at 20:12):

Any pointers will be much appreciated

Grahame Grieve (Nov 18 2019 at 20:14):

code/value table with the important fields: code, date/time, value, units
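A sketch of that table shape as PostgreSQL DDL; the names and types are illustrative and nothing here is prescribed by FHIR:

CREATE TABLE vitals (
  id            BIGSERIAL    PRIMARY KEY,
  patient_id    BIGINT       NOT NULL,
  code          VARCHAR(64)  NOT NULL,   -- e.g. the code from Observation.code
  value         NUMERIC      NOT NULL,   -- Observation.valueQuantity.value
  unit          VARCHAR(32),             -- Observation.valueQuantity.unit
  effective_at  TIMESTAMPTZ,             -- Observation.effective[x]
  created_at    TIMESTAMPTZ  NOT NULL DEFAULT now()
);

-- An index on (patient_id, code, time) keeps "latest value per code" queries cheap.
CREATE INDEX vitals_patient_code_time ON vitals (patient_id, code, created_at DESC);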

dsh (Nov 18 2019 at 20:21):

Yes, I agree units and date/time are important too.

Though I am having a hard time imagining a simple transposition of these records into a different object representation like:

Diastolic  Systolic  O2_Saturation  Respiratory_Rate
90         132       71             39

The object boundary is marked by the last attribute, which in the real world may be the 10th or 15th attribute; to fill all those attributes, at least 15 records will have to be read.

Grahame Grieve (Nov 18 2019 at 20:22):

right.

Lloyd McKenzie (Nov 18 2019 at 21:55):

Most systems just grab the most recent values for each and display them or group them based on time proximity over time. The data transmission layers and storage layers pretty much all do one record per measurement and displays such as yours are pretty common. FHIR does allow 'grouping' Observations that can link a bunch of vitals (or other Observations) that were conducted more or less at the same time, so that could form a foundation for what you're trying to do - but the end result would be yet one more Observation type in the collection of Observations you bring back. (It would however make it easier to figure out what to group together - provided the source system does the grouping for you.)

dsh (Nov 18 2019 at 22:04):

@Lloyd McKenzie Thanks. Is there any reference I can read up on for "grouping" Observations? Also, are other resources groupable too?

Grouping by time proximity is hit or miss, depending on the length of time. If the length of time is big, too many records are read; if it is too small, the query has to be retried with a larger time interval. Seems like a common pattern to me, but is there any kind of utility built around this pattern?

Lloyd McKenzie (Nov 18 2019 at 22:06):

Observation has a hasMember relationship. A few other resources have an equivalent mechanism (e.g. Encounter.partOf). There's a brief discussion here: http://hl7.org/fhir/observation.html#obsgrouping

Michele Mottini (Nov 18 2019 at 22:50):

The observations you'd want to group would typically have the same effective[x], so group by the corresponding column.

Lloyd McKenzie (Nov 18 2019 at 22:59):

The challenge is that they often won't have exactly the same effective[x]. So some logic is needed to realize that if there were three done only a couple of minutes apart, you probably want either the latest or the average. Whereas if they were further apart, they might deserve a separate row.
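One way to encode that "close together vs. further apart" logic in SQL is a gaps-and-islands query: start a new group whenever the gap to the previous observation exceeds some threshold. A sketch, assuming PostgreSQL, the vitals table sketched above, and an arbitrary 5-minute threshold:

-- Assign a group_id that increments whenever two consecutive observations
-- for a patient are more than 5 minutes apart.
WITH gaps AS (
  SELECT *,
         CASE WHEN effective_at - LAG(effective_at)
                     OVER (PARTITION BY patient_id ORDER BY effective_at)
                   > INTERVAL '5 minutes'
              THEN 1 ELSE 0 END AS new_group
  FROM vitals
),
grouped AS (
  SELECT *,
         SUM(new_group) OVER (PARTITION BY patient_id ORDER BY effective_at) AS group_id
  FROM gaps
)
SELECT patient_id, group_id,
       MIN(effective_at) AS group_start,
       MAX(effective_at) AS group_end,
       COUNT(*)          AS readings
FROM grouped
GROUP BY patient_id, group_id
ORDER BY patient_id, group_start;

Each group can then be pivoted into one display row as sketched earlier; whether to show the latest value or an average within a group is the policy decision described above.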

dsh (Nov 19 2019 at 18:22):

@Lloyd McKenzie You captured it right, but that's not the only challenge. For me (and I am sure for many) the challenge is what range of time to use to query the latest values, e.g. the last 1 or 2 hours, or more? The longer the patient's stay in hospital, the longer the range can be for certain observations. As an example, in the Vital-Signs category body height and weight may not be observed more than once for the patient's total length of stay, but heart rate / blood pressure may be observed every minute. So to fetch the latest Vitals comprising attributes measured at different times (hours or days apart), e.g. body weight and blood pressure, there isn't a generic SQL query one can write.

dsh (Nov 19 2019 at 18:24):

Further complexity in the SQL query arises when you want to limit the number of rows fetched. Say the limit is 1; then (using the above use case) the Vitals record fetched will only have blood pressure and heart rate readings, but not height and body weight readings.

Lloyd McKenzie (Nov 19 2019 at 18:28):

Take a look at the $lastn operation

Lloyd McKenzie (Nov 19 2019 at 18:28):

It lets you grab the most recent "N" measurements of all of the specified observation types.

dsh (Nov 19 2019 at 18:35):

@Lloyd McKenzie Please correct me if I am wrong, but isn't the $lastn operation dependent on the FHIR server implementation? Plus, I don't have a FHIR server of any kind; my application receives FHIR resources (from Mirth HL7 v2.3 messages transformed to FHIR), stores the original JSON blob, and stores all codes and values in a relational database as below (table structure simplified here):

Code     Value  Created_At
8383-53  132    2019-11-18 17:58:45.240203
8112-21  90     2019-11-18 17:51:45.240203

So the question is really how to implement $lastn in SQL queries, in the absence of a FHIR server.

Lloyd McKenzie (Nov 19 2019 at 18:38):

You may need separate searches for each code. Not sure if there's a better way to write the SQL.
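A window function is one way to avoid a separate query per code. A sketch, assuming PostgreSQL and the simplified code/value/created_at table above (again named vitals); this also sidesteps the time-range problem, since it takes the latest reading per code no matter how long ago it was recorded:

-- $lastn-style query: the most recent N readings per code for one patient.
SELECT code, value, created_at
FROM (
  SELECT code, value, created_at,
         ROW_NUMBER() OVER (PARTITION BY code ORDER BY created_at DESC) AS rn
  FROM vitals
  WHERE patient_id = 42                    -- 42 = stand-in patient id
    AND code IN ('8383-53', '8112-21')     -- example codes from earlier
) ranked
WHERE rn <= 1;   -- N = 1: just the latest value per code

The one-row-per-code result can then be pivoted into display columns with the conditional aggregation sketched earlier; ordering by the clinical effective time (Observation.effective[x]) rather than created_at may be more appropriate where it is stored.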

dsh (Nov 19 2019 at 18:41):

@Lloyd McKenzie Ok. How does the FHIR spec / implementation guide suggest $lastn be implemented? If not in SQL, then what's the assumed data/storage model?

Lloyd McKenzie (Nov 19 2019 at 19:03):

How is up to the implementer. FHIR doesn't define what the persistence model is, nor the optimum way of using different query approaches.

dsh (Nov 19 2019 at 19:09):

Right. But this seems to be a very common problem and query pattern, and I am wondering if an optimized solution like a stored procedure or a built-in database function/extension already exists.

Lloyd McKenzie (Nov 19 2019 at 19:32):

@Grahame Grieve

Grahame Grieve (Nov 20 2019 at 04:52):

I don't know how to answer that because I implemented $lastN using an SQL server.

Kevin Mayfield (Nov 20 2019 at 07:11):

What I did was query observations by codes and date range. The date range would depend on the scenario. Results are sorted in date order,
so a client would take the first one.

As Lloyd mentioned, you can group observations by encounter. So if I had received the observations via HL7 v2, they would all point to the same encounter (one for each v2 message).

Kevin Mayfield (Nov 20 2019 at 07:15):

You can see the gist of this in this diagram https://project-wildfyre.github.io/conformance/graph/4


Last updated: Apr 12 2022 at 19:14 UTC