Stream: analytics on FHIR
Topic: traditional relational model
Craig McClendon (Jun 01 2017 at 15:12):
Not sure if this is the proper forum, but has anyone done any work using a more traditional relational model, and something like a Postgres XL? http://www.postgres-xl.org/
Grahame Grieve (Jun 01 2017 at 20:45):
@nicola (RIO) uses Postgresql and is well connected to the dev team
Jim Steel (Jun 02 2017 at 00:12):
We use a mix of postgres and nosql in ontoserver
nicola (RIO/SS) (Jun 02 2017 at 09:09):
@craig mcclendon we use postgresql with jsonb - so it's like hybrid approach between document dbs and relational dbs. We initially tried to generate pure relational model - but it's about > 1K tables and joins are slow. Here is slides from my Madrid talk - https://niquola.github.io/madrid-hl7-2017-slides/slides/index.html#/2. PostgresXL is little bit outdated (as i remember they are on 9.4 version), but i know guys from postgrespro company who are working to update of PostgresXL to latest pg version. There are also greenplum and citusdb extensions - promising for pg analytic.
Gary Teichrow (Jun 02 2017 at 17:41):
Thx for link to PostGres-XL. I've been neglecting that project. I've also been tinkering with Apache Accumulo (https://accumulo.apache.org/) as primary backing store but still noodling about right 'granularity' for the data, if that makes sense. Cheers.
Boris Tyukin (Dec 02 2017 at 15:42):
Hi @nicola (RIO/SS) I am thinking to convert 10-15 different resources we are getting from vendor into relational database. Was 1K tables in your case covering all 100+ FHIR resources? The problem with storing FHIR data as JSON objects is that traditional BI tools cannot work with nested structures well and also we want to hide complexity of FHIR resources for our BI Developers.
nicola (RIO/SS) (Dec 02 2017 at 20:33):
Hi, usually in bi you write direct sql, so with pg this is not a problem. Another solution is to create flatten views in the same database
Boris Tyukin (Dec 02 2017 at 22:03):
@nicola (RIO/SS) right you can use SQL and dotted json notation and most RDBMS support json as an object these days (Oracle, MS SQL, Postgres) but it is still not a trivial task to do so because of nested arrays and references between documents. You had a nice slide about these challenges. I guess I am asking what everyone else are doing with FHIR documents if they need to analyze data and build dashboards using BI tools like QlikView, Tableau, Oracle BI etc. I was going to normalize all the nested structures (I believe I can generate DDLs and logic) but before I spend my time on it, wanted to see how everyone else is doing this
Grahame Grieve (Dec 03 2017 at 22:10):
no one has shared anything about this, though there are people doing it. But everyone has different styles of use of database (e.g. how much to normalise), and different use of underlying technologies, and different compromises due to additional constraints in the underlying data. As well as additional considerations around terminology infrastructure
Grahame Grieve (Dec 03 2017 at 22:10):
all of this means that there's not a lot of mileage in sharing DDL scripts.
Grahame Grieve (Dec 03 2017 at 22:11):
also, as you have found, non-transactional systems (for analysis/reporting) are starting to move to json based tooling and just using the resources directly.
Boris Tyukin (Dec 03 2017 at 23:31):
Thanks @Grahame Grieve makes sense to me. I think I am going to try a hybrid approach - store bundle resources as json values and expose some of the key data using normalized version for BI Developers. That way we have FHIR native data available for analytics and we have easier to use normalized tables for dashboards and such.
Boris Tyukin (Dec 04 2017 at 17:11):
found two interesting publications, focused on conversion of semi-structured json to relational model.
http://cs-www.cs.yale.edu/homes/dna/papers/schemagen-sigmod16.pdf
http://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1781&context=etd
Boris Tyukin (Dec 09 2017 at 17:39):
someone pointed me to an excellent write-up about EAV modeling concepts on wiki https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model. It is a long article but it explains different options on how to deal with columnar representation of EAV data for statistical / analytical tools.
Christiaan Knaap (Dec 13 2017 at 12:14):
Another approach I've been thinking of is more solution bound: Vonk on SQL Server indexes all the search parameter values into a couple of simple tables that should be a fairly good source for star schemas. It can also process custom search parameters, with custom fhirpath expressions in them - so you can get those in the index as well.
Craig McClendon (Dec 18 2017 at 19:55):
Here was a simple example I created just to get a sense of what it might look like to model out the resources and datatypes and what the queries might look like.
It's using a generic table "relationship_mapper" to manage the relationships between objects.
In this case I was just looking at how to represent and query Observation.code.
(this is psql syntax)
CREATE TABLE Observation (
pkey text,
_status text
);
CREATE TABLE CodeableConcept(
pkey text,
_text text
);
CREATE Table Coding (
pkey text,
_system text,
_version text,
_code text,
_display text,
_userSelected boolean
);
-- generic relationship mapping table
CREATE TABLE relationship_mapper (
pkey text,
table_a text,
table_b text,
table_a_key text,
table_b_key text,
relationship_type text
);
-- insert an observation with a code
INSERT INTO Observation (pkey) VALUES ('OBS1');
INSERT INTO CodeableConcept(pkey) VALUES ('CC1');
INSERT INTO Coding(pkey, _system, _code) VALUES ('CODING1', 'http://loinc.org', '8867-4');
INSERT INTO relationship_mapper VALUES ('R1', 'Observation', 'CodeableConcept', 'OBS1', 'CC1', 'code');
INSERT INTO relationship_mapper VALUES ('R2', 'CodeableConcept', 'Coding', 'CC1', 'CODING1', 'coding');
drop view ObservationCode;
CREATE VIEW ObservationCode as
SELECT r1.table_a_key as ObservationKey, tab.*
FROM CodeableConcept tab
INNER JOIN relationship_mapper r1 ON (
r1.table_a = 'Observation'
AND r1.relationship_type = 'code'
AND r1.table_b = 'CodeableConcept'
AND r1.table_b_key = tab.pkey);
drop view CodeableConcept2Coding;
CREATE VIEW CodeableConcept2Coding as
SELECT r1.table_a_key as CodeableConceptPkey, tab.*
FROM Coding tab
INNER JOIN relationship_mapper r1 ON (
r1.table_a = 'CodeableConcept'
AND r1.relationship_type = 'coding'
AND r1.table_b = 'Coding'
AND r1.table_b_key = tab.pkey);
--SELECT * FROM ObservationCode
--SELECT * FROM CodeableConcept2Coding
--SELECT * FROM relationship_mapper
-- find all observations with a particular code
SELECT *
FROM Observation obs
WHERE EXISTS (
SELECT *
FROM ObservationCode cc
INNER JOIN CodeableConcept2Coding coding ON (cc.pkey = coding.CodeableConceptPkey)
WHERE cc.ObservationKey = obs.pkey
AND coding._system = 'http://loinc.org'
AND coding._code = '8867-4' )
Last updated: Apr 12 2022 at 19:14 UTC