FHIR Chat · Graph Based Approach

Stream: analytics on FHIR

Topic: Graph Based Approach

Fahim Shariar (Apr 12 2017 at 17:52):

I believe analytics on FHIR resources is possible through a graph database.

Fahim Shariar (Apr 12 2017 at 17:53):

I am using Neo4j

Fahim Shariar (Apr 12 2017 at 17:53):

I have drawn a sample image that will help you understand what I am trying to achieve.

Fahim Shariar (Apr 12 2017 at 17:54):

graph.jpg

Fahim Shariar (Apr 12 2017 at 17:55):

Has anyone tried this yet?

Fahim Shariar (Apr 12 2017 at 18:31):

@Andrew Brown Yes, exactly, but not just one: it will be a big connected graph. Multiple nodes may come from a single FHIR resource. It depends on the resource.

Andrew Brown (Apr 12 2017 at 18:33):

I am in the process of creating an HML (histoimmunogenetics markup language) to FHIR service, and I ran into some of the issues you're facing. The service does a variety of modeling through Java objects that you may find useful.

Andrew Brown (Apr 12 2017 at 18:34):

https://github.com/nmdp-bioinformatics/service-hml-fhir-converter

David Hay (Apr 12 2017 at 18:34):

I've not tried it, but I have often thought it a good fit. I do use a graph layout in clinFHIR (and they can get quite complex; see here: https://fhirblog.com/2017/03/28/a-complex-scenario/), but this visualization uses a JavaScript library (visjs) and it is just a visualization...

David Hay (Apr 12 2017 at 18:35):

Are you just looking to display, or to analyse? If to analyse, what sort of analysis?

Fahim Shariar (Apr 12 2017 at 18:42):

@Andrew Brown Nice to hear that. I will definitely check it out. However, whatever tools people use, I think the first thing to consider here is how efficiently you can convert a FHIR resource to a graph.
For example, take AllergyIntolerance. It has multiple properties, and from my point of view multiple nodes could be derived from one single resource. For AllergyIntolerance the possible nodes could be Allergy, Code, Category, Substance, Manifestation, Patient, Provider, etc.
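A minimal Python sketch of the decomposition described above, assuming FHIR R3/R4 JSON for AllergyIntolerance. The node labels (Allergy, Patient, Code, Category, Manifestation) follow the list in the message; the relationship labels and the "system|code" node ids are hypothetical choices for illustration, not a standard mapping.

```python
from typing import Any

Node = dict[str, Any]
Edge = tuple[str, str, str]  # (from_id, hypothetical relationship label, to_id)


def coding_id(coding: dict[str, Any]) -> str:
    # Hypothetical node id for a terminology code: "system|code".
    return f"{coding['system']}|{coding['code']}"


def decompose_allergy(resource: dict[str, Any]) -> tuple[list[Node], list[Edge]]:
    """Split one AllergyIntolerance resource into several nodes plus labelled edges."""
    root = f"AllergyIntolerance/{resource['id']}"
    nodes: list[Node] = [{"id": root, "label": "Allergy"}]
    edges: list[Edge] = []

    # The patient reference links to a separate Patient node.
    patient_ref = resource.get("patient", {}).get("reference")
    if patient_ref:
        nodes.append({"id": patient_ref, "label": "Patient"})
        edges.append((root, "FOR_PATIENT", patient_ref))

    # Every coding of AllergyIntolerance.code becomes a Code node.
    for coding in resource.get("code", {}).get("coding", []):
        nodes.append({"id": coding_id(coding), "label": "Code", "display": coding.get("display")})
        edges.append((root, "HAS_CODE", coding_id(coding)))

    # AllergyIntolerance.category is a list of plain codes such as "food".
    for category in resource.get("category", []):
        nodes.append({"id": f"category|{category}", "label": "Category"})
        edges.append((root, "IN_CATEGORY", f"category|{category}"))

    # Each reaction.manifestation coding becomes a Manifestation node.
    for reaction in resource.get("reaction", []):
        for manifestation in reaction.get("manifestation", []):
            for coding in manifestation.get("coding", []):
                nodes.append({"id": coding_id(coding), "label": "Manifestation"})
                edges.append((root, "MANIFESTS_AS", coding_id(coding)))

    return nodes, edges


# Example input; the codes are placeholders, not real SNOMED CT concepts.
allergy = {
    "resourceType": "AllergyIntolerance",
    "id": "a1",
    "patient": {"reference": "Patient/p1"},
    "code": {"coding": [{"system": "http://snomed.info/sct", "code": "substance-placeholder"}]},
    "category": ["food"],
    "reaction": [
        {"manifestation": [{"coding": [{"system": "http://snomed.info/sct", "code": "manifestation-placeholder"}]}]}
    ],
}
nodes, edges = decompose_allergy(allergy)
```

One resource yields five nodes here; Substance, Provider and the other nodes mentioned above would follow the same pattern.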

Fahim Shariar (Apr 12 2017 at 18:44):

Look at the picture above: I have drawn multiple relations from Observation and how it is connected to other resources, including references, the patient, the code and the rest.
Any suggestions on that?

Fahim Shariar (Apr 12 2017 at 18:50):

@David Hay The complex scenario visualizes how each resource is connected to the others, which is fine, but I am talking about an even deeper categorization of vertices from each resource. Like I said, multiple nodes could come from one resource: if you look at the picture above, I have 4 FHIR resources but 17 nodes from them.

Sadiq Saleh (Apr 12 2017 at 18:52):

I think one option you may want to think about is to leverage the current UML structures proposed by FHIR as a basis for the graph. It may help to determine what data would be associated with a node, as some attributes should be represented by a separate node linked by an edge. Have you looked at these?

David Hay (Apr 12 2017 at 18:54):

In effect, the backbone elements would also define separate nodes...

Sadiq Saleh (Apr 12 2017 at 18:59):

A follow-up question would be: what would you expect to be stored on a node? Would it be a single attribute? If so, then I guess there would need to be some way of grouping the nodes related to a resource, and of differentiating these edges from an edge used as a reference between two different resources.

Fahim Shariar (Apr 12 2017 at 19:02):

@David Hay Exactly, it should, and I have done that.
@Sadiq Saleh I have looked at the UML structure. It shows just one relation: for example, AllergyIntolerance has a relation with Reaction. If I follow this structure, there will be two nodes. But look at the Terminology Bindings and the Path column there. I think every path should become a separate node, and when two AllergyIntolerance resources share the same code/coding, just a relation will be created towards that node.
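A hedged Python sketch of the "shared code node" idea: code nodes are keyed by (system, code), so a second AllergyIntolerance with the same coding only adds an edge to the existing node. The CodeGraph class and its integer node ids are hypothetical. In Neo4j, Cypher's MERGE clause serves the same purpose of matching an existing node or creating it if absent.

```python
class CodeGraph:
    """Keeps one node per (system, code) pair so resources that share a coding share a node."""

    def __init__(self) -> None:
        self.code_nodes: dict[tuple[str, str], int] = {}  # (system, code) -> node id
        self.edges: list[tuple[str, int]] = []            # (resource id, code node id)

    def code_node(self, system: str, code: str) -> int:
        # Return the existing node for this coding, creating it only on first sight.
        key = (system, code)
        if key not in self.code_nodes:
            self.code_nodes[key] = len(self.code_nodes)
        return self.code_nodes[key]

    def add_resource(self, resource: dict) -> None:
        resource_id = f"{resource['resourceType']}/{resource['id']}"
        for coding in resource.get("code", {}).get("coding", []):
            self.edges.append((resource_id, self.code_node(coding["system"], coding["code"])))


graph = CodeGraph()
for rid in ("a1", "a2"):
    # Two allergies with the same (placeholder) coding.
    graph.add_resource({
        "resourceType": "AllergyIntolerance",
        "id": rid,
        "code": {"coding": [{"system": "http://snomed.info/sct", "code": "substance-placeholder"}]},
    })
```

Both resources end up pointing at code node 0, which is exactly the kind of connection that makes "which patients share this allergen?" a one-hop query.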

Sadiq Saleh (Apr 12 2017 at 19:11):

If each attribute (represented by a different path) is represented by a different node, how would you group the attributes of a single resource together? If you use a "parent" node, there is currently no way to distinguish an edge AllergyIntolerance(parent) -> AllergyIntolerance.code from an edge AllergyIntolerance(parent) -> Patient(parent).
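One possible answer, sketched in Python under our own assumptions: give every edge both a label (the element name) and a kind saying whether it stays inside the resource ("contains") or crosses to another resource ("references"). The Edge class, the kind values and the "#element" child ids are hypothetical, and for brevity only top-level, non-repeating elements are handled. In a property graph the kind could equally be expressed as a separate relationship type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    label: str  # element name, e.g. "code" or "patient"
    kind: str   # "contains" (part of the same resource) or "references" (another resource)


def edges_for(resource: dict) -> list[Edge]:
    root = f"{resource['resourceType']}/{resource['id']}"
    edges = []
    for name, value in resource.items():
        if not isinstance(value, dict):
            continue  # primitives and lists are skipped in this sketch
        if "reference" in value:
            # A FHIR Reference points at the parent node of another resource.
            edges.append(Edge(root, value["reference"], name, "references"))
        else:
            # Any other complex element becomes a child node owned by this resource.
            edges.append(Edge(root, f"{root}#{name}", name, "contains"))
    return edges


edges = edges_for({
    "resourceType": "AllergyIntolerance",
    "id": "a1",
    "code": {"text": "cashew nut"},
    "patient": {"reference": "Patient/p1"},
})
```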

Fahim Shariar (Apr 12 2017 at 19:25):

allergy_graph.jpg
This is what I am talking about. There will be more connections when you find another resource with the same code value.

Fahim Shariar (Apr 12 2017 at 19:30):

@Sadiq Saleh You are right. I think we have to analyse the resource, mainly the data within it, and think about which nodes we can create from it, and whether creating them will be helpful.

Fahim Shariar (Apr 12 2017 at 19:53):

I think invariants are one of the many properties of a graph :)

Abbie Watson (Apr 12 2017 at 21:08):

We've run FHIR/JSON objects through various D3 graphs with good success. Not sure that an entire graph database is needed, as almost any object store should suffice. That being said, we've been looking at pulling in the Neo4j libraries to do more complex graphs.

Howard Edidin (Apr 12 2017 at 21:09):

What is the Use Case for doing graphs?

Howard Edidin (Apr 12 2017 at 21:10):

We are using machine learning on Azure for predictive medicine, e.g. outcomes for cancer.

Abbie Watson (Apr 12 2017 at 21:11):

Use Cases include Fraud Detection, Search, Recommendations, Social Network, Identity Management, Network Operations.

As per Neo4j
https://neo4j.com/

Fahim Shariar (Apr 12 2017 at 21:53):

@Howard Edidin Well, the use cases and queries could be vast if we can manage to make use of all the fields of a FHIR resource in a properly connected graph.
Here are some sample GraphGists from Neo4j; check the Behavioural Treatment one. I believe a proper graph could also be used in machine learning.
https://neo4j.com/graphgists/?category=health-care-and-science
use-cases-graph-in-healthcare.pdf
The data they used does not follow any standard and is unstructured. If we can manage to create a complete connected graph among related FHIR resources, then we can do a lot more. Don't you think?

Russell Leftwich (Apr 12 2017 at 22:25):

Unrelated to the main issues discussed, but the SCT code used in the graph for cashew nut is the code for the substance (solution) used to test an individual to see if they have an allergic reaction to cashew nut. It is not the SCT code for cashew nut, the edible substance.

Howard Edidin (Apr 12 2017 at 22:34):

I am not sure how you would use a Graph with Machine learning
https://azure.microsoft.com/en-us/services/machine-learning/

Michael Lawley (Apr 12 2017 at 23:33):

@Fahim Shariar Since graphs are triples (each edge connects two nodes), and RDF is triple-based, have you considered just visualising the RDF representation?
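To make the triple view concrete, here is a small Python sketch that flattens FHIR JSON into (subject, predicate, object) triples, one per edge. The predicate names (Resource.path) and the "_:bN" blank-node ids are simplifications of our own; they are not the official FHIR RDF (Turtle) vocabulary, which a real pipeline should use instead.

```python
from itertools import count

Triple = tuple[str, str, object]


def to_triples(resource: dict) -> list[Triple]:
    """Flatten FHIR JSON into (subject, predicate, object) triples."""
    blank_ids = count()
    triples: list[Triple] = []

    def walk(subject: str, obj: dict, prefix: str) -> None:
        for key, value in obj.items():
            predicate = f"{prefix}.{key}"
            for item in value if isinstance(value, list) else [value]:
                if isinstance(item, dict):
                    # Nested element: a new blank node linked from its parent.
                    node = f"_:b{next(blank_ids)}"
                    triples.append((subject, predicate, node))
                    walk(node, item, predicate)
                else:
                    # Primitive value: a literal object.
                    triples.append((subject, predicate, item))

    root = f"{resource['resourceType']}/{resource['id']}"
    body = {k: v for k, v in resource.items() if k not in ("resourceType", "id")}
    walk(root, body, resource["resourceType"])
    return triples


triples = to_triples({"resourceType": "Observation", "id": "o1", "status": "final", "code": {"text": "Heart rate"}})
```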

Geoff Low (Apr 13 2017 at 07:40):

yeah, that was going to be my suggestion. Have you spoken with @David Booth ?

Muhammad Abubakar Ikram (Apr 13 2017 at 09:44):

What can we achieve by making these types of graphs on FHIR data?

Fahim Shariar (Apr 13 2017 at 10:04):

@Howard Edidin Well, graphs are actively used by Google and the Insight Data Engineering team for machine learning. You can read more about it here:
https://blog.insightdatascience.com/graph-based-machine-learning-6e2bd8926a0
https://research.googleblog.com/2016/10/graph-powered-machine-learning-at-google.html
I hope you will find it interesting. image02.png

Fahim Shariar (Apr 13 2017 at 10:27):

@Russell Leftwich Thank you for pointing this out; you just proved my point there. I drew the graph based on my understanding of the AllergyIntolerance fields from the detailed descriptions I found in FHIR, and the connections between them are not perfect and may be unusual. That's why I need help to create meaningful relations between the nodes/data that could be formed from a single FHIR resource.
Also, could you redraw the graph with the correct relations, adding any other nodes from the AllergyIntolerance fields that you think are missing?

Howard Edidin (Apr 13 2017 at 16:30):

It appears to me that a graph is a visual representation to be evaluated by a person. How would you use the results from a Graph in an application or service that has no UI?

Abbie Watson (Apr 14 2017 at 00:32):

Do you have an RDF sample dataset yet? Can Neo4j serve up a JSON object? Could that JSON object be mapped/transformed into something like the miserables.json dataset from the D3 directed graph example?

https://bl.ocks.org/mbostock/4062045

Fahim Shariar (Apr 14 2017 at 08:51):

@Abigail Watson Well, I have an RDF sample dataset and am trying to visualize the RDF files, but I could not find a visualizer for FHIR RDF resources.
As for your questions: Neo4j can serve up JSON; it has a powerful graph query language (Cypher) and a REST API. But it is not suitable for storing JSON documents or NoSQL data.
You have to do polyglot persistence: store the FHIR resources in a NoSQL database (MongoDB, Couchbase, etc.), decompose the JSON, and store the probable vertices and edges in Neo4j for analytics and complex queries. That's what we are trying to do...
And you can export Neo4j data to a miserables.json-style file and then use it in a D3 graph, but Neo4j has a built-in graph visualizer that shows your query results as a graph, just like D3, even better I think.

Abbie Watson (Apr 14 2017 at 15:56):

Sounds like all the pieces are in place (we're looking at a similar Neo4j/Mongo/D3 solution).

To your question about vertices/edges, have you taken a look at Schematron? I think there's support for RDF (although loops and cycles probably aren't supported). I'm not sure it's exactly what you're looking for, but it's vaguely in the same ballpark, recognized by w3.org and other standards bodies, and might generate some leads. Seems like you're asking for a Graph Schematron.

https://www.w3.org/2001/sw/Europe/events/foaf-galway/papers/pp/validating_rdf/

Howard Edidin (Apr 14 2017 at 20:24):

Have you looked at Azure DocumentDB?
https://azure.microsoft.com/en-us/services/documentdb/

Fahim Shariar (Apr 15 2017 at 18:38):

Not in detail, but I have seen their specification. I am using Couchbase as a document store. Are there any analytical features you want me to check there?

Fahim Shariar (Apr 17 2017 at 21:05):

I have searched and found many papers, but the resources they use to describe the analytics scenarios are not FHIR-related; they use their own data and formats.
That's why I opened this topic: to have a discussion about the use cases along with the graph-based approach to FHIR analytics. I am sure there are many analytics prospects there.

Fahim Shariar (Apr 17 2017 at 21:08):

Also, it seems nobody has implemented the approach I am proposing, analysing FHIR resources with a graph database; even if they did, it's proprietary. I am working on a draft, and soon I will try to show a demo with use cases and queries.
All of this will be included in the draft.

David Booth (Apr 18 2017 at 20:38):

Hi Fahim, and welcome! I am glad to hear of your work. I will be very interested to see your demo. Please let me know when it is ready to show. I would also suggest that you look at the work done by @Harold Solbrig from Mayo Clinic. He made a demo of it here: https://github.com/BD2KOnFHIR/BLENDINGFHIRandRDF Please let us know if you try it, and please provide any feedback or suggestions. Thanks!

Fahim Shariar (Apr 19 2017 at 09:37):

@David Booth I will definitely check out the demo. It seems interesting. I am also trying to finish a draft ASAP, before the demo.

Fahim Shariar (Apr 27 2017 at 11:00):

I am taking a somewhat different approach. Right now I am working on a polyglot persistence model. The FHIR resource is saved in a NoSQL database that supports the FHIR search terms, and during the insert it is also automatically converted into a connected graph, or becomes part of the existing graph network if a similar code/value set or entity is found.
My work is still in progress. The challenge now is to build a graph model for healthcare data, especially for each entity in a FHIR resource, and to decide whether each field in a FHIR resource should be a node or a property in a property graph.
I will post my research soon...

Fahim Shariar (May 01 2017 at 21:11):

@Chris Grenz I agree with your point that "analytics" comes with significantly more complex queries. I am trying to do complex queries with a graph, but the resources and data structure I chose to use are in FHIR; that's why I am here.
When I first started working on this project, my supervisor asked for an example of the complex queries I was talking about. I told him something like the following:
"Find all cancer treatment drugs available since 2012 which have proved most effective in treating stage two ovarian cancer in women under the age of 30"
I am trying to solve something like this with a property graph database. Would Apache Drill's SQL be able to do such a thing? I am curious to know. I might look into other options if I fail to do this with a graph. @natus and @Chris Grenz?

Grahame Grieve (May 01 2017 at 21:18):

you will need terminology service support for a query like that. And very well coded data.

natus (May 01 2017 at 22:21):

Find all cancer treatment drugs available since 2012 which have proved most effective in treating stage two ovarian cancer in women under the age of 30

This definitely looks doable with SQL tools. Look at http://build.fhir.org/terminology-service.html#closure-use . The pseudo-SQL there looks very much like such a use case. Your query is no more than a set of logical predicates implementable in SQL, with subqueries.

Grahame Grieve (May 02 2017 at 00:58):

Once you've built the closure table, yes.

Josh Mandel (May 02 2017 at 03:11):

Well, even that's readily expressed using WITH RECURSIVE if you don't keep a predefined closure table.

natus (May 02 2017 at 03:25):

Can you both elaborate a bit? I am not sure a closure table is needed at all here. Manually listing the codes within the SQL would just work. The link was supposed to highlight the power of SQL for analytics, not the closure table.

AFAIK, the purpose of the closure table is only to help in exploiting code mappings.

Josh Mandel (May 02 2017 at 03:45):

Yes, I think the question is whether you're looking for specific codes vs "codes or any of their descendants in a hierarchical terminology system". If it's just specific codes, it's even simpler.
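A small, self-contained illustration of that difference, using Python's built-in sqlite3 module (SQLite supports WITH RECURSIVE). The table names, the is-a pairs and the codes A, B, C, D and X are all hypothetical. Matching specific codes needs only a plain IN list; matching a code "or any of its descendants" walks the hierarchy with a recursive CTE at query time instead of reading a precomputed closure table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept_parent (child TEXT, parent TEXT);  -- is-a hierarchy
    CREATE TABLE condition (patient TEXT, code TEXT);        -- coded clinical facts
    INSERT INTO concept_parent VALUES ('B', 'A'), ('C', 'B'), ('D', 'X');
    INSERT INTO condition VALUES ('p1', 'A'), ('p2', 'C'), ('p3', 'D');
""")

# Specific codes only: a plain IN list is enough.
specific = conn.execute(
    "SELECT patient FROM condition WHERE code IN ('A', 'B') ORDER BY patient"
).fetchall()

# Code 'A' or any of its descendants: compute the descendants recursively.
subsumed = conn.execute("""
    WITH RECURSIVE descendants(code) AS (
        SELECT ?
        UNION
        SELECT cp.child FROM concept_parent AS cp
        JOIN descendants AS d ON cp.parent = d.code
    )
    SELECT patient FROM condition
    WHERE code IN (SELECT code FROM descendants)
    ORDER BY patient
""", ("A",)).fetchall()
```

Here p2 is only found by the second query, because its code C is a grandchild of A.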

Abbie Watson (May 02 2017 at 03:53):

SQL tools are nice, but they can't handle a query such as 'Find the shortest route between Locations that are members of HealthcareService and have Resources belonging to this CarePlan'. Neo4j supports Dijkstra's algorithm out of the box, and that goes a long way. There are some things that a graph database can simply do better than SQL.
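For reference, the core of such a route query is ordinary Dijkstra's algorithm. This Python sketch uses heapq over a hypothetical weighted graph of Location ids; the weights (say, travel minutes) are made up, and it says nothing about how Neo4j itself implements shortest paths.

```python
import heapq


def dijkstra(graph: dict[str, dict[str, float]], start: str, goal: str) -> tuple[float, list[str]]:
    """Return (total_cost, path) for the cheapest route from start to goal."""
    queue: list[tuple[float, str, list[str]]] = [(0.0, start, [start])]
    visited: set[str] = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue  # a cheaper route to this node was already expanded
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []  # goal unreachable


# Hypothetical Location graph: edge weights are made-up travel costs.
locations = {
    "Location/clinic": {"Location/lab": 4.0, "Location/pharmacy": 1.0},
    "Location/pharmacy": {"Location/lab": 2.0},
    "Location/lab": {},
}
cost, route = dijkstra(locations, "Location/clinic", "Location/lab")
```

The direct clinic-to-lab edge costs 4.0, so the returned route goes through the pharmacy for a total of 3.0.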

natus (May 02 2017 at 08:14):

There are some things that a graph database can simply do better than SQL.

Very true, and a symmetric assertion!

There are some things that SQL can simply do better than a graph database.

I would say that @Fahim Shariar's use case is SQL-friendly. Anyway, I am working on an Apache Drill SQL implementation, so that he can benefit from my work if a graph database turns out to be too difficult.

Fahim Shariar (Aug 22 2017 at 08:55):

@Martin Maiers Thank you for your comments; it is inspiring to hear that you find it interesting. I have recently open-sourced a library called Jypher ( https://github.com/kite-social/jypher/tree/fhir ). It's still at an early stage. I am trying to make this library universal, but it's inspired by the FHIR resource structure. Check out the fhir branch on GitHub at the link above; you will find a sample Patient resource and the graph generated from it in Neo4j.

Also, I am a professional Golang developer, and the library is part of a system developed in Golang. I like Python, but I prefer Go when I am building libraries, systems and the like...

Craig McClendon (Jan 04 2018 at 21:22):

I'm new to graph DBs, but am also exploring representing FHIR as a connected property graph.

My first thought would be to create a vertex for each Resource, each complex datatype, each BackboneElement, and in some cases for even the simple data types.
The attribute names then become either edge labels or vertex properties.
For example the CodeableConcept vertex has a 'text' property, and an outbound edge labeled 'coding'. The Coding vertex contains properties for 'code', 'display', 'system', etc.

Any thoughts on this approach?

FYI: @Fahim Shariar

Here is a partial model of Observation illustrating this:
obs-graph-1.png
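A rough Python sketch of this mapping, assuming plain FHIR JSON as input: every JSON object (resource, complex datatype or BackboneElement) becomes a vertex, primitives become vertex properties (repeating primitives stay list-valued), and each nested object is reached through an edge labelled with the attribute name. Because plain JSON does not carry datatype names, the sketch labels child vertices with the element name; labelling them CodeableConcept, Coding and so on would need a lookup in the FHIR StructureDefinitions.

```python
from itertools import count
from typing import Any


def to_property_graph(resource: dict[str, Any]):
    """Convert FHIR JSON into (vertices, edges) for a labelled property graph.

    vertices: id -> {"label": str, "props": {name: primitive or list of primitives}}
    edges:    (from_id, edge_label, to_id)
    """
    ids = count()
    vertices: dict[int, dict[str, Any]] = {}
    edges: list[tuple[int, str, int]] = []

    def add_vertex(obj: dict[str, Any], label: str) -> int:
        vid = next(ids)
        vertices[vid] = {"label": label, "props": {}}
        for key, value in obj.items():
            items = value if isinstance(value, list) else [value]
            if all(not isinstance(item, dict) for item in items):
                # Primitive (or repeating primitive) -> vertex property.
                vertices[vid]["props"][key] = value
            else:
                for item in items:
                    # Complex datatype / BackboneElement -> child vertex + labelled edge.
                    child = add_vertex(item, key)
                    edges.append((vid, key, child))
        return vid

    add_vertex(resource, resource["resourceType"])
    return vertices, edges


observation = {
    "resourceType": "Observation",
    "id": "o1",
    "status": "final",
    "code": {"text": "Heart rate", "coding": [{"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"}]},
}
vertices, edges = to_property_graph(observation)
```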

Grahame Grieve (Jan 04 2018 at 21:29):

we often have people ask for this, and in the very early days, I used to maintain it manually. But it rapidly came to have too much information to maintain, and then to even understand. ClinFHIR has a single-resource-centric viewer

Craig McClendon (Jan 04 2018 at 21:33):

Thanks Grahame. For clarification, I'm not looking to create a graphical representation of the FHIR model. I am looking at how to model/persist FHIR data in graph database for search and analytics uses.

Grahame Grieve (Jan 04 2018 at 21:33):

oh. sorry

Grahame Grieve (Jan 04 2018 at 21:34):

what is your leaf structure? primitive types? json?

Craig McClendon (Jan 04 2018 at 21:45):

I think you're asking where the actual data is stored. There are just vertices and edges in a connected property graph. Both are "labeled", and both can contain properties. The picture I posted shows the labels for everything, but not the properties defined for the vertices. The Coding vertex for example has properties for code, display, system, etc. which would contain the actual data. Each property has a datatype. Hope that makes sense.

Grahame Grieve (Jan 04 2018 at 21:50):

primitives then - so you need vertices for everything that can repeat, or that has sub-properties. And you need to figure out what to do about extensions on primitives
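FHIR JSON carries an extension on a primitive in a sibling key prefixed with an underscore (for example _birthDate next to birthDate). Here is a minimal Python sketch of the "store it as a blob" option: primitives become vertex properties, and each underscore sibling is kept as a JSON string in a property named <element>_ext. That property naming and the extension URL in the example are hypothetical. Repeating primitives, whose _element is an array aligned with the values, are stored the same way, unparsed.

```python
import json
from typing import Any


def primitive_props(obj: dict[str, Any]) -> dict[str, Any]:
    """Collect the primitive values of one FHIR JSON object as vertex properties."""
    props: dict[str, Any] = {}
    for key, value in obj.items():
        if key.startswith("_"):
            # Extension (and/or id) attached to the primitive named key[1:]: keep as a blob.
            props[f"{key[1:]}_ext"] = json.dumps(value, sort_keys=True)
        elif not isinstance(value, dict) and not (
            isinstance(value, list) and any(isinstance(v, dict) for v in value)
        ):
            # Plain primitive or list of primitives; complex elements are handled elsewhere.
            props[key] = value
    return props


patient = {
    "resourceType": "Patient",
    "id": "p1",
    "birthDate": "1970-03-30",
    "_birthDate": {"extension": [{"url": "http://example.org/fhir/StructureDefinition/precision", "valueString": "year"}]},
    "name": [{"family": "Doe"}],
}
props = primitive_props(patient)
```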

Craig McClendon (Jan 04 2018 at 21:54):

Essentially yes. Some graph DBs allow for multiple cardinality properties. So primitives that repeat can be defined still as a property of a vertex. But any values that need to be grouped..

Craig McClendon (Jan 04 2018 at 22:00):

..generically handling extensions could be tricky beyond just storing the blob of the extension text. Will have to think on that one a bit

Christiaan Knaap (Jan 08 2018 at 15:01):

You could also use the search parameters as a starting point. Use the reference search parameters as vertices, and the rest as leaves, plus one leaf carrying the original resource as a string / blob. The values of the search parameters are easier to express as a set of name-value pairs (instead of the hierarchical data in a resource). Plus they're the only things defined to search on in FHIR.
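A hedged Python sketch of one reading of this suggestion. The SEARCH_PARAMS table is a hand-written, hypothetical subset using simple dotted JSON paths; real search parameters are defined by SearchParameter resources with FHIRPath expressions. Reference parameters become links to the referenced resource's vertex, the other parameters become flat name-value leaves, and the original resource is kept as a JSON string.

```python
import json
from typing import Any

# Hypothetical subset: search parameter name -> (dotted JSON path, parameter type).
SEARCH_PARAMS = {
    "Observation": {
        "subject": ("subject.reference", "reference"),
        "status": ("status", "token"),
        "code": ("code.coding.code", "token"),
    }
}


def extract(value: Any, path: list[str]) -> list[Any]:
    """Follow a dotted path through dicts and lists, collecting every leaf value."""
    if not path:
        return [value]
    if isinstance(value, list):
        return [leaf for item in value for leaf in extract(item, path)]
    if isinstance(value, dict) and path[0] in value:
        return extract(value[path[0]], path[1:])
    return []


def index_resource(resource: dict[str, Any]):
    rtype = resource["resourceType"]
    node_id = f"{rtype}/{resource['id']}"
    leaves: dict[str, Any] = {"_raw": json.dumps(resource)}  # original resource as a blob
    links: list[tuple[str, str, str]] = []
    for name, (path, ptype) in SEARCH_PARAMS.get(rtype, {}).items():
        values = extract(resource, path.split("."))
        if ptype == "reference":
            links.extend((node_id, name, target) for target in values)
        else:
            leaves[name] = values  # flat name-value pairs
    return node_id, leaves, links


obs = {
    "resourceType": "Observation",
    "id": "o1",
    "status": "final",
    "subject": {"reference": "Patient/p1"},
    "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4"}]},
}
node_id, leaves, links = index_resource(obs)
```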

Last updated: Apr 12 2022 at 19:14 UTC
