FHIR Chat · Montreal Connectathon · bulk data

Stream: bulk data

Topic: Montreal Connectathon


Brian Wright (Mar 26 2019 at 17:37):

I see that there is a single track at the Montreal connectathon covering both bulk data and analytics.

https://confluence.hl7.org/display/FHIR/2019-05+Bulk+Data+and+Analytics+Track

Looking at the track scenarios, there were a couple of items from the San Antonio Storage and Analytics track that we wanted to target for Montreal:
1) Interoperability testing between different "SQL on FHIR" implementations.
2) Running analytics on the extracted data.

Thoughts about including those items in scope for Montreal?

Toby Hu (Mar 26 2019 at 17:52):

To echo Brian's question, in San Antonio we discussed having a standard dataset and a set of query ideas shared by all participants, so that everyone could run analytics and compare results. Do we want to use the bulk data dataset (https://github.com/smart-on-fhir/flat-fhir-files) for the analytics testing, or should we also include e.g. SyntheticMass data (https://syntheticmass.mitre.org/download.html)? Are we planning to have some pre-connectathon calls to discuss these?
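As a rough illustration of the kind of shared query every participant could run over the same dataset and then compare, here is a minimal Python sketch that counts Patient resources by `gender` in bulk-data ndjson (one JSON resource per line). The function name and the choice of query are hypothetical, not something agreed on in this thread.

```python
import json
from collections import Counter
from typing import Iterable


def count_patients_by_gender(ndjson_lines: Iterable[str]) -> dict:
    """Hypothetical reference query: count Patient resources by gender."""
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        resource = json.loads(line)
        if resource.get("resourceType") != "Patient":
            continue  # only Patient resources are counted
        # gender is optional in FHIR; "unknown" is a real code, so use a
        # separate bucket for resources where the element is absent
        counts[resource.get("gender", "(missing)")] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        '{"resourceType": "Patient", "id": "a", "gender": "female"}',
        '{"resourceType": "Patient", "id": "b", "gender": "male"}',
        '{"resourceType": "Patient", "id": "c"}',
    ]
    print(count_patients_by_gender(sample))
```

Because an open file iterates line by line, the same function could be pointed at a file such as a (hypothetically named) `Patient.ndjson`; each SQL on FHIR implementation would then be expected to reproduce the same counts.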

Josh Mandel (Mar 26 2019 at 17:56):

Definitely would like to cover interop testing for SQL on FHIR. The proposal we put together was really just to make sure we had something in place by the track submission deadline. Please feel welcome to add or modify scenarios on the Confluence site and ping here. We'll plan to schedule a prep session in mid-April.

Josh Mandel (Mar 26 2019 at 17:58):

@Toby Hu for sample data, I think it's useful to have a small dataset and a larger set of files, all from the same source. SyntheticMass in FHIR R4 would be a fine way to go; would you be willing to prepare files in the ndjson bulk data format and share them in a public cloud storage bucket? For example, one directory of small data, O(1-10 MB), and at least one directory of larger data?
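Preparing files "in the ndjson bulk data format" mostly means flattening per-patient Bundles into one newline-delimited JSON file per resource type. A minimal Python sketch follows, assuming the input is standard FHIR Bundle JSON (resources under `entry[].resource`); the function names and the `<ResourceType>.ndjson` file naming are hypothetical choices, not taken from any existing tool.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def bundles_to_ndjson(bundles: Iterable[dict]) -> Dict[str, str]:
    """Group resources from parsed Bundles into ndjson text keyed by resourceType."""
    by_type: Dict[str, List[str]] = defaultdict(list)
    for bundle in bundles:
        for entry in bundle.get("entry", []):
            resource = entry.get("resource")
            if resource is None:
                continue  # entries without a resource carry nothing to export
            # one compact JSON object per line, grouped by resource type
            by_type[resource["resourceType"]].append(
                json.dumps(resource, separators=(",", ":"))
            )
    return {rtype: "\n".join(lines) + "\n" for rtype, lines in by_type.items()}


def write_ndjson_dir(bundle_paths: Iterable[Path], out_dir: Path) -> List[str]:
    """Read Bundle files and write one hypothetical <ResourceType>.ndjson per type."""
    bundles = (json.loads(Path(p).read_text(encoding="utf-8")) for p in bundle_paths)
    files = bundles_to_ndjson(bundles)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for rtype, text in files.items():
        (out / f"{rtype}.ndjson").write_text(text, encoding="utf-8")
    return sorted(files)
```

For example, `write_ndjson_dir(sorted(Path("bundles").glob("*.json")), Path("ndjson/small"))` would fill a "small" directory (both paths are hypothetical). This sketch holds every line in memory, which is fine for the O(1-10 MB) set; for the larger directory, appending to open per-type files while streaming would be the safer design.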

Toby Hu (Mar 26 2019 at 18:27):

@Josh Mandel that sounds good to me, but let me confirm with my team first before committing to this, and get back to you before the prep call.

Toby Hu (Mar 26 2019 at 18:28):

For SyntheticMass, though, AFAIK they only have DSTU2 and STU3 on their website.

Josh Mandel (Mar 26 2019 at 18:31):

Cool! Part of the work might be generating new data; I haven't checked whether the generator supports R4 yet, but maybe @Jason Walonoski can give us a quick hint.

Jason Walonoski (Mar 26 2019 at 20:31):

@Josh Mandel Synthea does generate R4, but not by default. To generate R4 bulk data, change src/main/resources/synthea.properties so that exporter.fhir_r4.export = true and exporter.fhir.bulk_data = true, and then run ./run_synthea -p 100 to generate 100 patients (or however many you want).
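If the properties change needs to be scripted (for example, to regenerate the small and large datasets repeatably), a minimal Python sketch could set the two keys named above. It is a simplified, hypothetical helper: it only handles `key = value` / `key: value` lines, ignores `#`/`!` comments, and appends any key it does not find; the sample input values are assumptions, not Synthea's actual defaults.

```python
import re

# Matches "key = value" or "key: value" lines; comment lines never match.
KEY_LINE = re.compile(r"\s*([^#!=:\s][^=:\s]*)\s*[=:]")


def set_properties(text: str, updates: dict) -> str:
    """Return .properties text with the given keys set (hypothetical helper)."""
    remaining = dict(updates)
    out_lines = []
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            line = f"{key} = {remaining.pop(key)}"  # overwrite the existing value
        out_lines.append(line)
    for key, value in remaining.items():
        out_lines.append(f"{key} = {value}")  # keys absent from the file are appended
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    # Assumed sample content; the real file lives at src/main/resources/synthea.properties
    props = "exporter.fhir_r4.export = false\nexporter.fhir.bulk_data = false\n"
    print(set_properties(props, {
        "exporter.fhir_r4.export": "true",
        "exporter.fhir.bulk_data": "true",
    }), end="")
```

After writing the updated text back to synthea.properties, ./run_synthea -p 100 generates the patients as described above.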


Last updated: Apr 12 2022 at 19:14 UTC