FHIR Chat · Export to Parquet · ibm

Stream: ibm

Topic: Export to Parquet


Lee Surprenant (Jul 30 2020 at 20:30):

I finally took a swag at packaging up the prototype code from @Gidon Gershinsky and @Eliot Salant for use from the FHIR Server: https://github.com/IBM/FHIR/pull/1374/files

Lee Surprenant (Jul 30 2020 at 20:31):

I just finished reviewing it with the team and we think this will work (though should be considered experimental at present).

Lee Surprenant (Jul 30 2020 at 20:32):

Unfortunately, this new power comes at a pretty heavy cost: by using Apache Spark to do the heavy lifting of the JSON -> Parquet writing, we introduce over 200 MB of new dependencies on top of our already considerable size.
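For context, the Spark side of that conversion is only a few lines; a rough sketch might look like the following (paths, session settings, and the class name are illustrative, not the actual fhir-bulkimportexport-webapp code):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToParquetSketch {
    public static void main(String[] args) {
        // An embedded local Spark session; because the server embeds Spark
        // rather than submitting to a cluster, the Spark jars ship with the
        // webapp, which is where the 200 MB of dependencies comes from.
        SparkSession spark = SparkSession.builder()
                .appName("fhir-export-to-parquet")
                .master("local[*]")
                .getOrCreate();

        // Read newline-delimited JSON (one resource per line), letting Spark
        // infer the schema, then write the same data back out as Parquet.
        Dataset<Row> resources = spark.read().json("/tmp/export/Patient.ndjson");
        resources.write().mode("overwrite").parquet("/tmp/export/Patient.parquet");

        spark.stop();
    }
}
```

The code is short, but the transitive dependency tree behind those three calls is what blows up the artifact size.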

Lee Surprenant (Jul 30 2020 at 20:34):

It pushes the fhir-bulkimportexport-webapp over the size limit enforced by Bintray, so we wouldn't even be able to publish it there.
So we explored an option of pushing that Spark stuff into the install; that would allow us to continue posting the webapp to Bintray, but we'd still pay the cost in terms of the size (and complexity) of our installer zip and Docker image.

So with these considerations in mind we're thinking to do this:

Short term: mark these new dependencies as "provided" and introduce a config property called "enableParquet" which is false by default. Then document that, to turn it on, you need to add all the Spark dependencies yourself. When it's false, reject requests for bulk export to Parquet.
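A minimal sketch of that short-term guard (the class, enum, and method names here are hypothetical, not the server's actual API; the real check would read "enableParquet" from the server config):

```java
// Hypothetical guard for the proposed "enableParquet" config property.
public class ParquetExportGuard {
    public enum ExportFormat { NDJSON, PARQUET }

    private final boolean enableParquet; // false by default in the proposal

    public ParquetExportGuard(boolean enableParquet) {
        this.enableParquet = enableParquet;
    }

    /**
     * Returns true if a bulk export request in the given format may proceed.
     * Parquet is rejected unless explicitly enabled, since the Spark jars
     * are "provided" and may not be on the classpath at runtime.
     */
    public boolean isAllowed(ExportFormat format) {
        if (format == ExportFormat.PARQUET && !enableParquet) {
            return false;
        }
        return true;
    }
}
```

With this in place, a request for Parquet export on a default-configured server would be rejected up front rather than failing later with a ClassNotFoundException.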

Long term: split fhir-bulkimportexport-webapp into a separate project with its own Docker image. This way, the ibm-fhir-server image can stay its current size (or get a bit smaller), and we can put all the new heft into the new image and call it ibmcom/ibm-fhir-job-server or some such.

Lee Surprenant (Jul 30 2020 at 20:34):

Your feedback welcome on this plan...


Last updated: Apr 12 2022 at 19:14 UTC