FHIR Chat · Gzip compression on Cerner staging server · bulk data

Stream: bulk data

Topic: Gzip compression on Cerner staging server


Mikhail Lapshin (Sep 11 2018 at 16:19):

@Jenni Syed , are you considering enabling gzip HTTP compression on the Cerner staging Bulk API server? It would drastically decrease the amount of data being downloaded, from hundreds of megabytes to just tens.
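[Editor's note] The "hundreds of megabytes to tens" claim is plausible because FHIR bulk exports are NDJSON, where every resource repeats the same keys. A small stdlib-only sketch (synthetic Patient resources, not the actual Cerner dataset) illustrates the kind of ratio gzip achieves on this shape of data:

```python
import gzip
import json

# Synthetic NDJSON resembling a FHIR bulk Patient export: every line
# repeats the same keys, which is what makes gzip so effective here.
patients = [
    {"resourceType": "Patient", "id": str(i),
     "name": [{"family": "Example", "given": ["Pat"]}],
     "birthDate": "1970-01-01"}
    for i in range(10_000)
]
ndjson = "\n".join(json.dumps(p) for p in patients).encode("utf-8")
compressed = gzip.compress(ndjson)

print(f"raw: {len(ndjson)} bytes, gzipped: {len(compressed)} bytes")
print(f"ratio: {len(ndjson) / len(compressed):.1f}x")
```

Real exports are less uniform than this synthetic data, so actual ratios will be lower, but an order-of-magnitude reduction is typical for NDJSON.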

Dennis Patterson (Sep 11 2018 at 19:28):

The current testing implementation is returning static links to direct downloads from S3. We could put those links behind CloudFront and enable gzip compression. I think it's also possible to just upload gzipped data but that'd limit access to uncompressed data for anybody who'd want it

nicola (RIO/SS) (Sep 11 2018 at 19:30):

I think most HTTP clients (including browsers) understand gzip :)

Jenni Syed (Sep 11 2018 at 19:34):

Part of this limitation is just because this is our beta implementation - not production-ready. Gzip and several other HTTP-level "large file handling" considerations would be beneficial to add to the spec

Jenni Syed (Sep 11 2018 at 19:34):

eg: streaming :)

Jenni Syed (Sep 11 2018 at 19:34):

this sounds like a good topic for the connectathon :)

Jenni Syed (Sep 11 2018 at 19:36):

I know we've talked about chunking as well in the past, which is something we support on some of our other APIs for large file transfers (each chunk would then be gzipped as well)
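[Editor's note] Cerner's chunking approach isn't specified in this thread, but the idea of per-chunk gzip can be sketched with the stdlib: split the NDJSON on line boundaries into roughly fixed-size chunks and gzip each chunk independently, so a client can fetch and decompress chunks in parallel. Function names and the chunk size are made up for illustration:

```python
import gzip

def chunk_and_gzip(ndjson: bytes, max_chunk_bytes: int = 1_000_000) -> list[bytes]:
    """Split NDJSON on line boundaries and gzip each chunk separately."""
    chunks, current, size = [], [], 0
    for line in ndjson.splitlines(keepends=True):
        if size + len(line) > max_chunk_bytes and current:
            chunks.append(gzip.compress(b"".join(current)))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(gzip.compress(b"".join(current)))
    return chunks

def reassemble(chunks: list[bytes]) -> bytes:
    """Reassembly is just decompress-and-concatenate, in order."""
    return b"".join(gzip.decompress(c) for c in chunks)
```

Splitting on line boundaries matters for NDJSON: it keeps each chunk independently parseable, at the cost of slightly uneven chunk sizes.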

Jenni Syed (Sep 11 2018 at 19:44):

Also, I know you mentioned megabytes because that's likely the amount of data we have here. What we've seen in other settings/more realistic production settings gets into the gigabyte range

Jenni Syed (Sep 11 2018 at 19:45):

(with other bulk APIs we have that do similar types of operations)

Mikhail Lapshin (Sep 12 2018 at 10:35):

> The current testing implementation is returning static links to direct downloads from S3. We could put those links behind CloudFront and enable gzip compression. I think it's also possible to just upload gzipped data but that'd limit access to uncompressed data for anybody who'd want it

S3 can properly serve pre-gzipped files; you just need to set the 'Content-Type' and 'Content-Encoding' properties on the S3 objects, as described in this post: https://medium.com/@graysonhicks/how-to-serve-gzipped-js-and-css-from-aws-s3-211b1e86d1cd
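[Editor's note] Concretely, Mikhail's suggestion means gzipping each export file once and storing it with the right object metadata, so S3 serves it as transparently compressed NDJSON. A stdlib-only sketch (file name, bucket, and key are made up; the actual boto3 upload call is shown as a comment rather than executed):

```python
import gzip
from pathlib import Path

# Demo source file standing in for one bulk-export output file.
src = Path("Patient.ndjson")
src.write_text('{"resourceType": "Patient", "id": "1"}\n')

# Gzip the file once at upload time.
gz_path = src.with_suffix(".ndjson.gz")
gz_path.write_bytes(gzip.compress(src.read_bytes()))

# Metadata S3 needs so HTTP clients know to decompress the body:
metadata = {
    "ContentType": "application/fhir+ndjson",
    "ContentEncoding": "gzip",
}
# With boto3 the upload would look roughly like:
#   boto3.client("s3").put_object(Bucket="my-bulk-export", Key="Patient.ndjson",
#                                 Body=gz_path.read_bytes(), **metadata)
```

The trade-off Dennis raised still applies: with a single pre-gzipped copy, every client receives gzip whether they asked for it or not.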

It took me almost a day to download all 5 GB of this dataset, which is why I'm so concerned :)

John Moehrke (Sep 12 2018 at 14:51):

Is there any use of HTTP/2, which includes multiplexing and automatic compression?

Jenni Syed (Sep 12 2018 at 15:32):

We haven't talked about needing that yet in the spec (much like gzip, streaming, etc.), and it's a good discussion to have. I will say that HTTP/2 support is still a bit spotty. I think under 30% support it as of the last stat I heard? So it may not be the 100% win here.

John Moehrke (Sep 12 2018 at 15:44):

Understood, just wanting it on the stack of things to consider... especially when the group is considering gzip.

Dennis Patterson (Sep 13 2018 at 14:02):

Set up a CloudFront distribution in front of our S3 bucket to auto-compress. Turns out they'll only do this for files smaller than 10 MB, so not gonna help :). We'll have to look at uploading pre-compressed contents

Josh Mandel (Sep 13 2018 at 15:05):

Interesting -- and that would be an API change (i.e., it changes what's returned). I was assuming Accept-Encoding: gzip would get us where we needed to be, but evidently not with S3 http hosting?

Dennis Patterson (Sep 13 2018 at 15:10):

Noting that this is all with pre-generated, mock data... Per @Mikhail Lapshin 's comments above, I think it'd mean uploading the gzipped data, telling S3 to return Content-Encoding: gzip, and then when we return the list of files, they'd be retrievable when requesting Accept-Encoding: gzip. I think AWS' approach is more elaborate if you want to return various compressions (i.e. store them all pre-compressed in S3 and use Lambda@Edge to vary what gets returned according to Accept-Encoding...blah)
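[Editor's note] The client side of this exchange hinges on the Content-Encoding response header: a client that ignores it will hand gzipped bytes straight to its NDJSON parser. A minimal sketch of the decoding decision (the function name is made up; `urllib.request`, for instance, does not decompress gzip automatically):

```python
import gzip

def decode_body(headers: dict[str, str], body: bytes) -> bytes:
    """Decompress an HTTP response body according to Content-Encoding."""
    encoding = headers.get("Content-Encoding", "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {encoding}")
```

This is also why Josh's point below holds: if the server stores only gzipped objects, a client that never sent Accept-Encoding: gzip still receives gzip, and correctness depends entirely on it honoring the response header.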

Josh Mandel (Sep 13 2018 at 15:22):

In this scenario I think things would fail in the absence of Accept-Encoding: gzip, because a client would get gzipped content regardless of what they requested.

Dennis Patterson (Sep 13 2018 at 15:26):

Right, unless we did the work to support both, that's correct. This would be a connectathon-only limitation for our server, but from the very presence of this thread, I'm guessing that's what most clients want :)

Josh Mandel (Sep 13 2018 at 15:27):

It's definitely what most clients should want :-)


Last updated: Apr 12 2022 at 19:14 UTC