Stream: ibm
Topic: Downloading bulk files
Murali M (Aug 02 2021 at 23:23):
I'm exporting bulk data with the storage provider configured as "file". The files are exported successfully. However, the file paths included in the response are relative paths. Is there a way to expose those files as fully qualified URLs that client applications can download? I see there is also an S3 option that exposes the full URL of the corresponding S3 objects (optionally as presigned URLs). Thank you.
Lee Surprenant (Aug 03 2021 at 03:44):
However, the file paths included in the response are relative paths. Is there a way to expose those files as fully qualified URLs that client applications can download?
Not at the moment. Our assumption with the file-based export is that the export location is known to the caller and they have some other way to access the files. I think it would be possible to statically host the target directory using Liberty or even just an Apache HTTP Server (or similar). The tricky part is getting the access control and expiry right.
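To make the "access control and expiry" concern concrete, here is a minimal sketch of one way a static host could hand out expiring links for exported files. It is not something the server does today. `BASE_URL`, `SECRET`, `sign_export_path`, and `is_valid` are all hypothetical names, and the signing scheme (HMAC-SHA256 over the path and an expiry timestamp) is an assumption chosen for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote, urlencode

# Hypothetical values: neither comes from the server's configuration.
BASE_URL = "https://files.example.com/bulkdata"
SECRET = b"replace-with-a-random-secret"


def _signature(clean_path: str, expires: int) -> str:
    # Bind the signature to both the path and the expiry time.
    message = f"{clean_path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_export_path(relative_path: str, ttl_seconds: int = 3600,
                     now: Optional[float] = None) -> str:
    """Turn a relative export path into an absolute URL that expires."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    clean_path = relative_path.lstrip("/")
    query = urlencode({"expires": expires, "sig": _signature(clean_path, expires)})
    return f"{BASE_URL}/{quote(clean_path)}?{query}"


def is_valid(relative_path: str, expires: int, signature: str,
             now: Optional[float] = None) -> bool:
    """Check expiry and signature; the host would call this before serving."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(relative_path.lstrip("/"), expires)
    # Constant-time comparison avoids leaking the signature via timing.
    return hmac.compare_digest(expected, signature)
```

The host in front of the export directory would need to call something like `is_valid` on every request, which a plain static file server does not do on its own. That is the part that makes this harder than it first looks.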
I see there is also an S3 option that exposes the full URL of the corresponding S3 objects (optionally as presigned URLs).
Correct. That's the way I would go to support the official bulk data spec. It also works with MinIO and other S3-compatible providers.
Murali M (Aug 03 2021 at 22:58):
Thank you, @Lee Surprenant. We have some limitations with respect to using HMAC with S3, so we had to rely on the file system instead. We will review further and follow up with a PR once finalized.
Lee Surprenant (Aug 04 2021 at 13:10):
limitations with respect to using HMAC with S3
I'd love to hear more. We used to generate our own long random (high-entropy) URLs, make the objects public, and set the ACL to expire after a few hours. But we felt the presigned URL approach was much better.
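For reference, the "long random URL" idea can be sketched in a few lines. This is only an illustration of generating an unguessable object key, not the code we actually used. `random_object_key` is a hypothetical helper, and the 32-byte token length is an assumption.

```python
import secrets


def random_object_key(filename: str, nbytes: int = 32) -> str:
    """Prefix the file name with a URL-safe random token.

    32 random bytes give 256 bits of entropy, so the key cannot
    realistically be guessed while the object is public.
    """
    token = secrets.token_urlsafe(nbytes)
    return f"{token}/{filename}"
```

Anyone who obtains such a URL can read the object until it is removed or made private again, which is one reason the presigned approach felt safer.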
Murali M (Aug 04 2021 at 22:21):
Presigned URLs are definitely the better choice, and we use them in other use cases. The HMAC limitation comes from a security policy that blocks us from creating long-lived access keys. Another option is to create presigned URLs with a role that the container runtime can assume instead. We are also evaluating this option.
Last updated: Apr 12 2022 at 19:14 UTC