FHIR Chat · Binary resources · bulk data

Stream: bulk data

Topic: Binary resources


view this post on Zulip Michele Mottini (Oct 29 2018 at 19:06):

Should the protocol support binary resources? If yes, how?

view this post on Zulip Michele Mottini (Oct 29 2018 at 19:06):

(see also https://github.com/smart-on-fhir/fhir-bulk-data-docs/issues/84)

view this post on Zulip Josh Mandel (Oct 29 2018 at 19:39):

Thanks -- I just commented in GH; I think this is an important design question and we should tackle it. (I'll also note that I wouldn't hold off on a 1.0 publication for this.)

view this post on Zulip Grahame Grieve (Nov 01 2018 at 21:56):

it has to

view this post on Zulip Josh Mandel (Nov 01 2018 at 22:17):

Can you elaborate @Grahame Grieve ? (Beyond "treat Binary just like all other referenced resources" are you suggesting something more is essential?) Cc @Dan Gottlieb

view this post on Zulip Grahame Grieve (Nov 02 2018 at 01:13):

I don't think they're anything special

view this post on Zulip Josh Mandel (Nov 02 2018 at 01:14):

Okay, cool. I think that was option number one that I listed on the GH issue.

view this post on Zulip Michele Mottini (Nov 02 2018 at 02:26):

mhh.. if they should be handled like the other resources that would be option two, wouldn't it? 'Explicit export a set of Binary files....'

view this post on Zulip Josh Mandel (Nov 02 2018 at 02:27):

D'oh yes, that's what I meant. I forgot the order I listed things in :-)

view this post on Zulip Michele Mottini (Nov 02 2018 at 02:33):

they are not exactly as other resources: they are referenced by Attachment URLs instead of Reference elements

view this post on Zulip Lloyd McKenzie (Nov 02 2018 at 02:42):

Couldn't they be referenced by Reference elements too?

view this post on Zulip Michele Mottini (Nov 02 2018 at 02:43):

Don't know - never saw that

view this post on Zulip Michele Mottini (Nov 02 2018 at 02:44):

Specs say 'This resource is generally used as the target of a Document Reference or an Attachment, '

view this post on Zulip Lloyd McKenzie (Nov 02 2018 at 02:47):

Yes, but it's allowed to be pointed to directly - any place that's allowed to reference Any resource is allowed to reference Binary.

view this post on Zulip Michele Mottini (Nov 02 2018 at 02:48):

OK - but does that happen anywhere?

view this post on Zulip Michele Mottini (Nov 02 2018 at 02:48):

In any case - my point is not really important

view this post on Zulip Michele Mottini (Nov 02 2018 at 02:49):

I'll do some testing of transferring Binary resources using their JSON format

view this post on Zulip Michele Mottini (Nov 02 2018 at 02:50):

(converting to base64, gzip, un-gzip, converting back to binary)

view this post on Zulip Michele Mottini (Nov 02 2018 at 02:50):

to see what's the impact
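
The round trip described above (raw bytes to base64 inside a JSON Binary resource, one resource per NDJSON line, gzip, then back again) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the test: the function names and the (id, content type, bytes) tuple layout are hypothetical, and the base64 payload is placed in `data` as in FHIR R4 (STU3 named that element `content`).

```python
import base64
import gzip
import json


def binaries_to_ndjson_gz(files):
    """Encode (id, content_type, raw_bytes) tuples as gzipped NDJSON.

    Hypothetical helper: each tuple becomes one Binary resource on its own line.
    """
    lines = []
    for res_id, content_type, raw in files:
        resource = {
            "resourceType": "Binary",
            "id": res_id,
            "contentType": content_type,
            # Assumption: R4 element name "data" (STU3 used "content").
            "data": base64.b64encode(raw).decode("ascii"),
        }
        # json.dumps without indent never emits a literal newline,
        # so one resource always stays on one NDJSON line.
        lines.append(json.dumps(resource))
    return gzip.compress(("\n".join(lines) + "\n").encode("utf-8"))


def ndjson_gz_to_binaries(blob):
    """Reverse the encoding: un-gzip, parse each line, decode the base64 payload."""
    out = []
    for line in gzip.decompress(blob).decode("utf-8").splitlines():
        if not line.strip():
            continue
        resource = json.loads(line)
        raw = base64.b64decode(resource["data"])
        out.append((resource["id"], resource["contentType"], raw))
    return out
```

Base64 expands the payload by roughly a third before compression, which is the overhead the test is meant to quantify.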

view this post on Zulip Lloyd McKenzie (Nov 02 2018 at 03:56):

My point was that it's best not to treat them specially in terms of how they can be referenced because they're not defined that way.

view this post on Zulip John Moehrke (Nov 02 2018 at 13:17):

The worry I would have with Binary is that they are not self-describing as to their Privacy or Security needs. Yes, they can have security labels, which is my recommendation. However, from an OAuth scope perspective we don't yet have the ability to define in a scope the security labels authorized... right? Thus Binary should be treated specially. One needs to find the 'other' Resource that describes the security context. This might be by way of the Binary.securityContext, or might be a Resource that points at the Binary, hopefully something like DocumentReference. Once you have that 'other' resource you would have a .subject and possibly other context. You then treat the Binary in the same way you would treat that 'other' resource.
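
The lookup described above (prefer Binary.securityContext, otherwise find a resource that points at the Binary) might look something like the sketch below. It is an illustration only: the function name is hypothetical, resources are assumed to be plain parsed-JSON dicts, and only DocumentReference.content.attachment.url is checked as the "points at the Binary" case.

```python
def security_context_for(binary, candidates):
    """Return a reference string for the resource that governs access to `binary`.

    Hypothetical helper. `binary` is a parsed Binary resource, `candidates`
    is an iterable of parsed resources that might point at it.
    """
    # 1. An explicit Binary.securityContext wins if present.
    ctx = binary.get("securityContext", {}).get("reference")
    if ctx:
        return ctx

    # 2. Otherwise look for a DocumentReference whose attachment URL
    #    targets this Binary (relative or absolute form).
    target = "Binary/" + binary["id"]
    for res in candidates:
        if res.get("resourceType") != "DocumentReference":
            continue
        for content in res.get("content", []):
            url = content.get("attachment", {}).get("url", "")
            if url == target or url.endswith("/" + target):
                return "DocumentReference/" + res["id"]

    # 3. No describing resource found: the caller has to decide what to do.
    return None
```

A server would then apply whatever access decision it makes for the returned resource (its .subject and other context) to the Binary itself.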

view this post on Zulip Josh Mandel (Nov 02 2018 at 13:19):

This all makes sense but none of it is specific to bulk data, right @John Moehrke ?

view this post on Zulip John Moehrke (Nov 02 2018 at 13:20):

correct. if the question is if it is different in bulk vs not... I would agree, not... but I am not then clear what is different in bulk vs not?

view this post on Zulip John Moehrke (Nov 02 2018 at 13:21):

I was addressing the question of current SMART-on-FHIR ability to control various data by way of scopes.

view this post on Zulip Josh Mandel (Nov 02 2018 at 13:31):

What's different in bulk data is that you expect the server to follow relevant references and decide what data to prepare as part of your export.

view this post on Zulip John Moehrke (Nov 02 2018 at 13:33):

so the question is on following References and attachment.url? Following Reference seems to be what has been answered, the fact that the Reference is a Binary is immaterial to if you _include... But most Binary are referenced by way of attachment.url, and not all attachment.url are Binary. Thus it is not obvious if an attachment.url is a Binary that could be treated as a Binary Resource, or is not a Binary and thus unclear what to do...

view this post on Zulip John Moehrke (Nov 02 2018 at 13:35):

Thus attachment.url is sometimes a Reference and sometimes simply a url. Sometimes that matters to the server, most of the time it doesn't matter to the client (which just uses http negotiation).
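
One way a server might tell the two cases apart is to check whether an attachment.url resolves to a Binary on its own FHIR base. The sketch below is a heuristic under stated assumptions, not something the spec defines: the function name and the `server_base` parameter are hypothetical, and anything that is not recognisably `Binary/{id}` on this server is treated as an ordinary URL.

```python
def binary_id_from_attachment_url(url, server_base):
    """Return the Binary id if `url` targets a Binary on this server, else None.

    Hypothetical heuristic: accepts relative references ("Binary/123") and
    absolute URLs under `server_base`, with or without a /_history/ suffix.
    """
    base = server_base.rstrip("/") + "/"
    if url.startswith(base):
        relative = url[len(base):]
    elif "://" not in url and not url.startswith("/"):
        relative = url  # treat as a relative reference
    else:
        return None  # some other host or an absolute path: just a URL

    # Drop any query string, then split into path segments.
    parts = relative.split("?", 1)[0].split("/")
    if len(parts) >= 2 and parts[0] == "Binary" and parts[1]:
        return parts[1]
    return None
```

URLs that return None would be left alone or fetched with plain HTTP, at the server's discretion.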

view this post on Zulip Josh Mandel (Nov 02 2018 at 13:35):

It's about which references to follow (ultimately: server discretion) and whether/how to deliver the results.

view this post on Zulip John Moehrke (Nov 02 2018 at 13:36):

in the way I just outlined, or other?

view this post on Zulip Josh Mandel (Nov 02 2018 at 13:37):

I'm not sure what you meant there; we're not doing anything specifically with _include

view this post on Zulip John Moehrke (Nov 02 2018 at 13:37):

or is it the fact that Binary are often big and bulky, and hard to process, as they are 99% of the time NOT FHIR encoded?

view this post on Zulip Josh Mandel (Nov 02 2018 at 13:38):

Yeah, more like that.

view this post on Zulip Josh Mandel (Nov 02 2018 at 13:38):

But as a first pass, I'm happy enough saying "yup, take the hit when exporting binary for bulk data". For now.

view this post on Zulip John Moehrke (Nov 02 2018 at 13:39):

likely the best first pass. the alternative is the complete opposite to say Binary are NEVER communicated.

view this post on Zulip John Moehrke (Nov 02 2018 at 13:40):

pipe through a de-identification service would likely redact them totally anyway... :-)

view this post on Zulip John Moehrke (Nov 02 2018 at 13:41):

(I used _include as a means of human communication of the concept. Sorry)

view this post on Zulip Josh Mandel (Nov 02 2018 at 13:42):

(Incidentally, I'm looking forward to a world with more useful deID trade-off options.)

view this post on Zulip John Moehrke (Nov 02 2018 at 13:44):

don't count those chickens... each use is unique, very few re-usable algorithms. However I am hopeful that algorithms can be defined in rules, that are injected into a de-identification service.

view this post on Zulip John Moehrke (Nov 02 2018 at 13:45):

the two re-usable algorithms that I know of are piping through /dev/null or /dev/random

view this post on Zulip John Moehrke (Nov 02 2018 at 13:46):

(applause) (laughing) --- Thank you, I'm here all weekend...

view this post on Zulip Michele Mottini (Nov 02 2018 at 16:37):

I got 180 reports and tried to convert to a single NDJSON file of Binary resources, gzip it, un-gzip it, extract back the original files

view this post on Zulip Michele Mottini (Nov 02 2018 at 16:38):

and then did the same just concatenating the raw files instead of converting them to json Binary resources

view this post on Zulip Michele Mottini (Nov 02 2018 at 16:40):

with the raw files, the resulting gzip is 40% smaller and overall processing time is 10% faster than using ndjson

view this post on Zulip Michele Mottini (Nov 02 2018 at 16:42):

I _think_ we are OK sticking with ndjson

view this post on Zulip Michele Mottini (Nov 02 2018 at 16:42):

if we are not overly concerned with download size

view this post on Zulip Michele Mottini (Nov 02 2018 at 16:43):

the reasonably simple alternative I can think of is use .tar files for Binary - with the id as the file name
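
The .tar idea could be sketched as below, with each raw file stored under its resource id. This is only an illustration of the suggestion: the function names and the (id, bytes) tuple layout are hypothetical, and gzip compression of the archive is an assumption carried over from the NDJSON test.

```python
import io
import tarfile


def binaries_to_tar_gz(files):
    """Pack (id, raw_bytes) tuples into a gzipped tar, one member per Binary."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for res_id, raw in files:
            info = tarfile.TarInfo(name=res_id)  # the id is the file name
            info.size = len(raw)
            tar.addfile(info, io.BytesIO(raw))
    return buf.getvalue()


def tar_gz_to_binaries(blob):
    """Unpack the archive back into (id, raw_bytes) tuples."""
    out = []
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out.append((member.name, tar.extractfile(member).read()))
    return out
```

Note that a bare file name carries no content type, so the contentType of each Binary would have to travel some other way (for example in the resource that references it).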

view this post on Zulip Josh Mandel (Nov 02 2018 at 18:54):

I love the quantitative details there -- good to know we're within a factor of 10-50%.

view this post on Zulip Josh Mandel (Nov 02 2018 at 18:54):

Thanks Michele!


Last updated: Apr 12 2022 at 19:14 UTC