Stream: bulk data
Topic: Feedback on $import input specification
Paul Church (Aug 27 2019 at 18:18):
Some feedback on the current import draft, specifically around the input fields:
- inputFormat: The main distinction GCP FHIR import makes on its input format is whether the input is resources, or resources-inside-bundles. This is important because the client might want to persist bundles rather than unwrap them, and in isolation it's not always obvious which was intended. Not sure if this belongs in inputFormat or some new parameter like mode. (We also support JSON that isn't ndjson, with one resource/bundle per file - but that's captured under "may support additional formats"; we would make it application/fhir+json.)
- inputSource: Does this need to be required? I've seen a lot of cases like Synthea or ETL'd data where there isn't a FHIR server source. I think this field should be optional.
- input.type: Limiting an input object to a single resource type is problematic - it aligns with $export but there are many other ways data could be organized. Is this a limitation that server implementers actively want?
- input.url: Our experience has been that wildcards are very useful. Input data ranges from one big file up to a million patient bundles in individual files. Listing files one by one has never been useful; there's always a wildcard spec that would get the job done. We support * for matching and ** for subdirectory recursion, but that's specific to GCS bucket APIs and may not be easily interoperable. However, I think some common level of wildcard support would be useful.
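
To make the wildcard point concrete, here is a minimal sketch of the two levels described above: a single * that matches within one path segment, and ** that also crosses subdirectory boundaries. This is only an illustration of those semantics, not the actual GCS or GCP FHIR import implementation; the helper names (glob_to_regex, match_objects) and the sample object names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple wildcard pattern into a compiled regex.

    Assumed semantics (illustrative only):
      *   matches any run of characters except '/'
      **  matches any run of characters, including '/'
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # recurse across subdirectories
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # stay within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def match_objects(pattern: str, names: list[str]) -> list[str]:
    """Return the object names that fully match the wildcard pattern."""
    rx = glob_to_regex(pattern)
    return [n for n in names if rx.fullmatch(n)]


# Hypothetical object names in a bucket.
names = [
    "bucket/patients/p1.ndjson",
    "bucket/patients/p2.ndjson",
    "bucket/patients/2019/p3.ndjson",
    "bucket/readme.txt",
]

flat = match_objects("bucket/patients/*.ndjson", names)
recursive = match_objects("bucket/**.ndjson", names)
```

Under these assumed semantics, the first pattern picks up only the two files directly under patients/, while the second also picks up the file in the 2019/ subdirectory. An interoperable spec would need to pin down exactly this kind of behavior.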
Last updated: Apr 12 2022 at 19:14 UTC