Stream: bulk data

Topic: Sample data


Toby Hu (Apr 26 2019 at 20:38):

For participants in the Bulk Data and Analytics track during the Montreal Connectathon, a Google Cloud project has been created to host a small and a large FHIR demo dataset. The data is generated using Synthea (https://github.com/synthetichealth/synthea), an open-source project owned by the MITRE Corporation and released under the Apache License, Version 2.0. To access the dataset, please join the access group https://groups.google.com/forum/#!forum/gcp-fhir-demo-dataset-readonly. You can then find the data files at https://console.cloud.google.com/storage/browser/gcp-fhir-demo-dataset-synthea.

Josh Mandel (Apr 29 2019 at 18:24):

Thanks @Toby Hu! For anyone who wants to access this content (open to the public) via Azure:

Files are available via https://synthea2019.blob.core.windows.net/synthea-may-2019 + path, as in https://synthea2019.blob.core.windows.net/synthea-may-2019/r4-small/Condition.ndjson

/
├── /r4
│   ├── /r4/AllergyIntolerance.ndjson
│   ├── /r4/CarePlan.ndjson
│   ├── /r4/Claim.ndjson
│   ├── /r4/Condition.ndjson
│   ├── /r4/DiagnosticReport.ndjson
│   ├── /r4/Encounter.ndjson
│   ├── /r4/ExplanationOfBenefit.ndjson
│   ├── /r4/Goal.ndjson
│   ├── /r4/ImagingStudy.ndjson
│   ├── /r4/Immunization.ndjson
│   ├── /r4/MedicationRequest.ndjson
│   ├── /r4/Observation.ndjson
│   ├── /r4/Organization.ndjson
│   ├── /r4/Patient.ndjson
│   ├── /r4/Practitioner.ndjson
│   ├── /r4/Procedure.ndjson
│   ├── /r4/hospitalInformation1555536562902.json
│   └── /r4/practitionerInformation1555536562902.json
├── /r4-small
│   ├── /r4-small/AllergyIntolerance.ndjson
│   ├── /r4-small/CarePlan.ndjson
│   ├── /r4-small/Claim.ndjson
│   ├── /r4-small/Condition.ndjson
│   ├── /r4-small/DiagnosticReport.ndjson
│   ├── /r4-small/Encounter.ndjson
│   ├── /r4-small/ExplanationOfBenefit.ndjson
│   ├── /r4-small/Goal.ndjson
│   ├── /r4-small/ImagingStudy.ndjson
│   ├── /r4-small/Immunization.ndjson
│   ├── /r4-small/MedicationRequest.ndjson
│   ├── /r4-small/Observation.ndjson
│   ├── /r4-small/Organization.ndjson
│   ├── /r4-small/Patient.ndjson
│   ├── /r4-small/Practitioner.ndjson
│   ├── /r4-small/Procedure.ndjson
│   ├── /r4-small/hospitalInformation1555535281086.json
│   └── /r4-small/practitionerInformation1555535281086.json
├── /stu3
│   ├── /stu3/AllergyIntolerance.ndjson
│   ├── /stu3/CarePlan.ndjson
│   ├── /stu3/Claim.ndjson
│   ├── /stu3/Condition.ndjson
│   ├── /stu3/DiagnosticReport.ndjson
│   ├── /stu3/Encounter.ndjson
│   ├── /stu3/ExplanationOfBenefit.ndjson
│   ├── /stu3/Goal.ndjson
│   ├── /stu3/ImagingStudy.ndjson
│   ├── /stu3/Immunization.ndjson
│   ├── /stu3/MedicationRequest.ndjson
│   ├── /stu3/Observation.ndjson
│   ├── /stu3/Organization.ndjson
│   ├── /stu3/Patient.ndjson
│   ├── /stu3/Practitioner.ndjson
│   └── /stu3/Procedure.ndjson
└── /stu3-small
    ├── /stu3-small/AllergyIntolerance.ndjson
    ├── /stu3-small/CarePlan.ndjson
    ├── /stu3-small/Claim.ndjson
    ├── /stu3-small/Condition.ndjson
    ├── /stu3-small/DiagnosticReport.ndjson
    ├── /stu3-small/Encounter.ndjson
    ├── /stu3-small/ExplanationOfBenefit.ndjson
    ├── /stu3-small/Goal.ndjson
    ├── /stu3-small/ImagingStudy.ndjson
    ├── /stu3-small/Immunization.ndjson
    ├── /stu3-small/MedicationRequest.ndjson
    ├── /stu3-small/Observation.ndjson
    ├── /stu3-small/Organization.ndjson
    ├── /stu3-small/Patient.ndjson
    ├── /stu3-small/Practitioner.ndjson
    └── /stu3-small/Procedure.ndjson

4 directories, 68 files
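
As an illustration of the "base URL + path" pattern above, here is a minimal Python sketch that builds a file URL from the listing and reads newline-delimited JSON one resource at a time. The function names (`blob_url`, `parse_ndjson`, `stream_resources`) are hypothetical helpers, not part of any posted tooling, and the sketch assumes the container is still publicly readable.

```python
import json
from typing import Iterable, Iterator
from urllib.request import urlopen

# Base URL quoted in the post above.
BASE_URL = "https://synthea2019.blob.core.windows.net/synthea-may-2019"


def blob_url(dataset: str, resource_type: str) -> str:
    """Build a URL such as .../r4-small/Condition.ndjson (hypothetical helper)."""
    return f"{BASE_URL}/{dataset}/{resource_type}.ndjson"


def parse_ndjson(lines: Iterable) -> Iterator[dict]:
    """Yield one JSON object per non-empty line; accepts str or bytes lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def stream_resources(dataset: str, resource_type: str) -> Iterator[dict]:
    """Stream resources over HTTP without loading the whole file into memory."""
    with urlopen(blob_url(dataset, resource_type)) as response:
        yield from parse_ndjson(response)
```

For example, `for condition in stream_resources("r4-small", "Condition"): ...` would iterate over the small R4 Condition file, assuming the URL still resolves.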

Jie Fan (May 04 2019 at 14:05):

There is an issue with the synthetic dataset: its references are invalid. We've cleaned most of the data and are trying to export it to BigQuery. I'll post an update once that is done.

Jason Walonoski (May 04 2019 at 14:12):

Are the invalid references from Synthea itself, or are they a result of some post-processing or repackaging? If the former, please file a Synthea issue so I can fix it. Thanks.

Jie Fan (May 04 2019 at 14:13):

It's from Synthea. I'll file a bug in a minute.
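
The thread does not say what form the invalid references took, so the following is only a rough sketch of how one might detect references that do not resolve within a set of NDJSON resources. It assumes that a valid reference is a relative `ResourceType/id` string matching a resource in the same dataset; `iter_references` and `find_dangling` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_references(node) -> Iterator[str]:
    """Yield every string stored under a 'reference' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


def find_dangling(resources: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return (source, reference) pairs whose target is not in the dataset.

    Assumption: only relative 'ResourceType/id' references count as valid.
    """
    resources = list(resources)
    known = {
        f"{r['resourceType']}/{r['id']}"
        for r in resources
        if "resourceType" in r and "id" in r
    }
    dangling = []
    for r in resources:
        source = f"{r.get('resourceType')}/{r.get('id')}"
        for ref in iter_references(r):
            if ref not in known:
                dangling.append((source, ref))
    return dangling
```

This loads every resource into memory, which is reasonable for the small datasets but would likely need a two-pass or indexed approach for the large ones.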

Nik Klassen (May 04 2019 at 14:41):

The sample data can also be queried directly in BigQuery at https://console.cloud.google.com/bigquery?project=gcp-fhir-demo-dataset (accessible once the group has been joined)

Josh Mandel (May 04 2019 at 14:42):

(Do these sample data in bq currently have the same bug that @Jie Fan mentioned?)

Jie Fan (May 04 2019 at 14:43):

No, but we forgot to export the data in the analytics schema, so @Benard Ebinu is re-exporting it.

Jie Fan (May 04 2019 at 14:44):

@Benard Ebinu will post an update once that's done.

Jie Fan (May 04 2019 at 15:01):

FYI, the cleaned stu3 small dataset is here: https://pantheon.corp.google.com/storage/browser/gcp-fhir-demo-dataset-synthea/stu3-small-cleaned. Let me know if you encounter any issues using it.

Jie Fan (May 04 2019 at 15:03):

(Please join https://groups.google.com/forum/#!forum/gcp-fhir-demo-dataset-readonly to use the dataset and access BigQuery for data analytics)

Brian Wright (May 04 2019 at 15:16):

I am getting a Google SSO login prompt when trying to access this.

Brian Wright (May 04 2019 at 15:16):

Specifically this link: https://pantheon.corp.google.com/storage/browser/gcp-fhir-demo-dataset-synthea/stu3-small-cleaned

Dan Gottlieb (May 04 2019 at 15:21):

Yep - same issue. @Jie Fan can you open it to non-google folks?

Nik Klassen (May 04 2019 at 15:24):

That link should be https://console.cloud.google.com/storage/browser/gcp-fhir-demo-dataset-synthea/stu3-small-cleaned

Jie Fan (May 04 2019 at 15:25):

Thanks Nik, this should be the correct link.

Dan Gottlieb (May 04 2019 at 15:29):

Is the cleaned data also accessible through https://console.cloud.google.com/bigquery?project=gcp-fhir-demo-dataset or is this the previous version?

Benard Ebinu (May 04 2019 at 15:36):

The new cleaned data is now accessible through https://console.cloud.google.com/bigquery?project=gcp-fhir-demo-dataset under dataset stu3_small

Jie Fan (May 04 2019 at 15:40):

Here are references to the standard SQL for BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/
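
As a starting point for querying the dataset mentioned above, here is a minimal sketch using the `google-cloud-bigquery` Python client. The project and dataset names come from this thread; the table name `Patient` and the `billing_project` parameter are assumptions (the thread does not list table names, and queries are billed to whichever project you run them under). `count_query` and `run_count` are hypothetical helpers.

```python
PROJECT = "gcp-fhir-demo-dataset"  # project named in the thread
DATASET = "stu3_small"             # dataset named in the thread


def count_query(table: str, dataset: str = DATASET) -> str:
    """Build a standard-SQL row count for one table (table name is assumed)."""
    return f"SELECT COUNT(*) AS n FROM `{PROJECT}.{dataset}.{table}`"


def run_count(table: str, billing_project: str) -> int:
    """Run the count query; needs google-cloud-bigquery and valid credentials."""
    # Imported lazily so the query builder works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    rows = client.query(count_query(table)).result()
    return next(iter(rows))["n"]
```

A call such as `run_count("Patient", "my-own-project")` would only succeed after joining the access group and authenticating, and only if a table with that name exists.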

Jie Fan (May 04 2019 at 15:40):

Let one of us know if you have issues querying the data :)

Jie Fan (May 04 2019 at 15:59):

There is an issue where the resource IDs are omitted from the BigQuery table; we are fixing it.

Josh Mandel (May 04 2019 at 16:08):

I also updated https://gist.github.com/jmandel/fd9683f11c9bc3eeb2316f017c35ddac with links to the "fixed" files in Azure.

Jason Walonoski (May 04 2019 at 16:11):

Posted bug report: https://github.com/synthetichealth/synthea/issues/513

Jason Walonoski (May 04 2019 at 19:44):

I fixed this bug in Synthea proper. Thanks for finding it @Jie Fan

Jie Fan (May 04 2019 at 20:17):

Great, thank you! Credit actually goes to @Toby Hu :)

Dan Gottlieb (May 04 2019 at 21:37):

@Jie Fan are you rebuilding the data? I don't see any tables in BigQuery under stu3_small anymore.

Jie Fan (May 04 2019 at 22:37):

@Benard Ebinu is trying to regenerate the source data now that Jason has fixed the bug. Benard, did you delete the tables or back them up somewhere?

Dan Gottlieb (May 05 2019 at 12:28):

Thanks - looks like the tables are back now! Were you able to add the identifier field to patient (I don't see a .id in the resource)?

Jie Fan (May 05 2019 at 13:43):

gcp-fhir-demo-dataset:stu3_sql_schema has resources exported based on the lossless schema.

Brian Wright (May 05 2019 at 16:13):

Thanks. I was able to complete my scenario using the stu3_sql_schema data in BigQuery.

Jie Fan (May 05 2019 at 16:25):

:+1:


Last updated: Apr 12 2022 at 19:14 UTC