FHIR Chat · Sept. 2019 Connectathon Track · bulk data

Stream: bulk data

Topic: Sept. 2019 Connectathon Track


Dan Gottlieb (Aug 23 2019 at 13:28):

I probably should have posted this to the channel earlier, but there will be a bulk data and analytics track again at the September FHIR Connectathon in Atlanta in a few weeks (http://www.hl7.org/events/working_group_meeting/2019/09/). I posted a track description at https://confluence.hl7.org/display/FHIR/2019-09+Bulk+Data+and+Analytics+Track , but the general idea is to continue prototyping around a FHIR bulk data import operation, and explore portable SQL on FHIR data models and queries.

Dan Gottlieb (Aug 23 2019 at 13:29):

Also, I converted our working proposal for an import operation from Josh's Google doc to markdown and posted it at https://github.com/smart-on-fhir/bulk-import/blob/master/import.md to make it easier to propose and track changes.

Karen Fairchild (Aug 23 2019 at 21:02):

Is anyone bringing a new/enhanced bulk data export server to the Sept 2019 Connectathon they want to test with a back-end service client? Or, is anyone interested in testing a bulk data server with some new functionality? We would like to try various search parameters and are looking at adding streaming de-identification (de-id) to our client. We have a research database that needs de-id data, and we need to load tons of patients and their data, but we also want to do that incrementally through search parameters (mostly date related).

Nick Robison (Sep 05 2019 at 18:43):

Hi folks. Sorry for chiming in late. Our team from CMS will be at the connectathon along with our new bulk data implementation, the Data at the Point of Care application (https://dpc.cms.gov). We would love to connect with folks and test our implementation. We'll be able to issue access tokens to our synthetic backend, so hopefully we can get some integration working with other EHR and analytics vendors. Feel free to reach out if you have any other questions for me or the rest of the team. You can also take a look at our open source code at: https://github.com/CMSGov/dpc-app.

Looking forward to seeing everyone there!

Karen Fairchild (Sep 11 2019 at 18:32):

@Nick Robison We would like to touch base with you at the Connectathon and try some queries from our research analytics. We are a requestor of data, and claims data can help us fill in "gaps" for our customers.

Michele Mottini (Sep 13 2019 at 11:31):

Is there a list of test servers somewhere?

Michele Mottini (Sep 13 2019 at 11:32):

(we probably won't have a bulk server this time - we are rewriting it and it is not ready...trying...)

Karen Fairchild (Sep 13 2019 at 13:16):

@Michele Mottini If you have anything on your bulk data server, we're willing to try it out - as well as your regular R4 server. See you there.

Dan Gottlieb (Sep 13 2019 at 14:24):

I posted a spreadsheet for the connectathon track at https://docs.google.com/spreadsheets/d/11sHNx7pRHSuSQLRg8iqCDIGBVGiYLjm0FfmzHVPZbRI/edit?usp=sharing . Please sign up there if you plan to participate and include details around your bulk data server or client if applicable! cc: @Michele Mottini

Michele Mottini (Sep 13 2019 at 14:29):

Thanks Dan

Patrick Cosmo (Sep 14 2019 at 14:43):

GCP FHIR Analytics - STU3

An STU3 FHIR store prepopulated with Synthea data and exported to a BigQuery (BQ) SQL data store for analytics, compliant with https://github.com/FHIR/sql-on-fhir.

To access this you will need a Google Cloud Platform (GCP) account: you can get a free 1 year trial:
1. Create a gmail account if you don’t already have one: https://gmail.com
2. Create a free trial GCP account with your gmail account: https://cloud.google.com/gcp/?utm_campaign=na-US-all-en-dr-bkws-all-all-trial-b-dr-100717
3. Get Access: send Patrick Cosmo (pcosmo@google.com) your gmail address so that I can give you access to the BQ dataset.
4. You will then be able to execute queries on BQ from https://console.cloud.google.com/bigquery?project=fhir-connectathon-252520.
For example, see how many patients are in the FHIR Store:

select count(*) from fhir_connectathon.Patient;

Access the GCP FHIR Store (STU3) Behind the Analytics Data

1. To get an access token:
curl -X POST https://pcosmo-eval-prod.apigee.net/oauth2/accesstoken -H 'Accept: application/json' -H 'Authorization: Basic aUx5aUptblNqNm42eElFR0FWR2dsZmtBbEFENVVrRDg6YUFLVzM1eXZITlk1NTJ5Ng==' -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -d '{ "grantType" : "client_credentials", "scopes" : "user/*.read" }'

This call will return an access token which you will need to use in the Authorization Bearer header for calls to the FHIR store.

2. FHIR Path: https://pcosmo-eval-prod.apigee.net/v1/hcapi/
For example, retrieve 5 patients:

curl -X GET \
'https://pcosmo-eval-prod.apigee.net/v1/hcapi/Patient?_count=5' \
-H 'Accept: application/fhir+json;charset=utf-8' \
-H 'Authorization: Bearer <access token>' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/fhir+json;charset=utf-8'
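For anyone scripting the two calls above rather than using curl, here is a minimal Python sketch that only builds the requests (it does not send them). The host, paths, headers, and JSON body are copied from the post; the function names and the `basic_credentials` parameter are illustrative assumptions, not part of any published client.

```python
import json
import urllib.request

BASE = "https://pcosmo-eval-prod.apigee.net"  # host taken from the post above


def build_token_request(basic_credentials: str) -> urllib.request.Request:
    """Build (but do not send) the client-credentials token request.

    basic_credentials is the base64 value that follows "Basic " in the
    Authorization header of the curl example.
    """
    body = json.dumps({"grantType": "client_credentials", "scopes": "user/*.read"})
    return urllib.request.Request(
        f"{BASE}/oauth2/accesstoken",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Basic {basic_credentials}",
            "Content-Type": "application/json",
        },
    )


def build_patient_search(access_token: str, count: int = 5) -> urllib.request.Request:
    """Build (but do not send) a Patient search limited to `count` results."""
    return urllib.request.Request(
        f"{BASE}/v1/hcapi/Patient?_count={count}",
        method="GET",
        headers={
            "Accept": "application/fhir+json;charset=utf-8",
            "Authorization": f"Bearer {access_token}",
        },
    )
```

Each request can be sent with `urllib.request.urlopen(req)`. The post does not show the token response body, so the name of the JSON field holding the token is left for the reader to check.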

Dan Gottlieb (Sep 14 2019 at 15:23):

I scheduled a bulk import breakout discussion at 3pm today in room "International 7". Please join if you're working on the spec, prototyping a server or client, or are just interested!

Adam Culbertson (Sep 14 2019 at 15:44):

Thanks Dan!

Jack Liu (Sep 14 2019 at 16:35):

I have provisioned an instance of an STU3 FHIR server here: https://atlantacon.azurewebsites.net/. Authentication is turned off, so you should be able to hit the endpoint without having to acquire a token.

The FHIR server supports experimental export functionality. There are some restrictions:

  • Currently we only support system-wide export. We don't support patient compartment or group yet.
  • A SAS token for an Azure blob storage account is needed for the destination. Instead of exporting resources to a file and having the client download the file, we thought it would be better to export directly to the destination of your choice. More detail can be found here: https://github.com/microsoft/fhir-server/blob/master/docs/BulkExport.md. We do have work lined up to support a default storage location.

If you want to try the export functionality without having to provision an Azure blob storage, I created a demo account that you can use.

You can start the export process by issuing a GET request to https://atlantacon.azurewebsites.net/$export?_destinationType=azure-block-blob&_destinationConnectionSettings=QmxvYkVuZHBvaW50PWh0dHBzOi8vYXRsYW50YWNvbi5ibG9iLmNvcmUud2luZG93cy5uZXQvO1F1ZXVlRW5kcG9pbnQ9aHR0cHM6Ly9hdGxhbnRhY29uLnF1ZXVlLmNvcmUud2luZG93cy5uZXQvO0ZpbGVFbmRwb2ludD1odHRwczovL2F0bGFudGFjb24uZmlsZS5jb3JlLndpbmRvd3MubmV0LztUYWJsZUVuZHBvaW50PWh0dHBzOi8vYXRsYW50YWNvbi50YWJsZS5jb3JlLndpbmRvd3MubmV0LztTaGFyZWRBY2Nlc3NTaWduYXR1cmU9c3Y9MjAxOC0wMy0yOCZzcz1iZnF0JnNydD1zY28mc3A9cndkbGFjdXAmc2U9MjAxOS0wOS0xOFQyMzoxNjoyOFomc3Q9MjAxOS0wOS0xNFQxNToxNjoyOFomc3ByPWh0dHBzJnNpZz1qV3g3RU5JZnhQR0dVbk45MjRRTU0wYmRsQmE5eiUyRjNtMWJCTG9QWFltYmslM0Q=. The storage account is available until Wednesday.
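For those pointing the export at their own storage account, a rough Python sketch of building the kickoff URL follows. Decoding the `_destinationConnectionSettings` value in the URL above suggests it is a base64-encoded Azure storage connection string, so the sketch assumes that encoding; `build_export_url` is a hypothetical helper name, and the two query parameter names are taken from the URL in this post.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlsplit


def build_export_url(base_url: str, connection_string: str) -> str:
    """Return a system-wide $export URL targeting an Azure blob destination.

    Assumption: the server expects the connection string (including the SAS
    token) base64-encoded in _destinationConnectionSettings.
    """
    settings = base64.b64encode(connection_string.encode("utf-8")).decode("ascii")
    # urlencode percent-encodes base64 characters such as '=', '+' and '/'
    query = urlencode(
        {
            "_destinationType": "azure-block-blob",
            "_destinationConnectionSettings": settings,
        }
    )
    return f"{base_url.rstrip('/')}/$export?{query}"
```

A GET request to the returned URL should then start the export as described above.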

In the response, you should get a link to the job instance:

HTTP/1.1 202 Accepted
Content-Location: https://atlantacon.azurewebsites.net/_operations/export/57de470c-7222-4566-8184-d1467bbd4916
Server: Kestrel
Request-Context: appId=cid-v1:710b9af1-bf8e-4016-aa56-08ff8b72f79b
X-Request-Id: 2051bd9f-9d85-487f-94ac-372f2eaff18b
x-ms-session-token: 8:-1#6713963
x-ms-request-charge: 9.71
X-Content-Type-Options: nosniff
X-Powered-By: ASP.NET
Date: Sat, 14 Sep 2019 15:49:47 GMT
Content-Length: 0

You can then issue a GET request to the job instance URL to get the status of the job. While the job is executing, you will get 202 Accepted status back.

HTTP/1.1 202 Accepted
Server: Kestrel
Request-Context: appId=cid-v1:710b9af1-bf8e-4016-aa56-08ff8b72f79b
X-Request-Id: 3d7ed6c1-665b-4130-9d03-b04046a3fe39
x-ms-session-token: 8:-1#6713966
x-ms-request-charge: 1.05
X-Content-Type-Options: nosniff
X-Powered-By: ASP.NET
Date: Sat, 14 Sep 2019 15:50:56 GMT
Content-Length: 0
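The status check described above can be wrapped in a small polling loop. This is only a sketch: `fetch_status` is a hypothetical callable that GETs the job instance URL and returns the HTTP status code and body, and the interval and poll limit are arbitrary defaults. It treats 202 as "still running" and hands any other status back to the caller.

```python
import time
from typing import Callable, Tuple

# Hypothetical signature: GET the job URL, return (status code, body text).
FetchStatus = Callable[[str], Tuple[int, str]]


def wait_for_export(
    job_url: str,
    fetch_status: FetchStatus,
    interval_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Poll the job URL until the server stops answering 202 Accepted."""
    for _ in range(max_polls):
        status, body = fetch_status(job_url)
        if status != 202:
            # Finished, canceled, or failed: let the caller inspect the result.
            return status, body
        sleep(interval_seconds)
    raise TimeoutError(f"export job still running after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real waiting.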

The FHIR server contains about 33 million resources, so the export will take a long time to finish. I recommend letting the job run for a few minutes and then canceling it by issuing a DELETE request to the job instance URL.

Once the job is canceled, you can issue a GET request to the job instance URL again, and you should get 206 Partial Content back with the links to the files.

HTTP/1.1 206 Partial Content
Content-Type: application/json; charset=utf-8
Server: Kestrel
Request-Context: appId=cid-v1:710b9af1-bf8e-4016-aa56-08ff8b72f79b
X-Request-Id: d5b6679d-1a35-481f-95b4-1aac09a04059
x-ms-session-token: 8:-1#6713974
x-ms-request-charge: 1.05
X-Content-Type-Options: nosniff
X-Powered-By: ASP.NET
Date: Sat, 14 Sep 2019 16:21:46 GMT
Content-Length: 709

{
  "transactionTime": "2019-09-14T15:49:47.3441066+00:00",
  "request": "https://atlantacon.azurewebsites.net:443/$export",
  "requiresAccessToken": false,
  "output": [
    {
      "type": "AllergyIntolerance",
      "url": "https://atlantacon.blob.core.windows.net:443/57de470c-7222-4566-8184-d1467bbd4916/AllergyIntolerance.ndjson",
      "sequence": 0,
      "count": 960,
      "committedBytes": 503432
    },
    {
      "type": "CarePlan",
      "url": "https://atlantacon.blob.core.windows.net:443/57de470c-7222-4566-8184-d1467bbd4916/CarePlan.ndjson",
      "sequence": 0,
      "count": 1007,
      "committedBytes": 1079683
    },
    {
      "type": "Claim",
      "url": "https://atlantacon.blob.core.windows.net:443/57de470c-7222-4566-8184-d1467bbd4916/Claim.ndjson",
      "sequence": 0,
      "count": 1033,
      "committedBytes": 1054528
    }
  ],
  "error": []
}
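A client could pull the file links out of a status body like the one above with something along these lines. This is a sketch only: `summarize_outputs` is a hypothetical helper, and it relies on the `output`, `type`, `url`, and `count` fields exactly as they appear in this sample response, which other servers may not all provide.

```python
import json
from typing import Dict


def summarize_outputs(status_body: str) -> Dict[str, dict]:
    """Group output file URLs and resource counts by resource type."""
    document = json.loads(status_body)
    summary: Dict[str, dict] = {}
    for entry in document.get("output", []):
        info = summary.setdefault(entry["type"], {"urls": [], "count": 0})
        info["urls"].append(entry["url"])
        # "count" is treated as optional; a missing value adds nothing.
        info["count"] += entry.get("count", 0)
    return summary
```

Grouping by type matters once a server splits one resource type across several files, since each file then appears as its own `output` entry.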

Let me know if you have questions or encounter issues.

Michele Mottini (Sep 14 2019 at 17:31):

Our client actually processes the resources (importing them into our database), and we cannot do that in any reasonable time for 33 million resources. Would it be possible to have a smaller test set (like a few hundred resources)?

Michele Mottini (Sep 14 2019 at 17:32):

@Jack Liu ^

Jack Liu (Sep 14 2019 at 17:40):

Sure, let me create another instance with a smaller data set.

Jack Liu (Sep 14 2019 at 19:01):

Sorry, got side-tracked. Here's the URL to another instance with small samples: https://atlantacon2.azurewebsites.net.

Michele Mottini (Sep 14 2019 at 19:07):

@Dan Gottlieb where is the breakout?

Josh Mandel (Sep 14 2019 at 19:07):

Basement!

Josh Mandel (Sep 14 2019 at 19:07):

International 7; I think you need an escalator from the Marquis level.

Michele Mottini (Sep 14 2019 at 19:44):

@Jack Liu : export completed, but I get an error trying to download the data files:
Unable to download 'https://atlantacon.blob.core.windows.net:443/602b8901-8575-4b18-978d-baf8c939f8fc/Condition.ndjson': <?xml version="1.0" encoding="utf-8"?><Error><Code>ResourceNotFound</Code><Message>The specified resource does not exist.
RequestId:56018136-401e-00c5-0334-6b0653000000
Time:2019-09-14T19:41:42.1785641Z</Message></Error>

Jack Liu (Sep 14 2019 at 21:19):

@Michele Mottini Fixed the permission issue and also fixed the issue with references. Can you try again and see?

Michele Mottini (Sep 14 2019 at 21:22):

Trying

Michele Mottini (Sep 14 2019 at 21:24):

mostly good - only some bad Goals:
Processed 24 Patient, 451 Encounter, 122 Condition, 1198 Observation, 73 DiagnosticReport, 84 Medication, 97 MedicationRequest, 35 Goal
Errors:
'The needed Goal field 'subject' is missing'
35 occurences, first in 'Goal3.ndjson' (original URL 'https://atlantacon.blob.core.windows.net:443/8e44a38e-3177-4ed7-9dd4-c3abd0d5276a/Goal.ndjson') at offset 0
JSON: '{"resourceType":"Goal","id":"a4d6c328-3738-4709-9b3b-769af498a328","meta":{"versionId":"1","lastUpdated":"2019-09-14T21:06:15.319+00:00"},"status":"in-progress","description":{"text":"Hemoglobin A1c total in Blood < 7.0"},"addresses":[{"reference":"Condition/4aca9249-95bb-4f88-840a-af4f0c0aaae5"}]}'
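A check like the one reported above can be approximated by scanning an NDJSON file for resources that lack a required element. The sketch below is not the client's actual validation: `find_missing_fields` and the `REQUIRED_FIELDS` table are illustrative assumptions, and the table lists only `Goal.subject`, the element named in the error.

```python
import json
from typing import Dict, List, Optional, Tuple

# Illustrative only: a real validator would derive this from FHIR profiles.
REQUIRED_FIELDS: Dict[str, List[str]] = {"Goal": ["subject"]}


def find_missing_fields(
    ndjson_text: str,
    required: Dict[str, List[str]] = REQUIRED_FIELDS,
) -> List[Tuple[int, Optional[str], Optional[str], str]]:
    """Return (line number, resourceType, id, missing field) for each gap."""
    problems = []
    for line_number, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between resources
        resource = json.loads(line)
        resource_type = resource.get("resourceType")
        for field in required.get(resource_type, []):
            if field not in resource:
                problems.append((line_number, resource_type, resource.get("id"), field))
    return problems
```

Run against the Goal shown in the error above, this reports line 1 as missing `subject`.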

Michele Mottini (Sep 14 2019 at 21:29):

:+1:

Jack Liu (Sep 14 2019 at 21:29):

Looks like it's an issue with the synthea data itself. It doesn't contain the subject to start with.

{ "fullUrl": "urn:uuid:d1ab5bc2-8fcb-49a5-8cb8-ad5f9ae1a64d", "resource": { "id": "d1ab5bc2-8fcb-49a5-8cb8-ad5f9ae1a64d", "status": "in-progress", "description": { "text": "Glucose [Mass/volume] in Blood < 108" }, "addresses": [ { "reference": "urn:uuid:1742d8d7-a30a-4024-889e-5b32b8229fee" } ], "resourceType": "Goal" } }

Michele Mottini (Sep 14 2019 at 21:30):

It's a Goal for none!

John Moehrke (Sep 15 2019 at 13:06):

is this track creating any AuditEvent or Provenance records? I am working on an app that consumes AuditEvent and would like to find some.

Josh Mandel (Sep 15 2019 at 13:08):

Not as a formal expectation of any scenarios, but individual participants might be.

John Moehrke (Sep 15 2019 at 13:11):

Right. I am fine with volunteer audit events. I am very much willing to help define a set of recommended log entries, but first I just want to see some AuditEvents in the wild.

Dan Gottlieb (Sep 15 2019 at 14:11):

If you're participating in the bulk data track, please add a few sentences about what you worked on, what you learned, any open questions, and your plans for next steps to our report out to the rest of the connectathon by 1pm. Thanks!

Dan Gottlieb (Sep 15 2019 at 14:11):

https://docs.google.com/document/d/1XxbRnExqhdzbEVsvdIw7BTq46-0wJ7SbSMdmy8HKx3g/edit?usp=sharing

Sam Sayer (Sep 15 2019 at 14:53):

Will there be another bulk data breakout today? I thought that was mentioned at the end of the session yesterday.

Dan Gottlieb (Sep 15 2019 at 15:41):

@Sam Sayer There's nothing planned for today, but lots of discussion going on at the track tables (around #7 in the front of the room) that folks are welcome to join!


Last updated: Apr 12 2022 at 19:14 UTC