
Stream: Covid-19 Response

Topic: Realistic sample data


Michael Donnelly (May 11 2020 at 18:52):

Anyone got a method for generating good-looking sample data? MITRE folks maybe? @Reece Adamson

John Moehrke (May 11 2020 at 18:56):

shorthand (sushi)

Michael Donnelly (May 11 2020 at 18:56):

?

John Moehrke (May 11 2020 at 18:58):

I was thinking of writing examples one by one. I presume you are looking for a pseudo-population of data.

Michael Donnelly (May 11 2020 at 18:59):

Yes. I should be more specific. I have a bunch of hospitals and want to generate CDC and FEMA MeasureReport data for them.

John Moehrke (May 11 2020 at 19:01):

Isn't that what @Gino Canessa is doing?

Michael Donnelly (May 11 2020 at 19:02):

Maybe. I know he's making sample resources.

Reece Adamson (May 11 2020 at 19:03):

Ya @Gino Canessa has some data generation capabilities here: https://github.com/microsoft-healthcare-madison/learning-spike-erp.

I don't know of anything on the MITRE side that is SANER-specific right now, data-wise (we use Synthea for generating data in general).

Michael Donnelly (May 11 2020 at 19:03):

I don't have my incoming MeasureReport create going yet, and even when I do, I won't want data identical to what others are using. There are definitely ways to munge it up.

Michael Donnelly (May 11 2020 at 19:03):

Thanks @John Moehrke and @Reece Adamson

Gino Canessa (May 11 2020 at 19:05):

Yes, the goal there is to generate fake-but-usable data. The generator can be set to create 'n' days' worth of data, but we keep it to 3 for the repo check-in.

Gino Canessa (May 11 2020 at 19:07):

I will note that we haven't done anything in particular to make the data realistic. Mostly, it's just random generators with ranges specified by command line args.
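
For readers following along, the approach described here (random generators bounded by configurable ranges, run for 'n' days) might look roughly like the sketch below. This is not the code from the linked repository; the function name, field names, and default ranges are all assumptions made up for illustration.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DailyCounts:
    """One day of made-up bed counts for a single hospital (hypothetical shape)."""
    day: date
    total_beds: int
    occupied_beds: int
    covid_patients: int


def generate_days(start, n_days, bed_range=(100, 500),
                  occupancy_range=(0.5, 0.95), covid_share_range=(0.0, 0.3),
                  seed=None):
    """Draw n_days of counts from uniform ranges. No realism is attempted."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    total_beds = rng.randint(*bed_range)  # fixed for the whole run
    days = []
    for offset in range(n_days):
        # Occupied beds are a random fraction of capacity (fraction <= 1).
        occupied = round(total_beds * rng.uniform(*occupancy_range))
        # COVID patients are a random fraction of the occupied beds.
        covid = round(occupied * rng.uniform(*covid_share_range))
        days.append(DailyCounts(start + timedelta(days=offset),
                                total_beds, occupied, covid))
    return days


if __name__ == "__main__":
    for row in generate_days(date(2020, 5, 11), n_days=3, seed=42):
        print(row)
```

Because each day is drawn independently, consecutive days can jump around in ways real census data would not, which is the realism gap raised in this thread.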

Michael Donnelly (May 11 2020 at 19:08):

Ah.

Michael Donnelly (May 11 2020 at 19:08):

I'll take a look at some real data and work on it a bit then.

Michael Donnelly (May 11 2020 at 19:08):

I've had enough times when something looked right with fake data and turned out to be janky with real data.

Michael Donnelly (May 11 2020 at 19:09):

(Same reason I like to give sample patients names real people might have.)

Gino Canessa (May 11 2020 at 19:09):

Yep. I had planned to do more along those lines but just haven't had the time.

Michael Donnelly (May 11 2020 at 19:09):

Let me know if you do.

Michael Donnelly (May 11 2020 at 19:10):

If I tackle it, is there any particular way that'd be more useful to you?

Gino Canessa (May 11 2020 at 19:10):

Magic 8-ball says: Not likely (before connectathon at least)

Gino Canessa (May 11 2020 at 19:11):

Michael Donnelly said:

If I tackle it, is there any particular way that'd be more useful to you?

A PR? :grinning:

If you build something with realistic models, it shouldn't be too hard to add it in. I've tried to keep data generation completely separate from building the FHIR resources.
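
To illustrate the separation mentioned here, a resource builder can take plain counts from any source (random, modeled, or real) and only handle the FHIR shaping. The sketch below is an assumption-laden example, not the repository's code: the function name, the count labels, and the `example.org` measure URL are hypothetical, and only a handful of standard R4 `MeasureReport` elements are filled in.

```python
import json
from datetime import date


def build_measure_report(counts, hospital_id, measure_url, day):
    """Wrap plain counts in a minimal MeasureReport-shaped dict.

    `counts` maps a free-text label to an integer; how those numbers were
    produced is deliberately none of this function's business.
    """
    populations = [
        # code.text is used instead of a coded concept to avoid guessing codes.
        {"code": {"text": label}, "count": value}
        for label, value in counts.items()
    ]
    return {
        "resourceType": "MeasureReport",
        "status": "complete",
        "type": "summary",
        "measure": measure_url,  # hypothetical canonical URL supplied by caller
        "reporter": {"reference": f"Organization/{hospital_id}"},
        "date": day.isoformat(),
        "period": {"start": day.isoformat(), "end": day.isoformat()},
        "group": [{"population": populations}],
    }


if __name__ == "__main__":
    report = build_measure_report(
        {"total beds": 250, "occupied beds": 180},  # made-up numbers
        hospital_id="hospital-1",                   # hypothetical id
        measure_url="http://example.org/fhir/Measure/example-bed-capacity",
        day=date(2020, 5, 11),
    )
    print(json.dumps(report, indent=2))
```

With this split, swapping the random generator for a more realistic model only changes what gets passed in as `counts`.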

Jason Walonoski (May 11 2020 at 21:36):

Synthea has a covid19 branch that generates patients with COVID-19. I was hoping to hack some code up during the connectathon that would generate MeasureReports from that synthetic data and post the results to one of the participating servers.

Abbie Watson (May 11 2020 at 23:40):

Yeah, we've got something in the works in that regard, too. I'm stitching it together right now.


Last updated: Apr 12 2022 at 19:14 UTC