Stream: Covid-19 Response
Topic: Realistic sample data
Michael Donnelly (May 11 2020 at 18:52):
Anyone got a method for generating good looking sample data? MITRE folks maybe? @Reece Adamson
John Moehrke (May 11 2020 at 18:56):
shorthand (sushi)
Michael Donnelly (May 11 2020 at 18:56):
?
John Moehrke (May 11 2020 at 18:58):
I was thinking of writing examples one by one. I presume you are looking for a pseudo-population of data?
Michael Donnelly (May 11 2020 at 18:59):
Yes. I should be more specific. I have a bunch of hospitals and want to generate CDC and FEMA MeasureReport data for them.
John Moehrke (May 11 2020 at 19:01):
isn't that what @Gino Canessa is doing?
Michael Donnelly (May 11 2020 at 19:02):
Maybe. I know he's making sample resources.
Reece Adamson (May 11 2020 at 19:03):
Ya @Gino Canessa has some data generation capabilities here: https://github.com/microsoft-healthcare-madison/learning-spike-erp.
I don't know of anything on the MITRE side that is SANER-specific right now, data-wise (we generally use Synthea for creating data).
Michael Donnelly (May 11 2020 at 19:03):
I don't have create working for incoming MeasureReports yet, and even when I do, I won't want data identical to what others are using. There are definitely ways to munge it up.
Michael Donnelly (May 11 2020 at 19:03):
Thanks @John Moehrke and @Reece Adamson
Gino Canessa (May 11 2020 at 19:05):
Yes, the goal there is to generate fake-usable data. The generator can be set to create 'n' days' worth of data, but we keep it to 3 for the repo check-in.
Gino Canessa (May 11 2020 at 19:07):
I will note that we haven't done anything in particular to make the data realistic. Mostly, it's just random generators with ranges specified by command line args.
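For illustration, a minimal sketch of that style of generator: uniform random counts within ranges taken from command line arguments, for 'n' days. This is not the code in the repository; the flag names, defaults, field names, and the fixed start date are all hypothetical.

```python
import argparse
import random
from datetime import date, timedelta


def parse_args(argv=None):
    # All flag names and defaults here are hypothetical.
    parser = argparse.ArgumentParser(description="Generate random daily bed counts")
    parser.add_argument("--days", type=int, default=3, help="number of days to generate")
    parser.add_argument("--min-beds", type=int, default=50)
    parser.add_argument("--max-beds", type=int, default=500)
    parser.add_argument("--seed", type=int, default=None, help="set for repeatable output")
    return parser.parse_args(argv)


def generate_days(days, min_beds, max_beds, start, rng):
    """Return one plain dict per day with uniformly random counts."""
    rows = []
    for offset in range(days):
        total = rng.randint(min_beds, max_beds)  # inclusive on both ends
        occupied = rng.randint(0, total)         # never exceeds the total
        rows.append({
            "date": (start + timedelta(days=offset)).isoformat(),
            "total_beds": total,
            "occupied_beds": occupied,
        })
    return rows


def main(argv=None):
    args = parse_args(argv)
    rng = random.Random(args.seed)
    # The start date is an arbitrary choice for this sketch.
    return generate_days(args.days, args.min_beds, args.max_beds, date(2020, 5, 11), rng)
```

Calling `main(["--days", "3", "--seed", "0"])` returns three rows. Nothing here models trends or correlations between days, which is why the output would not look realistic.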
Michael Donnelly (May 11 2020 at 19:08):
Ah.
Michael Donnelly (May 11 2020 at 19:08):
I'll take a look at some real data and work on it a bit then.
Michael Donnelly (May 11 2020 at 19:08):
I've had enough times when something looked right with fake data and turned out to be janky with real data.
Michael Donnelly (May 11 2020 at 19:09):
(Same reason I like to give sample patients names real people might have.)
Gino Canessa (May 11 2020 at 19:09):
Yep. I had planned to do more along those lines but just haven't had the time.
Michael Donnelly (May 11 2020 at 19:09):
Let me know if you do.
Michael Donnelly (May 11 2020 at 19:10):
If I tackle it, is there any particular way that'd be more useful to you?
Gino Canessa (May 11 2020 at 19:10):
Magic 8-ball says: Not likely (before connectathon at least)
Gino Canessa (May 11 2020 at 19:11):
Michael Donnelly said:
If I tackle it, is there any particular way that'd be more useful to you?
A PR? :grinning:
If you build something with realistic models, it shouldn't be too hard to add it in. I've tried to keep data generation completely separate from building the FHIR resources.
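A rough sketch of what that separation could look like: the data model produces plain dicts, and a separate function turns each one into a MeasureReport. This is an assumption about the shape, not the repository's actual code. The function name, the row fields, the measure URL, and the population codes are hypothetical; only the top-level MeasureReport element names come from FHIR R4.

```python
import json


def build_measure_report(row, location_id, measure_url):
    """Map one plain data row to a MeasureReport dict.

    Knows nothing about how `row` was produced, so a realistic model
    can replace a random one without touching this function.
    """
    day = row["date"]
    return {
        "resourceType": "MeasureReport",
        "status": "complete",
        "type": "summary",
        "measure": measure_url,
        "subject": {"reference": f"Location/{location_id}"},
        "date": day,
        "period": {"start": day, "end": day},
        "group": [{
            "population": [
                # Population codes are placeholders, not SANER codes.
                {"code": {"text": "total_beds"}, "count": row["total_beds"]},
                {"code": {"text": "occupied_beds"}, "count": row["occupied_beds"]},
            ]
        }],
    }


# Hypothetical row, as any data model (random or realistic) might emit it.
example_row = {"date": "2020-05-11", "total_beds": 120, "occupied_beds": 87}
report = build_measure_report(
    example_row,
    location_id="example-hospital",
    measure_url="http://example.org/fhir/Measure/hypothetical-bed-capacity",
)
print(json.dumps(report, indent=2))
```

With this split, a PR that adds a realistic model only needs to emit rows in the same shape.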
Jason Walonoski (May 11 2020 at 21:36):
Synthea has a covid19 branch that generates patients with covid19. I was hoping to hack some code up during the connectathon that would generate MeasureReports from that synthetic data and post the results to one of the participating servers.
Abbie Watson (May 11 2020 at 23:40):
Yeah, we've got something in the works in that regard, also. I'm stitching it together right now.
Last updated: Apr 12 2022 at 19:14 UTC