Stream: Covid-19 Response
Topic: Missing data
Farzad Mostashari (Mar 12 2020 at 17:38):
took me a bit to find the stream, but I'm here
Farzad Mostashari (Mar 12 2020 at 17:40):
First framing question- the public facing part of this thing (website where you could get your calculated risk) could be very popular- under whose aegis should it be held?
Farzad Mostashari (Mar 12 2020 at 17:41):
My original intent was to develop a rough and ready batch scoring for providers to use to prioritize outreach
Farzad Mostashari (Mar 12 2020 at 17:41):
the twitter thread also got quickly confused with inpatient mortality risk scoring (and fears of death panels)
Farzad Mostashari (Mar 12 2020 at 17:42):
which is not what I'm talking about
Josh Mandel (Mar 12 2020 at 17:43):
Re: public banner, I have no feelings about this. I think we'd do well to host the content on github for now so as we update code/docs the app can be automatically updated too.
Farzad Mostashari (Mar 12 2020 at 17:44):
The other use-case is for screening suspected cases: https://www.medrxiv.org/content/10.1101/2020.03.05.20031906v1 also not the intent
Josh Mandel (Mar 12 2020 at 17:44):
So if you're happy with it, we can host at https://smart-on-fhir.github.io/covid19-risk -- but to be clear, I'm happy to alias something at Aledade (it's pretty straightforward to update the domains file)
Josh Mandel (Mar 12 2020 at 17:45):
Batch scoring is interesting -- it's harder from the "easy/automatic to integrate" side because we don't have ready standardized access to population APIs (not for a couple of years, given this week's regs)
Farzad Mostashari (Mar 12 2020 at 17:45):
Not concerned about the code/docs at github, more about the public-facing governance
Josh Mandel (Mar 12 2020 at 17:46):
Gotcha. I was thinking about public-facing (prototype?) app hosting, just making sure people can see/try what's there.
Farzad Mostashari (Mar 12 2020 at 17:46):
it will be the new "snowdays.com"
Josh Mandel (Mar 12 2020 at 17:47):
I definitely had to google that.
Josh Mandel (Mar 12 2020 at 17:52):
So @Farzad Mostashari can we start with a kind of spreadsheet model of inputs / outputs?
Josh Mandel (Mar 12 2020 at 17:53):
I've made a template here.
Josh Mandel (Mar 12 2020 at 17:56):
Can we turn the following into something plausible?
Arien Malec (Mar 12 2020 at 18:01):
To be clear:
Scope - Risk scoring for patients with FLIs suspected of Covid-19
Inputs - Demographics, Dx history, other risk factors
Output - categorical risk score
Correct?
Arien Malec (Mar 12 2020 at 18:07):
Sample inputs and illustrative output:
5 yo, non-smoker, no relevant risk factors => low
49yo male, non-smoker, Leukemia => moderate (I hope)
65yo female, smoker, diabetes => high
82yo male, smoker, COPD, CVD => very high
Josh Mandel (Mar 12 2020 at 18:08):
That matches my expectations for the kinds of inputs and outputs. @Farzad Mostashari ?
Arien Malec (Mar 12 2020 at 18:18):
Proposal: Use the crude CFR as a proxy for risk.
e.g.
https://ourworldindata.org/coronavirus#the-definition-of-the-case-fatality-rate-cfr
Unfortunately, the two axes (age and co-morbidities) are highly entangled, but the crudest possible risk scoring methodology would use age as the primary driver, & co-morbidity(ies) as the secondary risk factor.
Arien Malec (Mar 12 2020 at 18:18):
If only we had an epidemiologist on call?
Arien Malec (Mar 12 2020 at 18:28):
A more sophisticated approach would look at the individual contributions of co-morbidity risk to age based on the base rates for each age category.
Arien Malec (Mar 12 2020 at 19:00):
I did a crude V1 based on the CDC China risk factors and assuming independent contribution, but everything parameterized via lookups so easy to change.
Farzad Mostashari (Mar 12 2020 at 19:17):
If we use CFR, then we would need some estimate of the baseline prevalence of the conditions (and ideally their cross correlations) within the exposed population. So, if cancer is 5x as likely in the elderly, but only associated with a 50% increase in CFR, then maybe it's protective.
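A rough sketch of that confounding point, using made-up numbers (the stratum sizes, prevalences, and CFRs below are all hypothetical, not taken from any dataset). Cancer is set to be 5x as common in the old stratum and to have no effect on CFR within either age stratum, yet the crude relative risk still comes out well above 1:

```python
# Hypothetical numbers chosen only to illustrate confounding by age.
# Within each age stratum, CFR is the same with or without cancer.
STRATA = {
    "young": {"cases": 10_000, "cancer_prev": 0.02, "cfr": 0.005},
    "old": {"cases": 10_000, "cancer_prev": 0.10, "cfr": 0.10},  # 5x the prevalence
}


def crude_cfr(with_cancer: bool) -> float:
    """Pool both age strata and return deaths / cases for one cancer group."""
    cases = 0.0
    deaths = 0.0
    for stratum in STRATA.values():
        share = stratum["cancer_prev"] if with_cancer else 1 - stratum["cancer_prev"]
        n = stratum["cases"] * share
        cases += n
        deaths += n * stratum["cfr"]  # no cancer effect inside the stratum
    return deaths / cases


crude_rr = crude_cfr(True) / crude_cfr(False)
print(f"crude RR for cancer with no true within-age effect: {crude_rr:.2f}")
```

Under these assumed numbers, age alone produces a crude ratio of roughly 1.67, so an observed 1.5x would sit below what confounding by age would generate on its own.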
Farzad Mostashari (Mar 12 2020 at 19:18):
The other source of data would be where we know the universe of the exposed population- cruise ships, and contact investigations. Look at those severely ill among the total exposed/ infected
Farzad Mostashari (Mar 12 2020 at 19:19):
(...and I'm trying to run a company on the side, damnit)
Arien Malec (Mar 12 2020 at 19:52):
Yes, the V1 is ultra-crude and can easily lead to adjusted CFR > 100%. The CDC data was based on Wuhan outbreak, so need Wuhan dx prevalence. At the same time the vectors are pointing in the right direction, modulo unknowns.
Arien Malec (Mar 12 2020 at 19:56):
Because all the comorbidities in the CDC Wuhan data show 6x-11x relative risk, there's no way any of them are protective, but we don't know the effect on multiple morbidity.
As an example, the crude model shows a 49 year old with All The Diseases has higher risk than an 89yo person with no health issues. Which factor drives more? :man_shrugging:
Josh Mandel (Mar 12 2020 at 19:58):
So @Farzad Mostashari, is your goal to run these risk calculations speculatively across your ambulatory population of not-yet-sick folks, to see who's at risk for the worst outcomes if they get sick?
Arien Malec (Mar 12 2020 at 20:04):
Chinese diabetes prevalence is in line with US, just running slightly behind (11% total pop, ~20% for older Chinese)
https://jamanetwork.com/journals/jama/fullarticle/2633917
So the increase in risk for T1/T2DM is at least partly orthogonal to age.
Same trend for HTN.
https://www.ahajournals.org/doi/suppl/10.1161/CIRCULATIONAHA.117.032380
Josh Mandel (Mar 12 2020 at 20:11):
Can you help me understand where the base rates come into play? Without source data from outbreaks with rows per-case listing (age, DM?, HTN?, died?), how do you try to figure out how much each factor contributes? e.g., even for something like a simple regression...
Arien Malec (Mar 12 2020 at 20:38):
Sure -- data is from here:
https://www.who.int/docs/default-source/coronaviruse/who-china-joint-mission-on-covid-19-final-report.pdf
I used the primary CFR based on the China CDC data represented in the URL I linked to above.
The co-morbidity adjuster was done as a multiplier off the base rate for age. The multiplier is computed from the overall base CFR and the CFR for individuals with the selected co-morbidity, treating the two as orthogonal. This is obviously wrong, but directionally correct?
It would be better to use Bayes' Rule or some such, but I opted for a crude V1.
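A minimal sketch of that crude multiplier approach, not the actual spreadsheet. The figures are the ones quoted elsewhere in this thread (14.8 for age >= 80, 6.3 and 0.9 for COPD vs. the base, 10.5 for CVD, 6.2x for cancer); the table layout, the names, and the choice of 0.9 as the denominator for every condition are assumptions made for illustration:

```python
# Crude CFR values in percent, as quoted in this thread.
AGE_CFR = {"80+": 14.8}  # only the 80+ band is quoted here
BASE_CFR = 0.9  # assumed denominator: CFR with no comorbidity
COMORBIDITY_CFR = {"COPD": 6.3, "CVD": 10.5}

# Relative risk = CFR with the condition / base CFR.
RELATIVE_RISK = {name: cfr / BASE_CFR for name, cfr in COMORBIDITY_CFR.items()}
RELATIVE_RISK["cancer"] = 6.2  # quoted directly as a ratio


def crude_adjusted_cfr(age_band: str, conditions: list[str]) -> float:
    """Multiply the age-band CFR by each condition's relative risk.

    Treats every factor as independent, which is the known flaw:
    nothing caps the result at 100%.
    """
    cfr = AGE_CFR[age_band]
    for condition in conditions:
        cfr *= RELATIVE_RISK[condition]
    return cfr


one = crude_adjusted_cfr("80+", ["cancer"])
two = crude_adjusted_cfr("80+", ["cancer", "COPD"])
print(f"80+ with cancer: {one:.1f}%")
print(f"80+ with cancer and COPD: {two:.1f}%")
```

With a single comorbidity the 80+ figure is already above 90%, and a second one pushes it far past 100%, which is why the output is only usable as a ranking and not as a probability.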
Arien Malec (Mar 12 2020 at 20:45):
Sorry, data is not from the WHO report (amusingly, this was originally autocorrected to The Who) but from the "ourworldindata" site, supposedly abstracted from China CDC data.
I'm trying to see if I can find the sources for the raw data.
Josh Mandel (Mar 12 2020 at 20:46):
Ha, thanks -- I had just scrolled through and wondered what I was missing.
Arien Malec (Mar 12 2020 at 20:47):
Here's the actual source:
http://weekly.chinacdc.cn/en/article/id/e53946e2-c6c4-41e9-9a9b-fea8db1a8f51
Josh Mandel (Mar 12 2020 at 20:47):
OK cool, so https://ourworldindata.org/coronavirus came from http://weekly.chinacdc.cn/en/article/id/e53946e2-c6c4-41e9-9a9b-fea8db1a8f51 ...
Arien Malec (Mar 12 2020 at 20:48):
If you want to Bayes'ize it, go for it.
you'll have to adjust the CFR adjusted risk ranges as well.
Arien Malec (Mar 12 2020 at 20:50):
As I said, I did the super crude thing: the crude CFR for people >= 80 is 14.8, and the relative risk for people with cancer over the base rate of no comorbidity is 6.2x, so …
Arien Malec (Mar 12 2020 at 20:51):
@Farzad Mostashari is doing a nice job of nerd sniping lazy webbing.
Josh Mandel (Mar 12 2020 at 20:52):
Oh, so far I'm just trying to follow the simple thing you did -- which I think doesn't involve knowing things like "base rate of diabetes in the Wuhan population," but rather just involves taking 6.3 / 0.9 to figure out the COPD risk from:
Josh Mandel (Mar 12 2020 at 20:53):
I don't know how you'd estimate joint probability distributions when you only have data about one variable at a time.
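To make the identifiability problem concrete, here is a small sketch with entirely hypothetical counts. Two scenarios share the same case counts and the same one-variable-at-a-time CFRs for diabetes (DM) and hypertension (HTN), yet disagree sharply on the CFR for people with both:

```python
# Hypothetical cases per (DM, HTN) cell; 1 means the condition is present.
CASES = {(0, 0): 700, (0, 1): 150, (1, 0): 100, (1, 1): 50}

# Two hypothetical death tables with identical single-factor totals.
DEATHS_A = {(0, 0): 7, (0, 1): 10, (1, 0): 5, (1, 1): 20}
DEATHS_B = {(0, 0): 7, (0, 1): 25, (1, 0): 20, (1, 1): 5}


def marginal_cfr(deaths: dict, factor: int) -> float:
    """CFR among everyone with one factor (index 0 = DM, index 1 = HTN)."""
    cells = [cell for cell in CASES if cell[factor] == 1]
    return sum(deaths[c] for c in cells) / sum(CASES[c] for c in cells)


def joint_cfr(deaths: dict) -> float:
    """CFR among people with both DM and HTN."""
    return deaths[(1, 1)] / CASES[(1, 1)]


for label, deaths in (("A", DEATHS_A), ("B", DEATHS_B)):
    print(
        label,
        f"DM CFR={marginal_cfr(deaths, 0):.3f}",
        f"HTN CFR={marginal_cfr(deaths, 1):.3f}",
        f"both CFR={joint_cfr(deaths):.2f}",
    )
```

A published table that reports only the per-condition rows cannot distinguish scenario A from scenario B, so any combined-risk number has to come from an added assumption such as independence.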
Farzad Mostashari (Mar 12 2020 at 20:54):
Josh Mandel said:
So Farzad Mostashari, is your goal to run these risk calculations speculatively across your ambulatory population of not-yet-sick folks, to see who's at risk for the worst outcomes if they get sick?
yes
Josh Mandel (Mar 12 2020 at 20:56):
Just annotated the risk factor tab with source details
Josh Mandel (Mar 12 2020 at 20:58):
The model maybe needs to account for sex given:
Case fatality rate for males was 2.8% and for females was 1.7%.
... but there's not enough source data to make sense of this without knowing age/sex/HTN interactions at baseline.
Arien Malec (Mar 12 2020 at 21:18):
I made the unsophisticated modeling assumption that this is noise.
Josh Mandel (Mar 12 2020 at 21:19):
65% would be a lot of noise!
Arien Malec (Mar 12 2020 at 21:19):
And yes, the simplest version was to divide the HTN rate by the base rate.
Arien Malec (Mar 12 2020 at 21:20):
I don't believe small RRs in epidemiology based data.
Josh Mandel (Mar 12 2020 at 21:21):
I can't tell if you're saying something snarky or sincere, so forgive my plodding repetition here: how is 2.8/1.7==1.64 a small ratio?
Josh Mandel (Mar 12 2020 at 21:23):
You're just saying, compared with numbers like 10x?
Arien Malec (Mar 12 2020 at 21:23):
yes.
Arien Malec (Mar 13 2020 at 01:12):
(The other reason to suppose sex-based differences are noise is that the mechanism by which being female could be protective isn't clear, and isn't consistent with other similar diseases -- the commentary I've seen is on the order of "something something estrogen <waves hand> something")
But @Farzad Mostashari is the customer here...
Josh Mandel (Mar 13 2020 at 14:06):
Anyone have a gloss on the final row of Table 1?
Josh Mandel (Mar 13 2020 at 14:06):
By a good margin, the biggest group was the "no data about comorbidities" group -- and they had dramatically the highest case fatality rate.
Josh Mandel (Mar 13 2020 at 14:07):
I guess one explanation is just "when we're moving so fast that we're not recording data, it's usually because we're in count-the-bodies mode". But want to make sure I'm not missing something more subtle here.
Arien Malec (Mar 13 2020 at 14:52):
I think that's right. They didn't have the highest CFR -- the unknown CFR was 2.6 relative to 10.5
Arien Malec (Mar 13 2020 at 14:53):
10.5 for CVD.
Josh Mandel (Mar 13 2020 at 15:06):
D'oh, thanks (I was misreading the % of deaths)
Arien Malec (Mar 13 2020 at 15:51):
https://cmmid.github.io/topics/covid19/severity/diamond_cruise_cfr_estimates.html
On Diamond Princess
From everything I can gather, the Wuhan data is still being treated as the gold standard.
Last updated: Apr 12 2022 at 19:14 UTC