Stream: research
Topic: clinicaltrials.gov
Dan Connolly (Jun 12 2019 at 18:40):
any work on https://clinicaltrials.gov/ studies? graphql via FHIR, for example?
Dan Connolly (Jun 12 2019 at 18:49):
ah: https://www.ncbi.nlm.nih.gov/pubmed/30147028 Investigating the Capabilities of FHIR Search for Clinical Trial Phenotyping 2018
Dan Connolly (Jun 12 2019 at 18:50):
"Using a randomly sampled subset of 303 eligibility criteria from ClinicalTrials.gov yielded a 34 % success rate in representing them using the FHIR search semantics."
Simone Heckmann (Jul 11 2019 at 10:19):
Does anyone know whether there is any interest on the part of clinicaltrials.gov to make eligibility criteria machine readable for FHIR?
Gustav Vella (Jul 19 2019 at 08:08):
> Does anyone know whether there is any interest on the part of clinicaltrials.gov to make eligibility criteria machine readable for FHIR?
There's a team there working on outcomes reporting - a year or two ago MR I/E was being discussed. I'll follow up and get back to you on this.
Gustav Vella (Jul 21 2019 at 14:06):
> There's a team there working on outcomes reporting - a year or two ago MR I/E was being discussed. I'll follow up and get back to you on this.
@Simone Heckmann Sent you the contact details privately since they are not on Zulip.
Geoff Low (Aug 08 2019 at 15:41):
There are a bunch of companies who offer services in this vein. The issue is that currently it's just a text blob, with some agreed conventions. I just did some simple parsing in https://github.com/glow-mdsol/clinical_trials/blob/develop/clinical_trials/helpers.py#L4 to partition the elements based on paragraphing but it's nowhere near what I think you're talking about.
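A rough sketch of the paragraph-based partitioning Geoff describes, assuming the common (but not guaranteed) "Inclusion Criteria:" / "Exclusion Criteria:" headings and "-" bullets in the registry's text blob:

```python
import re

def split_criteria(text):
    """Split a ClinicalTrials.gov eligibility blob into inclusion/exclusion lists.

    Relies on the frequent convention of 'Inclusion Criteria:' /
    'Exclusion Criteria:' headings followed by '-' bulleted paragraphs;
    real entries often deviate from this, so treat it as pre-processing only.
    """
    sections = {"inclusion": [], "exclusion": []}
    current = None
    # Paragraphs are separated by blank lines in the raw text blob.
    for para in re.split(r"\n\s*\n", text.strip()):
        para = para.strip()
        if re.match(r"(?i)inclusion criteria", para):
            current = "inclusion"
        elif re.match(r"(?i)exclusion criteria", para):
            current = "exclusion"
        elif current:
            sections[current].append(para.lstrip("- ").strip())
    return sections

blob = """Inclusion Criteria:

- Male or female, age >= 18 years

- Histologically confirmed diagnosis

Exclusion Criteria:

- Prior systemic therapy
"""
parts = split_criteria(blob)
```

This only recovers the section structure; it does nothing to interpret the criteria themselves, which is where the hard problems discussed below begin.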
Geoff Low (Aug 08 2019 at 18:32):
The CTTI database rolls it all into a single column (varchar(5000) or thereabouts), so I'd recommend starting with that data source (many ML/AI providers are using it as an EDA source).
Gustav Vella (Aug 11 2019 at 18:13):
@Geoff Low Re: parsing: Same approach here, tokenizing with the support of a term server. Some observations:
- Deducing the positive/negative phrasing of I/E criteria is harder than you'd expect. A lot of protocols are badly written, but it's also that the registry data does not contain the full protocol.
- You can also find quite a bit of biomarker information in the trial endpoint descriptions. The scores there also shed light on the study population. But mostly, if you are after "what data to extract" for the survey / evaluation, the endpoint info is indispensable.
All things considered, this is all pre-processing. There's no way around re-structuring the stuff manually and the world would be a better place if clinical researchers had the tools and means to structure and validate this stuff upfront.
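To illustrate the polarity problem: a naive surface-cue check like the sketch below (the cue list is a made-up example) catches simple cases, but scoped negation, double negation, and "unless" clauses in real protocol language defeat it quickly, which is the point above.

```python
# Hypothetical cue list for illustration; real systems need term-server
# support and proper negation scoping (e.g. NegEx-style algorithms).
NEGATION_CUES = ("no ", "not ", "without ", "absence of ", "free of ")

def looks_negated(criterion):
    """Naive surface check for negated phrasing in a single criterion."""
    c = criterion.lower()
    return any(cue in c for cue in NEGATION_CUES)

looks_negated("No prior chemotherapy")          # True
looks_negated("Patients must not be pregnant")  # True
looks_negated("ECOG performance status 0-1")    # False
```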
Geoff Low (Aug 11 2019 at 21:26):
Oh, I know exactly how difficult it is. In PhUSE we spent around six months looking at modelling an ontology to represent eligibility criteria, with little to show for it in the end. The IMI initiative also spent quite a bit of time on it and, from recollection, put together a model, but like you say, the way the criteria are phrased makes it really difficult to get to the bottom of anything more complex than "Male, Age >= 18". The sooner we can get medical writers to build criteria from lego blocks, the better for all parties. Some studies now report the full protocol (and ICF) on ClinicalTrials.gov - I added a facet to the python project to automatically pull the document if it's declared.
I also agree about the endpoints; however, the use case most people linger on is "given this clinical trial protocol document, can you tell me whether I will be able to recruit enough suitable candidates from a pool of patients", and if a machine could do that it would be super. There's such a large amount of time (and money) spent on getting patients onto a study that it's a really viable market for multiple companies to do just that (a horrendous oversimplification of course - many of these orgs spend considerable resources accessing, cleaning and indexing patient data).
When we get to the point that eligibility can be described in such a way that machine readability is the rule, then we can move on to having subjects self-identify for research using their personal health record, or use CDS Hooks to streamline the process in an EHR system. Prior studies have found that, in general, people are quite altruistic when it comes to drug research.
John Moehrke (Aug 12 2019 at 13:51):
There are EHR-vendor-provided solutions. In a historic case I was involved in, these kinds of cohort probes were carefully crafted to provide useful information to the clinical trial without compromising privacy. Once a probe was accepted, there were further privacy and safety procedures to notify the potential participants and get them engaged with the trial. None of this is easy, especially to do right. We hear publicly about the cases where it was NOT done right; we do not hear about all the cases where it was. Again, doing it right (privacy and safety) is hard, and the work needed to do it right is valuable to everyone.
Last updated: Apr 12 2022 at 19:14 UTC