Stream: data extraction services
Topic: Scope of this stream
Simone Heckmann (Nov 15 2018 at 12:08):
Hello all,
this is the stream for discussing all topics related to data extraction services, such as text analysis, NLP, image recognition, etc.
The topics discussed should not touch on the actual algorithms used, but rather on how to represent the results of these analyses in FHIR.
Some talking points for further discussion:
- how do we express confidence in the results?
- how do we reference specific elements in resources that have been created?
- how do we capture character or byte ranges so that results can be traced back to the parts of the source data they were derived from?
- can we use this range to highlight parts of the source data for human inspection?
- how can we capture confirmation/dismissal of detected information in a review process?
- how different are the requirements for text/audio/picture analysis?
bonus question: can we use all of this information to feed training data back to improve AI algorithms?
Simone Heckmann (Nov 15 2018 at 12:33):
Summary of the talking points from the DevDays BoF that started this stream:
- it is agreed that Provenance should be the container for any additional information about data extraction results.
- Provenance.target points at the created resource (e.g. Condition, Observation, MedicationStatement...)
- Provenance.agent gives details about the algorithm that extracted the data (Software name, Vendor, Version...)
- Provenance.entity gives details about the source of the data (like a PDF, an audio file, a picture...)
- we require a (complex) extension to capture confidence, character/byte range, and target element (a rough sketch follows at the end of this message)
- it is not yet clear how confidence can be expressed in an interoperable way, but the way it's handled in the Patient/$match operation definition might be a starting point.
- in addition to keeping the confidence on the (immutable) Provenance resource, the created resources should also have a status that reflects uncertainty (e.g. "unconfirmed") and may carry an additional metadata tag to indicate that they e.g. "need revision/confirmation". If the resources are subsequently confirmed or rejected, that will update the tags/status, but not alter the Provenance.
- the range part of the extension should make it possible to show/highlight the parts of the original source that led to the creation of a resource
Please create individual chat topics to further discuss any of the above-mentioned issues...
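For illustration, a minimal sketch (assuming FHIR R4, written as a small Python script that builds the JSON) of how the points above might fit together. The extension URL, its sub-element names ("confidence", "offsetStart", "offsetEnd", "targetElement"), the tag system, and all resource ids/references are invented placeholders, not an agreed-upon profile:

```python
import json

# The resource created by the extraction service; status and tag reflect that it is unconfirmed.
condition = {
    "resourceType": "Condition",
    "id": "nlp-derived-example",
    "meta": {"tag": [{"system": "http://example.org/fhir/tags",  # hypothetical tag system
                      "code": "needs-review"}]},
    "verificationStatus": {"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/condition-ver-status",
        "code": "unconfirmed"}]},
    "code": {"text": "Diabetes mellitus type 2"},
    "subject": {"reference": "Patient/example"}
}

# The (immutable) Provenance that records how the Condition was derived.
provenance = {
    "resourceType": "Provenance",
    # Provenance.target points at the created resource
    "target": [{"reference": "Condition/nlp-derived-example"}],
    "recorded": "2018-11-15T12:00:00Z",
    # Provenance.agent gives details about the extracting algorithm
    "agent": [{
        "type": {"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/provenance-participant-type",
            "code": "assembler"}]},
        "who": {"display": "ExampleNLP Engine 1.2 (Example Vendor)"}
    }],
    # Provenance.entity gives details about the source the data was extracted from
    "entity": [{
        "role": "source",
        "what": {"reference": "DocumentReference/discharge-letter-example"}
    }],
    # hypothetical complex extension carrying confidence, character range, and target element
    "extension": [{
        "url": "http://example.org/fhir/StructureDefinition/derivation-detail",
        "extension": [
            {"url": "confidence", "valueDecimal": 0.87},
            {"url": "offsetStart", "valueInteger": 1042},
            {"url": "offsetEnd", "valueInteger": 1067},
            {"url": "targetElement", "valueString": "Condition.code"}
        ]
    }]
}

print(json.dumps(condition, indent=2))
print(json.dumps(provenance, indent=2))
```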
Simone Heckmann (Nov 15 2018 at 12:34):
For completeness, here's the link @Josh Mandel provided to his initial write-up:
https://github.com/smart-on-fhir/smart-on-fhir.github.io/wiki/Extensions-and-Provenance-for-Derived-Data is the wiki page I had started; I solicited more feedback but only have a formal write-up from Ciitizen so far.
Last updated: Apr 12 2022 at 19:14 UTC