Stream: fhir/documents
Topic: From paper-based sources to FHIR Document
Lin Zhang (Nov 29 2020 at 14:06):
Such a use case is very common in our work, and it will remain so for a long time. Therefore, there is strong demand for ingestion and processing of data captured by OCR from scanned images/photos. Is there any feasible approach or tooling for this type of data source? Thanks.
Lloyd McKenzie (Nov 29 2020 at 15:19):
Parsing an arbitrary text document and turning it into a FHIR document complete with sections and discrete data would be a hugely impressive technical feat. The first (and more common) step would just be a DocumentReference with multiple content repetitions - one for the original scanned image and one for the OCR'd text. You might have a third that tags the data to make some of the statements computable. Going beyond that to turn the tags into valid populated resources with discrete data is not something I've seen anyone do - and that would really be the only reason to go beyond DocumentReference.
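A minimal sketch of the "multiple content repetitions" idea, assuming FHIR R4 and plain Python dictionaries rather than any particular FHIR library. The function name `build_document_reference`, the attachment titles, and the default `image/png` content type are hypothetical choices for illustration, not something stated in this thread.

```python
import base64


def build_document_reference(scan_bytes, ocr_text, scan_content_type="image/png"):
    """Return a DocumentReference dict with one content entry per representation."""

    def attachment(payload, content_type, title):
        # Attachment.data is base64Binary, so inline payloads are base64-encoded.
        return {
            "attachment": {
                "contentType": content_type,
                "data": base64.b64encode(payload).decode("ascii"),
                "title": title,
            }
        }

    return {
        "resourceType": "DocumentReference",
        "status": "current",  # required element in R4
        "content": [
            # Repetition 1: the original scanned image.
            attachment(scan_bytes, scan_content_type, "Original scan"),
            # Repetition 2: the OCR'd text of the same document.
            attachment(ocr_text.encode("utf-8"), "text/plain", "OCR text"),
        ],
    }
```

For large scans, `attachment.url` pointing at a stored Binary is probably a better fit than inline `data`; inline data is used here only to keep the sketch self-contained.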
Lin Zhang (Nov 29 2020 at 23:12):
@Lloyd McKenzie Appreciate your helpful guidance. :blush:
John Moehrke (Nov 30 2020 at 13:57):
I would state that even the ability to scan paper and properly fill out the DocumentReference (metadata about the document) would be helpful. What patient? When published? Who authored? Where was it authored? What timeframe? What is the clinical purpose of the document? etc... The DocumentReference is made up of useful metadata for finding any kind of document. Finding is sometimes the most important thing.
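One possible reading of how those questions line up with DocumentReference elements, assuming FHIR R4 (the `context` element was restructured in later versions). The mapping is an interpretation, not something confirmed in this thread, and `add_metadata` with its parameter names is hypothetical. Only metadata that is actually known gets written.

```python
def add_metadata(doc_ref, *, patient_id=None, published=None, author_display=None,
                 facility_display=None, period_start=None, period_end=None,
                 doc_type_text=None):
    """Return a copy of doc_ref with whichever metadata elements are known."""
    out = dict(doc_ref)  # shallow copy so the caller's dict is not modified

    if patient_id:        # What patient?
        out["subject"] = {"reference": f"Patient/{patient_id}"}
    if published:         # When published? (R4 `date` is an instant)
        out["date"] = published
    if author_display:    # Who authored?
        out["author"] = [{"display": author_display}]
    if doc_type_text:     # What is the clinical purpose of the document?
        out["type"] = {"text": doc_type_text}

    context = dict(out.get("context", {}))
    if facility_display:  # Where was it authored?
        context["facilityType"] = {"text": facility_display}
    period = {k: v for k, v in (("start", period_start), ("end", period_end)) if v}
    if period:            # What timeframe?
        context["period"] = period
    if context:
        out["context"] = context

    return out
```

Free-text `display` and `text` values are used because OCR output rarely resolves to a coded concept or a known resource id; a real system would likely map them to references and codes where it can.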
Lin Zhang (Nov 30 2020 at 14:29):
@John Moehrke Agreed. But much of the metadata is hard to find for such paper-based documents, e.g., when it was published and who authored it, as you pointed out.
John Moehrke (Nov 30 2020 at 14:42):
Yes, that is where OCR comes in... I have heard of some systems that do just this. They OCR only enough to get this metadata (or as much as they can determine from OCR of the first page).
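A rough sketch of pulling metadata from first-page OCR text only. It assumes the page carries labelled lines such as `Patient Name:` or `Date:` in ISO format; those labels, the `PATTERNS` table, and `extract_first_page_metadata` are hypothetical, and real OCR output is usually noisier than this. Fields that cannot be found are simply left out rather than guessed.

```python
import re

# Hypothetical label patterns; real forms would need their own rules.
PATTERNS = {
    "patient_name": re.compile(
        r"^\s*Patient(?:\s+Name)?\s*[:\-]\s*(.+?)\s*$", re.I | re.M),
    "author": re.compile(
        r"^\s*(?:Author|Physician|Signed by)\s*[:\-]\s*(.+?)\s*$", re.I | re.M),
    "date": re.compile(
        r"^\s*Date\s*[:\-]\s*(\d{4}-\d{2}-\d{2})\s*$", re.I | re.M),
}


def extract_first_page_metadata(first_page_text):
    """Return only the metadata fields that could be located in the text."""
    found = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(first_page_text)
        if match:
            found[key] = match.group(1)
    return found
```

Anything extracted this way would still need matching against known patients and practitioners before it is trusted as DocumentReference metadata.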
René Spronk (Nov 30 2020 at 14:46):
@Simone Heckmann
Last updated: Apr 12 2022 at 19:14 UTC