Stream: data extraction services
Topic: UIMA
Luca Toldo (Jan 07 2019 at 09:50):
Dear All,
thanks to @Simone Heckmann I became aware of this stream just now. Unfortunately I was not able to participate face to face in the kick-off BOF.
In this stream I would like to discuss whether and how to integrate the well-established Unstructured Information Management Architecture (UIMA) into data extraction services.
While commercial applications of NLP with UIMA technology in the clinical domain had already emerged by 2006 (e.g. Medical Entity Relation Skill Cartridge, http://www.econtentmag.com/Articles/News/News-Item/TEMIS-Launches-Luxid-for-Life-Sciences-18415.htm), the work of Stephen Wu (2013), https://doi.org/10.1186/2041-1480-4-1, should be remembered as the foundation for the cTAKES type system, perhaps the first public domain UIMA implementation of clinical information extraction (http://ctakes.apache.org/).
Hochheiser et al. (2016), https://doi.org/10.1186/s12911-016-0358-4, demonstrate the use of FHIR to represent computable cancer phenotypes extracted from unstructured documents, as well as the vision of combining this with the UIMA pipelines used by the DeepPhe project (https://github.com/DeepPhe/DeepPhe-Release/tree/master/deepphe-fhir). @Sean Finan
There, they extended the work of Tseytlin et al. (2015), https://doi.org/10.1186/s12859-015-0871-y, which extracted clinical information and made it available through both GATE and UIMA pipelines (see the implementation in https://github.com/dbmi-pitt/nobletools/tree/master/plugin-uima).
The recent work of Dietrich et al. (2018), https://doi.org/10.3414/ME17-02-0010, shows the use of UIMA for clinical information extraction in the German language.
The very recent paper of Hong et al. (2018), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961797/, demonstrates the feasibility of a FHIR-based type system for structured and unstructured EHR data. In their work on the interoperability of FHIR and UIMA, they report that "few UIMA reserved words such as begin, end and start" conflict with the FHIR reserved tokens, and they show how they propose to resolve the conflict (renaming from {field} to "fhir{field}").
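To make the renaming idea concrete, here is a minimal sketch in Python. It is only an illustration of the {field} to "fhir{field}" rule quoted above, not the code of Hong et al.: the function name, the reserved-word set (limited to the three words quoted) and the exact casing of the new name are my assumptions.

```python
# Hypothetical illustration of the renaming rule; not taken from the paper's code.
# The reserved set holds only the three words quoted above; the real list may differ.
UIMA_RESERVED = {"begin", "end", "start"}


def rename_conflicting_fields(fhir_fields, reserved=UIMA_RESERVED, prefix="fhir"):
    """Map each FHIR field name to a feature name that avoids the reserved words.

    Conflicting names get the prefix (assumed plain concatenation, no case change);
    all other names are kept unchanged.
    """
    return {
        field: f"{prefix}{field}" if field in reserved else field
        for field in fhir_fields
    }


# Example: a FHIR element with "start" and "end" fields plus a harmless one.
mapping = rename_conflicting_fields(["start", "end", "status"])
print(mapping)
```

Keeping the mapping as a dictionary means the original FHIR name stays recoverable when the annotations are serialized back to FHIR.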
The above is not to be considered an exhaustive literature review: certainly many more papers and authors are active in the field, so please accept my apologies for any incompleteness.
The approach shown by Hong et al., and also by Dietrich et al., namely using the UIMA CAS formalism combined with the FHIR semantics, makes it possible to reuse all tools available in UIMA, as well as many annotation engines and resources. This is particularly true for the life science and bioinformatics activities in biomedical NLP, which have very much flourished with UIMA (and GATE).
Last updated: Apr 12 2022 at 19:14 UTC