Stream: data extraction services
Topic: paragraph and sentence range extension
Luca Toldo (Feb 19 2019 at 07:50):
Hi all, along those lines, paragraph-range and sentence-range extensions would be useful as well. Any thoughts on this? Unfortunately, FHIR has neither "Sentence" nor "Paragraph" nor "Annotation" as Resources, so one solution could be to define them as extensions too. What does the community think?
Sean Finan (Feb 20 2019 at 15:00):
We have a simple FHIR writer in Apache cTAKES that encodes text mention provenance using an extension that contains URLs ..../span-begin and .../span-end. Each has a valueUnsignedInt. We avoided using a single range because a range seems meant to indicate bounds for some possible value(s) within those bounds. The same span-begin/span-end encoding is used for the start and end indexes of Sentence, Section, Paragraph and simpler tokens.
As for resources for those types, a Basic is used with code > coding. The "system" is simply "type-system", indicating that the cTAKES type system is used, with a "code" of the cTAKES type, e.g. "org.apache.ctakes.typesystem.type.textspan.Segment".
In essence, this is an attempt to use existing FHIR resources, recognizing that different tools may have different ways of expressing bounded types that serve a similar purpose. In addition, if a consumer's goal is to reconstruct the state of the NLP information for some unknown representation, much more than just section, sentence, paragraph and mention is required, and representing each with a unique resource seems like a bit of overloading.
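For readers who want to see the shape of the encoding described in this reply, here is a minimal sketch in Python that builds a Basic resource as a plain dict. It is not the cTAKES writer itself. Several things are assumptions: the base URL is a hypothetical placeholder (the message elides the real one), the two offsets are emitted as sibling extensions (the actual writer may nest them differently), the helper names `span_extensions` and `text_span_basic` are invented for illustration, and the offsets 0 and 120 are made up.

```python
import json

# Hypothetical placeholder: the original message elides the real base URL.
EXTENSION_BASE = "http://example.org/fhir/StructureDefinition"


def span_extensions(begin, end, base=EXTENSION_BASE):
    """Return two extensions carrying the start and end character offsets."""
    if begin < 0 or end < begin:
        raise ValueError("expected 0 <= begin <= end")
    return [
        {"url": f"{base}/span-begin", "valueUnsignedInt": begin},
        {"url": f"{base}/span-end", "valueUnsignedInt": end},
    ]


def text_span_basic(type_name, begin, end, resource_id=None):
    """Build a Basic resource dict for one typed text span."""
    resource = {
        "resourceType": "Basic",
        # code > coding: "system" names the type system, "code" the type.
        "code": {"coding": [{"system": "type-system", "code": type_name}]},
        "extension": span_extensions(begin, end),
    }
    if resource_id is not None:
        resource["id"] = resource_id
    return resource


# Illustrative offsets only; a real writer would take them from the annotation.
segment = text_span_basic(
    "org.apache.ctakes.typesystem.type.textspan.Segment", 0, 120
)
print(json.dumps(segment, indent=2))
```

Because the type travels in the coding rather than in the resource type, the same helper would cover sentences, paragraphs, sections and tokens by changing only the `type_name` argument.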
Last updated: Apr 12 2022 at 19:14 UTC