FHIR Chat · Differentiating 'scripted' from non-scripted text · implementers

Stream: implementers

Topic: Differentiating 'scripted' from non-scripted text


Lloyd McKenzie (Aug 29 2019 at 16:09):

Some languages (e.g. Arabic, Hebrew, etc.) can express their words using the western alphabet. They can also be expressed in their native script. E.g. "sajal almarid" vs. "سجل المريض". Is there a way to flag a particular Patient.name as one form vs. the other? I know we have ISO extensions for ideographic vs. syllabic vs. alphabetic (http://hl7.org/fhir/valueset-name-v3-representation.html). However, this isn't ideographic - we don't have symbols that reflect a word or concept. Would the scripted version be alphabetic and the westernized version be syllabic? Do we need a new code for "script"?
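For context, here is a minimal sketch of how the existing representation extension could be attached to each form of a name, written as Python dicts that mirror FHIR JSON. The extension URL is an assumption (the thread only links the value set), the helper names are hypothetical, and the ABC/SYL assignments are placeholders taken from the question above, not a settled answer.

```python
# Assumed URL of the ISO 21090 name-representation extension (not stated in the thread).
EN_REPRESENTATION_URL = (
    "http://hl7.org/fhir/StructureDefinition/iso21090-EN-representation"
)


def tagged_name(text, representation_code):
    """Build a HumanName-like dict carrying a representation code (hypothetical helper)."""
    return {
        "text": text,
        "extension": [
            {"url": EN_REPRESENTATION_URL, "valueCode": representation_code}
        ],
    }


def names_with_representation(patient, code):
    """Return the text of every name on the patient tagged with the given code."""
    matches = []
    for name in patient.get("name", []):
        for ext in name.get("extension", []):
            if ext.get("url") == EN_REPRESENTATION_URL and ext.get("valueCode") == code:
                matches.append(name.get("text"))
    return matches


# Placeholder codes: whether ABC and SYL are the right fit is the open question.
patient = {
    "resourceType": "Patient",
    "name": [
        tagged_name("سجل المريض", "ABC"),
        tagged_name("sajal almarid", "SYL"),
    ],
}

print(names_with_representation(patient, "ABC"))
```

A consumer could then pick the form appropriate to its display or print context by filtering on the code.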

Lloyd McKenzie (Aug 29 2019 at 16:09):

@Paul Knapp, is this something you've encountered?

Michel Rutten (Aug 29 2019 at 16:17):

Interesting... Aren't Western, Arabic and Hebrew just different forms of alphabets? I find the term "script" somewhat confusing.

Lloyd McKenzie (Aug 29 2019 at 16:53):

They are different alphabets. But there's also a roman alphabet representation of the name (which is used in many places where there are limitations on character types). The question is how to differentiate the representations.

Paul Knapp (Aug 29 2019 at 18:19):

I'd asked for this to be explicit years ago. You differentiate by character set, so if the name is expressed using the Arabic character set it is assumed to be in Arabic, and if in the roman character set then in English. There is no way to know if the two forms of text are phonetic equivalents (the case in Arabic, which is what leads to multiple spellings in English for the same word in Arabic for both people and street names) or semantic equivalents (rose in the original language and 'Rose' in English).

From a usage perspective you simply display or print the characters - and machine translation between phonetic forms is highly inaccurate and would likely need homophone or related encoding to support lookups.

Lloyd McKenzie (Aug 29 2019 at 18:22):

In FHIR, the character set is always UTF

Lloyd McKenzie (Aug 29 2019 at 18:25):

Are you saying to just check the UTF range? Should we just leverage the existing ABC (alphabetic) code for the Arabic script and the SYL (syllabic) code for the English, or do we need some sort of new code?
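A rough sketch of what "checking the range" might look like in practice, using only the Python standard library. It is a heuristic, not a definitive method: the standard library does not expose the Unicode Script property, so this assumes the first word of each character's Unicode name (e.g. "ARABIC", "HEBREW", "LATIN") is a good enough proxy. The function name `dominant_script` is hypothetical.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Guess the script of a string from the Unicode names of its letters.

    Heuristic: the first word of a character's Unicode name is treated as
    its script. Returns None when the text contains no letters.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split(" ", 1)[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(dominant_script("sajal almarid"))  # LATIN
print(dominant_script("سجل المريض"))     # ARABIC
```

This only tells you which script the characters belong to. As noted earlier in the thread, it cannot tell you whether two forms are phonetic or semantic equivalents of each other.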

Lloyd McKenzie (Aug 29 2019 at 18:26):

The reason for computable differentiation is the same as for the original use of these codes in Asia - different formats are appropriate for display/printing/exchange in different contexts

Grahame Grieve (Aug 29 2019 at 21:05):

http://hl7.org/fhir/languages.html#names

Lloyd McKenzie (Aug 29 2019 at 22:14):

Perfect. Thank you!


Last updated: Apr 12 2022 at 19:14 UTC