FHIR Chat · Representing translations · implementers

Stream: implementers

Topic: Representing translations


view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:15):

One of the things on my roadmap for R5 was to see if we could come up with an improved representation for translations.

At present, we handle translations using the translation extension, which looks like this:

{
  "resourceType" : "StructureDefinition",
  "description" : "this is some text in the primary language",
  "_description" : {
    "extension" : [{
      "extension" : [{
        "url" : "lang",
        "valueCode" : "nl-NL"
      }, {
        "url" : "content",
        "valueString" : "The same text in dutch"
      }],
      "url" : "http://hl7.org/fhir/StructureDefinition/translation"
    }]
  }
}

And in XML:

<StructureDefinition xmlns="http://hl7.org/fhir">
  <description value="this is some text in the primary language">
   <extension url="http://hl7.org/fhir/StructureDefinition/translation" >
     <extension url="lang">
       <valueCode value="nl-NL"/>
     </extension>
     <extension url="content">
       <valueString value="The same text in dutch"/>
     </extension>
   </extension>
  </description>
</StructureDefinition>
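For readers who want to see what consuming the current JSON form involves, here is a minimal Python sketch that collects the translations attached to a primitive element. It assumes the resource has already been parsed into a plain dict; the function name `read_translations` is hypothetical and not part of any FHIR library, and only `valueString` content is handled.

```python
TRANSLATION_URL = "http://hl7.org/fhir/StructureDefinition/translation"


def read_translations(resource, element):
    """Return {language code: text} for translation extensions on one primitive element.

    Hypothetical helper: looks at the "_<element>" sibling that carries
    extensions for a primitive in the FHIR JSON format.
    """
    shadow = resource.get("_" + element, {})
    result = {}
    for ext in shadow.get("extension", []):
        if ext.get("url") != TRANSLATION_URL:
            continue  # some other extension on the same element
        # Index the nested extensions ("lang", "content") by their url.
        parts = {sub.get("url"): sub for sub in ext.get("extension", [])}
        lang = parts.get("lang", {}).get("valueCode")
        text = parts.get("content", {}).get("valueString")
        if lang is not None and text is not None:
            result[lang] = text
    return result


sd = {
    "resourceType": "StructureDefinition",
    "description": "this is some text in the primary language",
    "_description": {
        "extension": [{
            "extension": [
                {"url": "lang", "valueCode": "nl-NL"},
                {"url": "content", "valueString": "The same text in dutch"},
            ],
            "url": TRANSLATION_URL,
        }]
    },
}

print(read_translations(sd, "description"))
# {'nl-NL': 'The same text in dutch'}
```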

view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:16):

It's almost like we set out to make the wire format as obtuse as we could

view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:18):

so what if we said 'the translation extension is actually special, and we should do something special with it - as convenient as possible'.

the densest representation I could imagine would be something like this:

{
  "resourceType" : "StructureDefinition",
  "description" : "this is some text in the primary language",
  "description:nl-NL" : "The same text in dutch"
}

And in xml:

<StructureDefinition xmlns="http://hl7.org/fhir">
  <description value="this is some text in the primary language" value:nl-NL="The same text in dutch"/>
</StructureDefinition>

view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:19):

that's pretty simple, but as much as I like that, there's no way to define that in schema, and plenty of implementers work using schema. So that would lead to something like this:

{
  "resourceType" : "StructureDefinition",
  "description" : "this is some text in the primary language",
  "description-translations" : [{
    "lang": "nl-NL",
    "text": "The same text in dutch"
  }]
}

and this

<StructureDefinition xmlns="http://hl7.org/fhir">
  <description value="this is some text in the primary language">
    <translation xml:lang="nl-NL" value="The same text in dutch"/>
  </description>
</StructureDefinition>

view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:22):

honestly, I think that that json is not as convenient as this:

view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:22):

{
  "resourceType" : "StructureDefinition",
  "description" : "this is some text in the primary language",
  "description-translations" : {
    "nl-NL" : "The same text in dutch"
  }
}

view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:22):

but I can't see how to do that with json schema. If anyone can, I'm all ears.

view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:23):

note that all this is predicated on the idea that this is a special representation for the extension, and the standard extension way of representing is still valid in the general case.
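If the compact form is only a special representation of the extension, a parser could expand it back into the standard form on read. The following Python sketch illustrates that idea for the map-style `description-translations` proposal; the `-translations` suffix handling and the function name `expand_translations` are assumptions made for illustration, not an agreed format.

```python
import copy

TRANSLATION_URL = "http://hl7.org/fhir/StructureDefinition/translation"
SUFFIX = "-translations"  # assumption: key suffix taken from the proposal above


def expand_translations(resource):
    """Rewrite "<element>-translations" maps as standard translation extensions."""
    # Copy everything that is not a compact translation map.
    out = {
        key: copy.deepcopy(value)
        for key, value in resource.items()
        if not key.endswith(SUFFIX)
    }
    for key, lang_map in resource.items():
        if not key.endswith(SUFFIX):
            continue
        element = key[: -len(SUFFIX)]
        # Extensions on a primitive live on the "_<element>" sibling.
        extensions = out.setdefault("_" + element, {}).setdefault("extension", [])
        for lang, text in lang_map.items():
            extensions.append({
                "extension": [
                    {"url": "lang", "valueCode": lang},
                    {"url": "content", "valueString": text},
                ],
                "url": TRANSLATION_URL,
            })
    return out


compact = {
    "resourceType": "StructureDefinition",
    "description": "this is some text in the primary language",
    "description-translations": {"nl-NL": "The same text in dutch"},
}

expanded = expand_translations(compact)
```

Going the other way (extension form to compact form) would be the mirror image, which is what would let both representations stay valid side by side.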

view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:24):

I talked to a few European implementers about this while in Sydney, and they believed that we should have a wider discussion about this

view this post on Zulip Grahame Grieve (Feb 09 2020 at 09:25):

so opinions please

view this post on Zulip Rob Hausam (Feb 09 2020 at 09:39):

What about making it core in the string, code and markdown data types?

view this post on Zulip Lloyd McKenzie (Feb 09 2020 at 16:02):

Can you expand on that Rob? What would it look like?

view this post on Zulip Rob Hausam (Feb 11 2020 at 08:15):

That's a good question. As I look further at what I suggested, unfortunately I don't see any way to do it, since the string, code and markdown data types which are the context of the translation extension are all primitive themselves. I hesitate to even mention it (because it would have huge ramifications and is almost certainly impossible), but the only way that I can think of that it could be done would be to change these to complex types instead of primitive, and include a 0..* 'translation' element. So, there goes that idea. Maybe there's something else, but if so I haven't thought of it yet.

view this post on Zulip Rob Hausam (Feb 11 2020 at 08:17):

If we are able to use one of Grahame's extension simplifications, I think that would be helpful.

view this post on Zulip Josh Mandel (Feb 12 2020 at 02:54):

In the JSON-LD world, there's a compact syntax using "language maps"

  "occupation":  {
    "ja": "忍者",
    "en": "Ninja",
    "cs": "Nindža"
  }

in addition to an expanded syntax, which uses repeated nested objects.
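To make the relationship between the two JSON-LD syntaxes concrete, this small Python sketch turns a language map into a list of value objects using the `@value` and `@language` keywords. It only shows the shape of the data; it is not a conforming JSON-LD processor, and the function name `expand_language_map` is hypothetical.

```python
def expand_language_map(language_map):
    """Turn {"ja": "忍者", ...} into a list of {"@value", "@language"} objects."""
    return [
        {"@value": text, "@language": lang}
        for lang, text in language_map.items()
    ]


occupation = {"ja": "忍者", "en": "Ninja", "cs": "Nindža"}
for value_object in expand_language_map(occupation):
    print(value_object)
```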

view this post on Zulip Grahame Grieve (Feb 12 2020 at 03:30):

we could do something like that, if only we decided we didn't care about json-schema

view this post on Zulip Rob Hausam (Feb 12 2020 at 05:18):

yes, that looks pretty nice - but I assume we do care about json-schema and that precludes it?

view this post on Zulip Grahame Grieve (Feb 12 2020 at 05:21):

well, json-schema doesn't work too well, but we get a steady stream of implementers asking about it

view this post on Zulip Grahame Grieve (Feb 12 2020 at 05:22):

@Harold Solbrig I'm interested if you can ask the json-ld community about the language maps feature above, and the apparent reality that json-schema can't describe it, and whether that's been a problem for anyone?

view this post on Zulip Harold Solbrig (Mar 04 2020 at 22:07):

No need to ask -- it is built into JSON-LD 1.1. See: https://w3c.github.io/json-ld-syntax/#string-internationalization

view this post on Zulip Grahame Grieve (Mar 05 2020 at 06:32):

I really wish we had not folded and done JSON primitives the way we do. It would give us so many more options. But @Harold Solbrig that missed my actual question: json schema can't describe this. has it ever been a problem in the json-ld community?

view this post on Zulip Pétur Valdimarsson (Mar 10 2020 at 22:54):

I've spent a bit of time contemplating translations of IGs for the last couple of months and am putting thoughts out here for anyone to read and/or ridicule. My main focus has been on producing the same IG in two different languages, English being one.

When it comes to localisation of a profile or an IG there are 5 areas to handle;

  1. Text in the "html template" can be handled fairly easily through Jekyll, not really an issue.
  2. Snippets produced by the publisher (Name, Flags, Card. Type...etc.). Seems the boilerplate for this is in place, just needs a tad of code that looks up translations.
  3. CodeSystems - Designations covers the needs just fine.
  4. Texts created by author of new profiles; Definition, short, note.
  5. Inherited texts from the FHIR specification. Now this one is a hassle. To manage this one either has to modify each and every element inherited or start by creating a full translation of the FHIR standard.

This thread mainly covers points 3 and to some extent 4 so I'll focus on that part.
I believe that whatever approach is taken should take into account a couple of issues: the possibility of "outsourcing" the actual translation work, enabling translation of inherited content, avoiding repetition as far as possible, and keeping the translations a part of the new IG.

By the outsourcing part I mean that the author of the profile might not be the person doing the translation. Translations will possibly be done by local translators and domain specialists. Handing over a set of profiles and letting them dig through and edit the snapshots might not be efficient and/or doable at all.

Handling inherited content means that translation data needs to be supplied from "outside" the inherited profile. The avoidance of repetitions also points to this.

The way we have been handling this is a three-part process. We wrote a rather simple nodejs application that parses the profiles (including the snapshot), extracts a distinct list of the contents of relevant elements, and writes it to an external csv file containing three columns: the string found, a list of paths for each occurrence (which profile and which element), and an empty column for the translation. This is then sent to local specialists who add translations to the empty column. When the file is sent back, we do a replacement in the same way we did the extraction, copying the results to a new input folder.
All in all the process has a few moving parts, but does the job pretty well. The translators get a concise list of terms to translate, and we can discover new texts needing translation by diffing the output csv files from part one, while changes to inherited content are discovered through the diff since we use the contents as keys. (I'll try to make the translation application available and add some documentation for the workflow once it has matured a tad.)
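The extraction step described above can be sketched roughly as follows. This is not the nodejs application mentioned in the post; it is an illustrative Python version, and the set of "relevant" element keys (`short`, `definition`, `comment`), the path format, and the function names are all assumptions.

```python
import csv
import io

# Assumption: which ElementDefinition fields count as translatable text.
RELEVANT_KEYS = ("short", "definition", "comment")


def collect_strings(profiles):
    """Map each distinct text to the list of places where it occurs."""
    found = {}
    for profile in profiles:
        for section in ("snapshot", "differential"):
            for element in profile.get(section, {}).get("element", []):
                for key in RELEVANT_KEYS:
                    text = element.get(key)
                    if not text:
                        continue
                    where = f"{profile.get('id')}#{element.get('path')}.{key}"
                    paths = found.setdefault(text, [])
                    if where not in paths:
                        paths.append(where)
    return found


def write_csv(found, handle):
    """Write three columns: string found, occurrences, empty translation."""
    writer = csv.writer(handle)
    writer.writerow(["string", "occurrences", "translation"])
    for text, paths in found.items():
        writer.writerow([text, "; ".join(paths), ""])


profiles = [{
    "id": "my-patient",
    "snapshot": {"element": [
        {"path": "Patient.name", "short": "A name"},
        {"path": "Patient.contact.name", "short": "A name"},
    ]},
}]

buffer = io.StringIO()
write_csv(collect_strings(profiles), buffer)
print(buffer.getvalue())
```

Because the text itself is the key, a string that appears in many elements is translated once, and a changed inherited text shows up as a new row when two exports are diffed.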

To sum it up: given a free hand, I myself would be looking at adding a list of translation maps keyed by language to the root of an SD instead of adding them on the content level. That would make extraction, translation and reinsertion of data a lot simpler and allow for adding translations for inherited data.

view this post on Zulip Grahame Grieve (Mar 11 2020 at 02:00):

non-English and multi-language IGs are on the todo list for this year. We have plans in place to finish what we have started, to leverage the translation element, and to make it easier to produce multi-language IGs

We discussed the format of the translation files, and pretty much thought the way you do about it, but didn't choose an actual format

view this post on Zulip Grahame Grieve (Mar 11 2020 at 02:01):

the one issue we hadn't really discussed is translating inherited content. There's a huge amount of that, and we wouldn't want to have any of that done at a per-IG level when it could be done once per realm

view this post on Zulip Jose Costa Teixeira (Mar 11 2020 at 05:08):

does this mean that we (HL7) would have something like a java resourcebundle where we can ask for translations?

view this post on Zulip Grahame Grieve (Mar 11 2020 at 05:09):

we're halfway through making that the case now - probably the next release of the validator will support German

view this post on Zulip Grahame Grieve (Mar 11 2020 at 05:10):

see https://github.com/hapifhir/org.hl7.fhir.core/tree/master/org.hl7.fhir.validation/src/main/resources

view this post on Zulip Grahame Grieve (Mar 11 2020 at 05:10):

anyone can add other languages

view this post on Zulip Grahame Grieve (Mar 11 2020 at 05:10):

the publisher will be next - but this is not the same thing as you were asking about

view this post on Zulip Pétur Valdimarsson (Mar 11 2020 at 12:04):

@Grahame Grieve Sounds reasonable.
For the work going forward, it is good to keep in mind that inherited content can come from more sources than just hl7. It's easy to end up in a situation where a dependency on us-core, sdc or something else might mean having to translate a whole third-party IG, instead of only the relevant parts.

view this post on Zulip Grahame Grieve (Mar 11 2020 at 12:56):

I'll keep that in mind

view this post on Zulip Giorgio Cangioli (Mar 11 2020 at 13:19):

Grahame Grieve said:

anyone can add other languages

is this a task that affiliates could start (if they wish), or is it too early?
(i.e. is it better to wait for future, more stable versions?)

view this post on Zulip Lloyd McKenzie (Mar 11 2020 at 14:28):

The content that's normative is relatively stable, though we'll be tweaking/improving language for clarity for years.

view this post on Zulip Patrick Werner (Mar 11 2020 at 15:08):

I think what @Giorgio Cangioli asked is when everyone can contribute translations to the validator.
The state here: https://github.com/hapifhir/org.hl7.fhir.core/tree/master/org.hl7.fhir.validation/src/main/resources is stable, but there are more properties/terms to come (https://github.com/hapifhir/org.hl7.fhir.core/pull/153).

view this post on Zulip Patrick Werner (Mar 11 2020 at 15:09):

I would wait until after the PR is merged. Even after that there will probably be some refactoring, joining redundant terms together.

view this post on Zulip Giorgio Cangioli (Mar 11 2020 at 15:16):

Thank you @Patrick Werner

view this post on Zulip Tim Berezny (Sep 28 2020 at 23:36):

I would love to see a simpler way to represent language translations as suggested in this thread. The extension approach makes the payload quite clunky.

I'm working on HealthcareService. The other approach we're considering is making a different resource for the different translations ... we don't love that idea, but with some fancy identifier footwork we might be able to make it work.

Is adding an extension to every element the generally preferred way to handle translating this type of content? (e.g., service name, description, etc...).

view this post on Zulip Grahame Grieve (Sep 28 2020 at 23:40):

yes. though there's not generally that much purely textual content in a resource.

view this post on Zulip Grahame Grieve (Sep 28 2020 at 23:43):

more generally.... I think that there wasn't that much interest in this approach. If you wanted to push it along, you could join the json schema google group and ask about the json-ld language maps feature above. I'd really like to have something like that, but I can't imagine how to describe it in json-schema and I think that's a killer for us

view this post on Zulip Jose Costa Teixeira (Oct 19 2020 at 06:37):

@Elliot Silver this thread

view this post on Zulip Douglas DeShazo (Apr 12 2021 at 19:29):

Hope this is the correct thread. Question - Is it allowed, and does it make any sense to add translation extension to narrative? I'm not sure the spec is completely clear on this. Thanks in advance.

view this post on Zulip René Spronk (Apr 13 2021 at 06:42):

Certainly, that makes sense. See http://build.fhir.org/languages.html##lang for specific guidance on using multiple languages in a narrative block.

view this post on Zulip René Spronk (Apr 13 2021 at 06:59):

To go back to the issue raised at the start of this thread: translations (on any narrative) could be done in a more elegant/shorter way, but on the other hand the 'pain' associated with the normal extension mechanism doesn't seem large enough for the international audience for them to request/demand a specific/optimized solution to deal with translations. I'm guessing the number of solutions that support multiple languages is still relatively low.

Time to speak up if you have implemented a multi lingual FHIR API, and if you'd like to see a better solution for the support of multiple languages in a FHIR resource.

view this post on Zulip Douglas DeShazo (Apr 13 2021 at 12:25):

Thanks @René Spronk Where we're having difficulty is whether the translation extension itself should be in the narrative or whether using div is the acceptable way:
"text": {
  "status": "generated",
  "div": "<div xmlns='http://www.w3.org/1999/xhtml'>
    <p><b>Encounter</b></p>
    <p><b>Patient</b>: SMART, NANCY</p>
    <p><b>Location</b>: Model Hospital, MX Hospital, NU05, 102, A</p>
  </div>",
  "div": "<div lang='sv' xmlns='http://www.w3.org/1999/xhtml'>
    <p><b>Vårdkontakt</b></p>
    <p><b>Patient</b>: SMART, NANCY</p>
    <p><b>Plats</b>: Model Hospital, MX Hospital, NU05, 102, A</p>
  </div>"
},
Here, it's just div and the expected translation representation is not present like we might see within the body of the resource.

view this post on Zulip René Spronk (Apr 13 2021 at 14:27):

In narrative, this seems to be the recommended way: multiple divs, no usage of extensions.

view this post on Zulip Tilo Christ (Apr 13 2021 at 22:38):

René Spronk said:

To go back to the issue raised at the start of this thread: translations (on any narrative) could be done in a more elegant/shorter way, but on the other hand the 'pain' associated with the normal extension mechanism doesn't seem large enough for the international audience for them to request/demand a specific/optimized solution to deal with translations. I'm guessing the number of solutions that support multiple languages is still relatively low.

Time to speak up if you have implemented a multi lingual FHIR API, and if you'd like to see a better solution for the support of multiple languages in a FHIR resource.

As an implementer of international patient questionnaires, the reason I am not demanding a different solution is that the pain of simply duplicating a short questionnaire is relatively low. I.e. do it once in English, create a copy, translate it into German/Spanish/Klingon, etc. by just replacing the strings in the texts, answerOptions, etc. without any use of the extensions. That is simple enough that even a commercial translation service can do it. Then place all copies onto a web server as static files and let the Accept-Language header do the rest. Anything fancier adds extra complexity to the filler and the translation process.
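As a rough illustration of the "let the Accept-Language header do the rest" step, here is a Python sketch of picking one of several translated copies based on the header value. In practice a web server's own content negotiation would usually handle this; the function name `pick_language` and the simplified matching (exact tag first, then primary subtag) are assumptions.

```python
def pick_language(accept_language, available, default="en"):
    """Choose the best available language for an Accept-Language header value."""
    ranges = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a range without an explicit q value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unparseable q value as "not acceptable"
        ranges.append((quality, tag.strip().lower()))

    # Highest quality first; the sort is stable, so ties keep header order.
    ranges.sort(key=lambda item: item[0], reverse=True)

    lookup = {lang.lower(): lang for lang in available}
    for quality, tag in ranges:
        if quality <= 0:
            continue
        if tag in lookup:
            return lookup[tag]
        primary = tag.split("-")[0]  # e.g. "de-CH" falls back to "de"
        if primary in lookup:
            return lookup[primary]
    return default


# Hypothetical set of translated questionnaire copies on the server.
copies = ["en", "de", "es"]
print(pick_language("de-CH, de;q=0.9, en;q=0.5", copies))
```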

view this post on Zulip Lloyd McKenzie (Apr 16 2021 at 17:07):

There is no mechanism to put extensions within a div, which is why we have a different approach there.


Last updated: Apr 12 2022 at 19:14 UTC