Machine Translation – Part 3

Post-editing under the microscope.

In the first part of our three-part blog series on "Machine Translation", we discussed how to prepare a machine translation process and the factors that influence machine translation results. In the second part, we introduced the features and characteristics of both generic and trainable engines. In this last part, we use some examples to explain why machine translations need post-editing and which aspects need to be taken into account.

"To err is machine"

Various factors, such as the type and quality of the source text, the language pair and direction, and the choice of engine, affect machine translation results in different ways. However, these factors are not the only reasons for inaccurate translations: fundamentally, the machine lacks some skills that human translators take for granted. For example, machine translation engines can only ever process segments independently and can't contextualise them together. This becomes a problem especially when there are lexical or structural ambiguities in the source text. In the German example sentence "Die Türen schließen." the machine can't tell whether it's an instruction in the imperative infinitive or a description of what happens as the result of an action. Therefore, there are two possible translations – for example, in English, "Close the doors." and "The doors close." – between which the machine must choose (without knowing the context). A human translator, on the other hand, can immediately infer from the rest of the text which translation is the correct one. Another example is the German chapter heading "Reinigen und warten" from an operating manual. Here, it's quite possible that the machine translates the German "warten" as the English "wait" instead of "maintain", due to the lack of context.

Furthermore, machine translation systems have difficulties in generating pragmatic translations (i.e. translations that don't literally reflect the source text). The following example illustrates this:

German Source Text:       Führen Sie die Testvorbereitung gemäß dem Leitfaden durch.
Machine Translation:       Perform the test preparation in accordance with the guideline.
Human Translator:           Prepare the test in accordance with the guideline.

While the machine translation in this example is not incorrect per se, a human translator would phrase the sentence more idiomatically. Since the source text contains the noun "Testvorbereitung" rather than a verb corresponding to "prepare", the machine translation engine is not able to produce such a pragmatic (and usable) translation.

Post-editing as a necessity

Because machine translation is not infallible, its results should always be understood as raw output that must subsequently be edited and corrected – or more precisely, "post-edited". There are two types of post-editing: "light post-editing" and "full post-editing". In both cases, a qualified translator checks the machine translation result against the source text.
In light post-editing, there is no requirement to achieve a translation that's comparable in quality to a human translation. The result should be understandable and correct, but doesn't have to be stylistically appealing. In accordance with ISO 18587:2017 "Translation services – Post-editing of machine-produced translations – Requirements", the adaptation of technical or customer-specific terminology is not the subject of light post-editing. This form of post-editing is primarily used to capture the content of a text and is therefore more suitable for internal purposes or short-lived texts that are not intended to be published.
With full post-editing, on the other hand, the translation result should be equivalent to a human translation. Here, in addition to correcting the content and ensuring simple, understandable wording, the post-editor must also adjust grammar and style, if necessary according to customer requirements.

Although post-editing overlaps with traditional translation and proofreading, it requires additional skills from post-editors. Unlike translation, post-editing is intended to improve the quality of machine translation results by correcting errors. Deleting translation suggestions in order to insert a completely new translation, as well as making excessive adjustments (over-editing), is the wrong approach here. In addition, post-editors should be familiar with the linguistic characteristics of machine translation: while spelling mistakes and careless errors are rare, machine translations often contain syntactic inconsistencies as well as terminology errors that even the most inexperienced translator would not make.

In the first part of our blog series, we mentioned that the throughput of post-editing can be significantly higher than that of traditional translation, resulting in the desired time and cost savings. However, the actual time savings vary widely: some vendors report savings of up to 50% compared with traditional translation, while various research projects sometimes found no significant productivity gains at all. In some cases, translators have even reported that their productivity dropped. The rule of thumb here is: the higher the quality of the machine translation, the lower the effort required for post-editing.
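
One way to make this rule of thumb tangible is to count how many words a post-editor had to change in the raw output. The following Python sketch is only an illustration, not a method prescribed in this series: it computes a word-level edit distance between the raw machine translation and its post-edited version and divides it by the length of the post-edited text. The function names, the whitespace tokenisation and the normalisation are assumptions made for this example; the two sentences are taken from the example earlier in this post.

```python
def word_edit_distance(mt_words, pe_words):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn mt_words into pe_words (Levenshtein distance)."""
    prev = list(range(len(pe_words) + 1))
    for i, mt_word in enumerate(mt_words, 1):
        curr = [i]
        for j, pe_word in enumerate(pe_words, 1):
            cost = 0 if mt_word == pe_word else 1
            curr.append(min(prev[j] + 1,          # delete a word from the MT output
                            curr[j - 1] + 1,      # insert a word into the MT output
                            prev[j - 1] + cost))  # keep or substitute a word
        prev = curr
    return prev[-1]


def post_edit_rate(raw_mt, post_edited):
    """Edits per word of the post-edited text (assumed normalisation)."""
    mt_words = raw_mt.lower().split()
    pe_words = post_edited.lower().split()
    return word_edit_distance(mt_words, pe_words) / max(len(pe_words), 1)


raw_mt = "Perform the test preparation in accordance with the guideline."
post_edited = "Prepare the test in accordance with the guideline."

# Two edits ("Perform" -> "Prepare", delete "preparation") over 8 words = 0.25
rate = post_edit_rate(raw_mt, post_edited)
```

A rate close to 0 suggests that the raw output needed little correction, while a rate approaching 1 suggests that the segment was practically retranslated. The figure says nothing about how long the post-editor had to think about each change, so it is at best a rough proxy for effort.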

Linguistic regulation provides relief

Although some machine translation errors are inherent to the engines themselves, it's possible to actively influence the quality of machine translations. In addition to the choice of engine, both the type and quality of the source texts play a critical role. The most common errors, which include terminology errors and overly literal (word-for-word) translations, are directly related to ambiguities and/or out-of-context terms and phrases. In addition, it can be observed that machine translation engines translate simple and understandable sentences correctly more often than long, convoluted sentences with unclear references. Here, machine translation results can be significantly improved by regulating and adjusting the source texts, as is done in technical documentation by means of editorial style guides, for example.
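
Part of such a source-text check can be automated. As a minimal sketch, assuming a style guide that limits sentence length, the following Python function flags sentences that exceed a word limit. The function name, the naive sentence splitting on ".", "!" and "?" and the limit of 20 words are hypothetical choices for this illustration, not values taken from any particular style guide.

```python
import re


def flag_long_sentences(text, max_words=20):
    """Return the sentences in text that contain more than max_words words.

    Sentences are split naively after '.', '!' or '?' followed by whitespace,
    so abbreviations such as 'e.g.' may cause false splits.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]


short_sentence = "Close the doors."
long_sentence = " ".join(["word"] * 25) + "."  # placeholder for a convoluted sentence
flagged = flag_long_sentences(short_sentence + " " + long_sentence)
```

Flagged sentences are only candidates for rewriting: whether a long sentence actually contains unclear references still has to be judged by an editor.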

In order to minimise the effort involved in post-editing machine translations, it's therefore advisable to check the quality of the source material in advance – this includes the source text itself, but also existing translation memories and terminology lists – and to adjust them, if necessary. As a service provider for information products, we focus on a holistic view of the information process, starting with the editorial creation of the source texts and ending with the optimisation of our translation processes. We would be happy to review your documents and then consult with you, in order to determine which solution is best for you.

Our conclusion

Machine translation has become an integral part of the translation industry and, together with post-editing, has already been integrated into existing translation processes. However, experiences with the use of machine translation vary greatly, as many factors determine the overall quality of the results. We have therefore created a checklist of four factors that should be taken into account when using machine translation (MT).

  1. Choice of engine
    Generic engines are inexpensive and cover a wide range of subject areas, but are also more error-prone. Trainable engines are specifically trained for a subject area or for a customer's requirements. They can therefore translate specific content more reliably and "learn from their mistakes" through so-called re-training. However, they require more effort for preparing the training data and for the initial and subsequent training.
  2. Type of source text
    Texts with recurring elements, as well as simple sentence structures (such as operating instructions or online help) are more appropriate than creative or literary texts.
  3. Quality of source material
    The machine is not able to correct errors in the source text. In addition, contextual references can't be established. The longer and more convoluted a sentence is, the higher the probability that the engine will not be able to recreate the references correctly. Incorporating style guides into the editorial process therefore not only makes it possible to produce consistent and comprehensible source texts, but also improves machine translation results.
    In addition, existing translation resources, such as the translation memory and terminology databases, should be well maintained – both for training the machine translation engine and for quality assurance reasons.
  4. Language pairs
    For languages that are similar in grammatical structure (e.g. German and English), machine translation engines usually achieve better results than for languages that differ greatly in structure (e.g. Russian and Japanese).
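
The point about well-maintained terminology databases can also be supported by a simple automated check. The following Python sketch assumes a glossary held as a plain dictionary that maps source terms to the required target terms, and reports every glossary term that occurs in the source segment but whose target term is missing from the machine translation. The function name, the glossary format and the plain substring matching (which ignores inflected forms) are assumptions made for this illustration; real terminology tools work with far richer data.

```python
def find_terminology_issues(source, target, glossary):
    """Return (source_term, expected_target_term) pairs where the source
    segment contains the source term but the target segment lacks the
    expected target term. Matching is a case-insensitive substring test."""
    source_lower = source.lower()
    target_lower = target.lower()
    issues = []
    for source_term, target_term in glossary.items():
        if source_term.lower() in source_lower and target_term.lower() not in target_lower:
            issues.append((source_term, target_term))
    return issues


# Hypothetical glossary built from the examples in this post
glossary = {"warten": "maintain", "Leitfaden": "guideline"}

issues = find_terminology_issues("Reinigen und warten", "Clean and wait", glossary)
# issues == [("warten", "maintain")]
```

A check like this only points the post-editor to suspicious segments. It can't decide whether a deviation from the glossary is actually an error.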

Preparing for a machine translation process may seem time-consuming. However, with the right preparation, both money and time can be saved. It's therefore worth taking a holistic view of machine translation and laying the foundations for a successful machine translation process as early as the creation of the source texts, through structural preparatory work in accordance with editorial and stylistic guidelines, and through terminology management. In addition, machine translation engines are constantly evolving to provide more and more accurate translations. Although they won't replace humans in the near future, we can make efficient use of their strengths.

Katrin Grzimek