Machine Translation at a Glance

Our FAQ on the topic of Machine Translation.

In the age of digitalisation, translation also benefits from Artificial Intelligence (AI). By using trainable or generic engines, texts and technical documents can be translated more efficiently.

If you're also involved in Machine Translation and would like to benefit from it, we've compiled the most important points on the topic in the form of an FAQ.

 

1. Introduction, General Information

What is Machine Translation (MT)?

Machine Translation (MT) is the automatic translation of texts by specialised computer programs. Since about 2016/2017, Neural MT (NMT) has been the predominant approach. Here, artificial neural networks predict the probability of word sequences or sentences within translations. These so-called MT engines link bilingual data using Deep Learning methods and can therefore "learn" complex linguistic features completely autonomously.

 

What are Generic MT Engines?

Generic MT engines are trained using publicly available data from a wide variety of domains. Well-known generic translation services include Google Translate, DeepL, and Amazon Translate. The wealth of data that goes into training these neural engines makes their translations sound very natural. However, this also makes it more difficult to detect machine translation errors without a close examination of the texts.

Due to the extensive training involved, generic engines can serve a wide variety of subject areas. However, there is no guarantee that subject-specific content is translated correctly and consistently. This depends on the subject area, because the amount of training data and the algorithms that control the engines vary and cannot be predicted.

 

What are Trainable MT Engines?

Trainable MT engines (also called individualised engines) are trained with customer-specific data. This data comes from existing translation memories and bilingual or multilingual terminology databases. Individual training improves the translation of specialist and customer-specific terms, and consistency can be ensured for recurring texts in the same way as an exact match or context match. In addition, with ongoing training and tweaking, Machine Translation errors can be corrected and eliminated in a more targeted manner. The more the individualised engine is trained, the higher the quality of the machine translation.

 

How much time and money can I save with Machine Translation and what other benefits does MT offer me?

MT-related cost savings in translation projects vary greatly, depending on the dataset and the source texts. On average, the savings are currently around 20% after the MT process has been successfully established. In general, larger savings are seen with large volumes of new texts, while smaller revisions or updates tend to result in less significant savings from MT.

Depending on the initial situation (e.g. quality of training data, terminology), the processing time of translations can be reduced by up to 50% by using MT.

A well-trained engine makes it possible to make optimal use of the inventory of translation data (e.g. Translation Memories and Glossaries) and to continuously reduce both translation time and costs.
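
As a rough illustration of the figures above, the following Python sketch applies the average cost saving of about 20% and the best-case time reduction of up to 50% to a project. The baseline values (10,000 in cost, 20 working days) and the function name are purely hypothetical; actual savings depend on the dataset and the source texts.

```python
def estimate_mt_savings(baseline_cost, baseline_days,
                        cost_saving_rate=0.20, time_saving_rate=0.50):
    """Return (cost, days) after applying the assumed saving rates.

    The default rates mirror the figures quoted in the text: roughly 20%
    average cost savings and up to 50% shorter processing time.
    """
    for rate in (cost_saving_rate, time_saving_rate):
        if not 0 <= rate <= 1:
            raise ValueError("saving rates must be between 0 and 1")
    cost_after = baseline_cost * (1 - cost_saving_rate)
    days_after = baseline_days * (1 - time_saving_rate)
    return cost_after, days_after


# Hypothetical project: 10,000 (any currency) and 20 working days without MT.
cost, days = estimate_mt_savings(10_000, 20)
print(f"Cost with MT: {cost:.2f}, turnaround with MT: {days:.1f} days")
```

Because 50% is an upper bound rather than an average, a more cautious estimate would pass a lower `time_saving_rate`.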

 

Is the use of translators still necessary at all with Machine Translation?

Yes, because the machine lacks some capabilities that human translators take for granted. For example, machine translation systems can only ever process segments independently of each other and can't place them in their wider context. This becomes a problem especially when there are lexical or structural ambiguities in the source text. Since machine translation is therefore not infallible, its results must always be treated as raw output that is subsequently post-edited.

 

What is Post-Editing?

Post-editing is the editing and correction of machine-translated texts. In this process, the post-editor checks the translations suggested by the system and adjusts them in terms of grammar, spelling and style, as well as according to customer-specific requirements such as specialist terminology. Post-editors should also be familiar with the linguistic characteristics of machine translation: while spelling mistakes and careless errors are rare, machine translations often contain syntactic inconsistencies, terminology errors or tagging errors.

 

2. Prerequisites, Conditions

What factors influence the quality of machine translation results?

The quality of machine translation results depends upon several factors. These include:

  • Quality of the source text
  • Text type
  • Language directions (i.e. source-target combinations)
  • Amount and quality of the training data used to train the engine

 

How can I improve and positively influence my Machine Translation results?

Since the quality of the MT results depends, among other things, upon the quality of the source text, it makes sense to follow certain rules as early as the technical writing stage. These rules can be found in editorial guidelines or guidelines on translation-oriented writing. Examples include using simple, short sentences and avoiding ambiguities and inconsistencies.

 

For which text types is Machine Translation well suited?

Technical texts or linguistically regulated texts, such as those found in the field of technical documentation, are best suited for the use of MT. These texts are ideally written in a consistent and comprehensible manner. However, product descriptions, catalogues, internal documentation, learning platforms or knowledge databases are also suitable.

 

For which jobs is the use of Machine Translation less suitable?

  • Texts with many tags* (e.g. HTML):
    The engine is not always able to reinsert tags at the correct place in the target text; it sometimes omits tags or inserts additional tags. Especially for texts with many tags, this increases the post-editing effort so much that the cost/benefit ratio of MT can no longer be justified.

  • Orders with a low translation volume:
    For jobs that can be handled by translators in a short time by "conventional means", it should be considered whether the use (as well as the initial effort) of MT is worthwhile.

  • Low-context texts:
    MT systems can only ever process segments independently and can't place them in a shared context. This becomes a problem, especially when lexical or structural ambiguities are present in the source text.

  • Texts with frequently used special formulations:
    These include uncommon abbreviations, proper names, and product names.

  • Creative texts:
    Since MT can't infer the meaning of a text, but only translates "the surface", texts with metaphors, idiomatic expressions, or wordplay tend to be less suitable for MT use. These include literary texts, marketing texts, or slogans.

* Markup elements that define formatting, for example.
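
The tag problem described in the first point can be caught automatically before post-editing starts. The following Python sketch compares the tags in a source segment with those in its machine translation and reports omitted or added tags. It is a minimal illustration that assumes simple HTML-style tags; the function name and the sample segments are hypothetical, and the check only counts tags, so it does not detect a tag that was reinserted in the wrong position.

```python
import re
from collections import Counter

# Matches simple HTML-style tags such as <b>, </b> or <br/>.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")


def tag_mismatches(source, target):
    """Report tags missing from, or added to, the target segment."""
    source_tags = Counter(TAG_PATTERN.findall(source))
    target_tags = Counter(TAG_PATTERN.findall(target))
    missing = source_tags - target_tags  # in the source, not in the target
    extra = target_tags - source_tags    # in the target, not in the source
    return {
        "missing": sorted(missing.elements()),
        "extra": sorted(extra.elements()),
    }


# Hypothetical example: the closing </b> was dropped and <br/> was duplicated.
source_segment = "Press <b>Start</b> to begin.<br/>"
target_segment = "Drücken Sie <b>Start, um zu beginnen.<br/><br/>"
report = tag_mismatches(source_segment, target_segment)
print(report)
```

Segments with a non-empty report could be flagged for the post-editor, which also gives a rough indication of how tag-heavy a job is.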

 

Which languages are suitable?

For example, MT achieves good results with Romance languages (e.g. Spanish, Italian, French) and Germanic languages (e.g. German, Dutch). In addition, Machine Translation is more suitable for language pairs that are similar to each other, such as German and English, than for language pairs such as Russian and Chinese.

 

 

3. Use of Machine Translation and Processes

How is Machine Translation used at kothes?

Machine Translation is used in the translation process via an interface (API) to the Translation Management System. This interface makes it possible to automatically translate texts that are not yet available in the system's customer-specific Translation Memory (TM). Text modules from previous translations can thus continue to be reused. Which content is taken from the TM and which from the machine can be set individually. All downstream activities in the translation process, such as automatic quality checks and layout adjustments, continue to run as usual.
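
The division of work between TM and MT can be pictured with a short Python sketch. This is not the actual kothes interface; all names are hypothetical, the MT call is a stand-in function, and the similarity score from Python's `difflib` is only an assumed substitute for the match rates a Translation Management System calculates. The `tm_threshold` parameter stands for the individually configurable setting mentioned above.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, translation_memory):
    """Return (score, target) for the most similar source segment in the TM."""
    best_score, best_target = 0.0, None
    for source, target in translation_memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def pretranslate(segments, translation_memory, machine_translate,
                 tm_threshold=1.0):
    """Use the TM where the match is good enough, otherwise call the MT engine."""
    results = []
    for segment in segments:
        score, target = best_tm_match(segment, translation_memory)
        if target is not None and score >= tm_threshold:
            results.append((segment, target, "TM"))
        else:
            results.append((segment, machine_translate(segment), "MT"))
    return results


def fake_mt(text):
    """Stand-in for a real MT engine call (hypothetical)."""
    return f"[MT] {text}"


tm = {"Switch off the machine.": "Schalten Sie die Maschine aus."}
results = pretranslate(["Switch off the machine.", "Open the cover."], tm, fake_mt)
for source, target, origin in results:
    print(f"{origin}: {source} -> {target}")
```

With the default threshold of 1.0, only exact TM matches are reused. A lower threshold would reuse similar TM entries as well, and those would still need to be adapted during editing.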

 

How can I integrate Machine Translation into my processes?

The MT engine can be used via a plug-in in all kothes translation systems. It's also possible to connect the Machine Translation engine to other customer systems (CRM, CDP, etc.) via an interface (API). In these cases, kothes hosts and trains the MT engine and makes it available to customers via an API.

 

Once I've decided to use Machine Translation, do I have to translate this way forever?

No. The use of Machine Translation can also be eliminated from the translation process at a later date. The TM is then used as usual for the human translation process and contains the translation segments saved from post-editing up to that point.

However, it should be emphasised that the switch to Machine Translation is a strategic decision. Therefore, we recommend keeping the new MT process for the selected language pairs at least during the initial period. This way, the MT engine can also be continuously trained to maintain (or even improve) the quality of the machine translation results.

 

What is the benefit of my terminology when using Machine Translation?

Trainable MT engines are trained with customer-specific data to include both terminology and corporate identity-specific language in the translations, thereby increasing the quality of the pre-translation. Ultimately, this can minimise the amount of post-editing required.

There are two important factors that influence MT here:

  1. Well-maintained terminology in the source and target languages increases the quality of Translation Memories.
  2. When training an engine, this terminology can be prioritised for machine pre-translation.

 

How is Quality Assurance ensured in Machine Translation?

Post-editing is followed by an automatic check, as in the regular translation process, and, if desired, a four-eyes check, because post-editing is not a substitute for a review by a second person. The post-edited and, where applicable, reviewed text modules are then stored in the Translation Memory and can be reused for the next translation job. In this way, we as a Language Service Provider (LSP) can continue to guarantee high quality for all translations.

 

4. Hard Facts

How much data do I need to train an MT engine?

Large amounts of translation data are required for training individualised machine translation engines. At least 15,000 segments, and ideally 100,000, should be available in the TM. Trainable engines are therefore better suited to companies with high translation volumes. The more high-quality data is available, the higher the probability that the trainable MT engine will deliver good translation results.

If the client does not yet have an extensive database, it can be supplemented with general translation data from the same subject area in order to obtain sufficient data for training the MT engine. This enables a broader range of customers to get started with Machine Translation.
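
The segment counts above can be turned into a quick readiness check. The Python sketch below uses the two figures from the text (15,000 as the minimum and 100,000 as the ideal). The function name and the wording of the results are hypothetical, and segment count alone says nothing about data quality.

```python
MIN_SEGMENTS = 15_000     # minimum TM size named in the text
IDEAL_SEGMENTS = 100_000  # ideal TM size named in the text


def training_data_readiness(segment_count):
    """Classify a TM by size for training an individualised MT engine."""
    if segment_count < 0:
        raise ValueError("segment_count must not be negative")
    if segment_count >= IDEAL_SEGMENTS:
        return "ideal"
    if segment_count >= MIN_SEGMENTS:
        return "sufficient"
    return "too small: consider enriching with general translation data"


for count in (8_000, 40_000, 250_000):
    print(count, "->", training_data_readiness(count))
```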

 

What are the initial costs of using Machine Translation?

The initial cost of MT depends strongly on the framework conditions (i.e. source data), the number of engines and the type of hosting. We'd be happy to check your requirements in a Machine Translation Check. Please get in touch for further details.

 

What are the various hosting options?

MT systems can usually be either cloud-based or hosted in-house. In-house hosting requires a significant investment in hardware, which is why we recommend a cloud-based system for starters.

 

Will my data be secure?

With trainable engines, the data is only used for training the corresponding engine. With providers of generic systems, however, there is scepticism about how this data is processed, and for freely available generic engines it's still unclear what happens to the data. Corporate versions of generic engines are also offered, where data protection is ensured by means of end-to-end encryption and immediate deletion of the data after translation. In any case, we advise against having sensitive data machine-translated using browser-based translation tools.

 

Which standards evaluate the core processes in the area of Machine Translation and ensure implementation?

All processes, tasks and requirements relating to machine translation and post-editing are described in ISO 18587 (Translation services — Post-editing of machine translation output — Requirements). In all other areas, the processes continue as before in accordance with ISO 17100 (Translation services — Requirements for translation services).

 

We would be happy to answer any additional questions you may have about "Machine Translation" during a personal meeting.

 

Author: Katrin Grzimek