
TER and BLEU – what these metrics tell us about translation quality

    Translating texts with machine translation engines and artificial intelligence does not always produce perfect results. Is there a way to objectively assess the quality of machine translation (MT)? Two metrics, TER and BLEU, can help. Learn what they reveal about the quality of machine-generated translations.

    What is TER?

    TER (Translation Edit Rate) is a metric used to assess the quality of machine translations. It measures how many edits are needed to transform a machine-translated text into a reference version – the ideal translation.

    TER is usually expressed as a percentage. It indicates how extensively the machine-translated segment must be changed to match the reference translation. These changes include:

    • Inserting missing words
    • Removing unnecessary words
    • Replacing incorrect words with correct ones
    • Reordering sequences of words

    A TER of 25% means that the number of edits needed equals one quarter of the number of words in the reference translation – in other words, roughly a quarter of the text has to be corrected to reach the reference version.
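    To make this concrete, below is a minimal Python sketch of a TER-style score (the function name ter_like is illustrative, not part of any official toolkit). It counts word-level insertions, deletions, and substitutions with a classic edit-distance table and divides by the reference length; real TER additionally counts a block shift – moving a whole phrase – as a single edit, which this simplified version omits.

```python
def ter_like(hypothesis: str, reference: str) -> float:
    """Simplified TER: word-level edit distance / reference length.

    Real TER also counts a block shift (moving a phrase) as one edit;
    this sketch only counts insertions, deletions and substitutions.
    """
    hyp, ref = hypothesis.split(), reference.split()
    # Classic dynamic-programming (Levenshtein) table over words.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # remove an unnecessary word
                          d[i][j - 1] + 1,         # insert a missing word
                          d[i - 1][j - 1] + cost)  # replace (or keep) a word
    return d[len(hyp)][len(ref)] / max(len(ref), 1)

# One insertion is needed in a six-word reference: TER ≈ 0.17, i.e. 17%.
print(ter_like("the cat sat on mat", "the cat sat on the mat"))
```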

    What does the TER metric indicate?

    TER provides valuable information for translators and post-editors working with MT texts. This metric allows you to:

    • estimate how much effort a machine-translated text requires – the higher the TER, the more time needed to correct it,
    • evaluate the effectiveness of a particular MT engine – analyse how well a tool actually supports translation,
    • compare different MT engines – TER enables objective comparisons, helping choose the best tool for specific types of texts.

    Limitations of the TER metric

    However, it is worth remembering that TER has its limitations, which may distort the analysis of machine translation. It does not take the semantics of the text into account – it focuses solely on the mechanical changes needed in the translation, so a perfectly acceptable paraphrase (for example, "fast reply" where the reference says "quick response") is still counted as edits. For this reason, it only makes sense to use TER in conjunction with other metrics, such as BLEU. This allows for a better and more accurate assessment of machine translation against the reference text.

    What is BLEU?

    BLEU (Bilingual Evaluation Understudy) is another metric for evaluating machine translation quality. It analyses the similarity between a machine-translated text and one or more reference translations.

    BLEU works by analysing n-grams, which are sequences of adjacent words, and comparing their occurrence in the machine-translated and reference texts. In its standard form it combines the precision of n-grams of length 1 to 4 and applies a brevity penalty so that overly short translations do not score artificially well. BLEU scores range from 0 to 1 (often reported on a 0-100 scale), where 1 indicates a perfect match with the reference translation.
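    As an illustration, here is a simplified sentence-level BLEU in Python – a sketch of the core idea, not a reference implementation. Production tools such as sacreBLEU add smoothing and support multiple references, so their scores will differ, especially on short sentences.

```python
import math
from collections import Counter

def ngrams(words, n):
    """Count the n-grams (tuples of n adjacent words) in a word list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Simplified BLEU against a single reference, on a 0-1 scale."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference,
        # so repeating a correct word cannot inflate the score.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if matches == 0:
            return 0.0  # one zero precision collapses the geometric mean
        log_precisions.append(math.log(matches / sum(hyp_counts.values())))
    # Brevity penalty: hypotheses shorter than the reference lose points.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```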

    What does the BLEU metric indicate?

    BLEU provides insights especially useful for post-editing machine translations. It:

    • measures lexical similarity – shows the extent to which the MT uses the same words and phrases as the reference text. A high BLEU score suggests correct terminology,
    • evaluates fluency – by analysing longer n-grams, BLEU indirectly assesses whether the MT preserves natural target-language phrasing,
    • enables comparison of MT systems – similar to TER, BLEU allows objective evaluation of different tools. For example, if Engine A scores 0.42 on a medical text and Engine B scores 0.37, Engine A’s translation is closer to the reference and likely of higher quality.

    Limitations of the BLEU metric

    Like TER, BLEU is not perfect. It does not account for synonyms – a synonym in the MT output may be marked as an error. Therefore, BLEU is usually combined with other metrics such as TER or METEOR. METEOR, unlike BLEU, recognises synonyms and provides a more flexible assessment.
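    This weakness is easy to demonstrate. The snippet below is a hedged example using the sacreBLEU library, which reports BLEU on a 0-100 scale; the sentences are invented for illustration. A hypothesis that differs from the reference only by a synonym already loses points:

```python
import sacrebleu  # pip install sacrebleu

reference = ["The patient gave a quick reply to the doctor."]
exact = "The patient gave a quick reply to the doctor."
synonym = "The patient gave a fast reply to the doctor."

print(sacrebleu.sentence_bleu(exact, reference).score)    # 100.0
print(sacrebleu.sentence_bleu(synonym, reference).score)  # noticeably lower,
                                                          # despite identical meaning
```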

    Why TER and BLEU are important in translation evaluation

    TER and BLEU together give a more complete picture of MT quality. TER shows how much post-editing work a translator will need to do, while BLEU measures similarity to the reference. Using both metrics allows an objective, multidimensional evaluation.
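    In practice, the two scores are often computed side by side. Below is a hedged sketch using sacreBLEU's corpus-level helpers (the example segments are invented); note that sacreBLEU reports both metrics on a 0-100 scale, and that for BLEU higher is better while for TER lower is better:

```python
import sacrebleu  # pip install sacrebleu

# One hypothesis per segment from the MT engine, plus one reference stream.
hypotheses = ["The contract enters into force immediately."]
references = [["The agreement takes effect immediately."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)  # higher is better
ter = sacrebleu.corpus_ter(hypotheses, references)    # lower is better

print(f"BLEU: {bleu.score:.1f}   TER: {ter.score:.1f}")
```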

    For translators and translation agencies, these metrics are valuable because they:

    • help select the most suitable MT tools for specific types of texts,
    • allow monitoring of MT output quality through regular tracking of TER and BLEU values.

    However, it is worth remembering that no metric is perfect. TER and BLEU have their limitations, so a human perspective – cultural and linguistic sensitivity and an understanding of context – is always needed to make translation quality truly satisfactory.

    Is using BLEU and TER necessary?

    Including TER and BLEU in translation workflows has become essential when working with modern translation tools. These metrics help you understand the strengths and weaknesses of MT and use it more efficiently in daily work.

    If you are already using LivoLINK tools – such as LivoCAT, TM, glossaries, CRM, TMS, and automation – our next post will show how the system calculates these metrics and how this can improve your workflow.