A Study on Enhancing Low-Resource Turkish-English Neural Machine Translation Using Part of Speech Tags


Yazar B. K., Kılıç E.

The International Conference on Artificial Intelligence and Applied Mathematics in Engineering, Warszawa, Poland, 25 - 27 September 2024, pp.70-86, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI Number: 10.1007/978-3-031-92552-8_6
  • City of Publication: Warszawa
  • Country of Publication: Poland
  • Page Numbers: pp.70-86
  • Affiliated with Ondokuz Mayıs University: Yes

Abstract

Machine translation is the task of having computers automatically translate text from one language into another. Because translation is carried out between two languages, its quality usually depends closely on the size of the bilingual data used. This work investigates neural machine translation for the low-resource language pair Turkish-English. The aim is to use grammatical features in addition to the sentences themselves in the translation process. To this end, some of the linguistic tags of the Turkish sentences are supplied to the translation system in order to improve translation quality. Zemberek, BERTTurk, multilingual-BERT, RoBERTa and DistilBERT models are used for part of speech tagging. The translation model uses the Transformer architecture in a multi-featured form. The model built with Zemberek POS tags and the standard model without any POS tags score, respectively, BLEU: 25.13 vs. 24.15, ChrF: 52.41 vs. 51.87 and METEOR: 58.22 vs. 57.45. The part of speech tags obtained with Zemberek were thus observed to improve translation quality. In contrast, the translation models built with part of speech tags obtained from the BERT-based models show no contribution to translation quality.
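
The abstract does not state how the part of speech tags are fed to the multi-featured Transformer. As a minimal sketch of one common way to prepare such input, the Python snippet below pairs each source token with a tag to form a parallel feature stream. The function names, the `|` separator, the example sentence and its hand-written tags are all assumptions made for illustration; they are not taken from the paper and are not the output of Zemberek or any BERT-based tagger.

```python
from typing import List, Tuple


def attach_pos_features(tokens: List[str], tags: List[str], sep: str = "|") -> List[str]:
    """Pair each token with its POS tag as 'token|TAG' (hypothetical format)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags)]


def split_pos_features(factored: List[str], sep: str = "|") -> Tuple[List[str], List[str]]:
    """Recover the separate token and tag streams from the factored form."""
    # rsplit on the last separator so a token containing `sep` stays intact
    pairs = [item.rsplit(sep, 1) for item in factored]
    return [p[0] for p in pairs], [p[1] for p in pairs]


# Hand-written example; the tags are illustrative, not produced by a tagger.
tokens = ["Ben", "okula", "gidiyorum"]
tags = ["Pron", "Noun", "Verb"]

factored = attach_pos_features(tokens, tags)
print(" ".join(factored))
```

In a setup of this kind, the token stream and the tag stream would typically be embedded separately and combined before the encoder; whether the paper does this is not stated in the abstract.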