Comparative Analysis of Named Entity Recognition Models for Russian And Uzbek

Authors

  • Djuraeva Zulkhumor Radjabovna DSc, Professor, Department of Russian language and literature, Bukhara State University, Uzbekistan

DOI:

https://doi.org/10.55640/eijps-06-05-20

Keywords:

Named entity recognition, Russian, Uzbek

Abstract

This study compares named entity recognition (NER) systems for Russian and Uzbek. Russian NER rests on six established datasets and on the transformer models Slovnet BERT NER and DeepPavlov RuBERT-CRF, whose F1 reaches roughly 0.92, whereas Uzbek resources appeared only from 2023 onward and remain an order of magnitude smaller. We examine the UZNER and BERTbek corpora and the Mengliev datasets, reported F1 figures on the WikiANN and XTREME benchmarks, and typological obstacles such as agglutination and the dual (Latin and Cyrillic) script. We conclude that, for Uzbek, data quality outweighs sheer size, and that a single-number comparison of the two languages is misleading because their annotation schemes differ.
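The point about differing annotation schemes can be illustrated with a minimal sketch of entity-level F1 over BIO tags. This is not the evaluation code used in the study; the function names, the toy sentence, and the fine-grained label CITY are hypothetical and serve only to show how the same prediction scores differently before and after the label sets are aligned.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact (boundary and type) span matches."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def remap(tags: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rename entity types (e.g. fine-grained to coarse), keeping B-/I- prefixes."""
    return [t if t == "O" else t[:2] + mapping.get(t[2:], t[2:]) for t in tags]


# Hypothetical example: the gold scheme uses CITY, the model predicts LOC.
gold = [["B-PER", "I-PER", "O", "B-CITY"]]
pred = [["B-PER", "I-PER", "O", "B-LOC"]]

raw_score = entity_f1(gold, pred)
aligned_score = entity_f1([remap(s, {"CITY": "LOC"}) for s in gold], pred)
print(raw_score)      # 0.5
print(aligned_score)  # 1.0
```

In this toy case the prediction is identical in both runs; only the label inventory changes, and F1 moves from 0.5 to 1.0. Scores reported under different tag sets are therefore not directly comparable without such an alignment step.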

References

Arkhipov M. Tuning Multilingual Transformers for Language-Specific Named Entity Recognition / M. Arkhipov, M. Trofimova, Y. Kuratov, A. Sorokin // Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing. – Florence: ACL, 2019. – P. 89-93.

Conneau A. Unsupervised Cross-lingual Representation Learning at Scale / A. Conneau, K. Khandelwal, N. Goyal [et al.] // Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. – 2020. – P. 8440-8451.

Hu J. XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation / J. Hu, S. Ruder, A. Siddhant [et al.] // Proceedings of the 37th International Conference on Machine Learning. – 2020. – P. 4411-4421.

Kuratov Y. Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language / Y. Kuratov, M. Arkhipov // arXiv preprint. – 2019.

Kuriyozov E. BERTbek: A Pretrained Language Model for Uzbek / E. Kuriyozov, D. Vilares, C. Gómez-Rodríguez // Proceedings of the 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (SIGUL) @ LREC-COLING. – Torino, 2024. – P. 33-44.

Loukachevitch N. NEREL: A Russian Dataset with Nested Named Entities, Relations and Events / N. Loukachevitch, E. Artemova, T. Batura [et al.] // Proceedings of RANLP. – 2021. – P. 876-885.

Published

2026-05-31

How to Cite

Djuraeva Zulkhumor Radjabovna. (2026). Comparative Analysis of Named Entity Recognition Models for Russian And Uzbek. European International Journal of Philological Sciences, 6(05), 95–100. https://doi.org/10.55640/eijps-06-05-20