Tackling Arabic NLP Challenges: POS-Tagging with Transformer-Based Models and Nuanced Evaluation

doi:10.23940/ijpe.25.08.p1.411421

Abstract

The rich, agglutinative morphology of Modern Standard Arabic (MSA), its complex clitic system, the usual absence of diacritical marks, and the resulting morphosyntactic ambiguity make POS tagging difficult. This study assesses how robust AraBERT, MARBERT, and CAMeL Tools are to ambiguous forms and stylistic variation, using an enriched corpus of 8,147 sentences (293,199 tokens) that includes 400 synthetic and 200 literary sentences. The results show that the models' performance depends strongly on text type: it is high on journalistic text (macro-F1 ~83.5%) but drops on synthetic (~66%) and literary (~55%) data. To support a finer analysis of ambiguity, we also propose the PL-Score, a supplementary metric that scores errors according to their linguistic plausibility (e.g., NOUN→ADJ). These findings underscore the need for more diverse corpora and robust hybrid methods that combine deep learning with human linguistic knowledge to substantially improve POS tagging, with implications for machine translation and literary text analysis.
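The abstract reports macro-F1 and introduces the PL-Score but does not give the PL-Score formula. As a hedged illustration only, the sketch below computes macro-F1 (the unweighted mean of per-tag F1) and a hypothetical plausibility-weighted accuracy in which a linguistically plausible confusion such as NOUN→ADJ earns partial credit instead of counting as a full error. The weight table `PLAUSIBILITY`, its 0.5 values, and the function `plausibility_score` are assumptions made for this example, not the authors' definition of the PL-Score.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 over every tag seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for tag in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == tag and p == tag)
        fp = sum(1 for g, p in zip(gold, pred) if g != tag and p == tag)
        fn = sum(1 for g, p in zip(gold, pred) if g == tag and p != tag)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


# HYPOTHETICAL plausibility weights keyed by (gold_tag, predicted_tag).
# Plausible confusions earn partial credit; values are illustrative only.
PLAUSIBILITY = {("NOUN", "ADJ"): 0.5, ("ADJ", "NOUN"): 0.5}


def plausibility_score(gold, pred, weights=PLAUSIBILITY):
    """HYPOTHETICAL PL-style score: 1 per correct tag, weights[(g, p)] per error."""
    credit = 0.0
    for g, p in zip(gold, pred):
        credit += 1.0 if g == p else weights.get((g, p), 0.0)
    return credit / len(gold)


if __name__ == "__main__":
    gold = ["NOUN", "ADJ", "VERB", "NOUN"]
    pred = ["NOUN", "NOUN", "VERB", "PREP"]
    print(macro_f1(gold, pred))            # 0.375
    print(plausibility_score(gold, pred))  # 0.625
```

In this toy example plain accuracy is 0.5, while the plausibility-weighted score is 0.625 because the ADJ→NOUN confusion receives partial credit and the implausible NOUN→PREP error receives none. This shows, under the stated assumptions, how such a metric separates "near-miss" errors from implausible ones.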

Keywords: POS tagging, MSA, morphosyntactic ambiguity, transformers, diacritization, augmented corpus, PL-Score

Roussafi Mahdjoubi, Mohamed Tayeb Laskri. Tackling Arabic NLP Challenges: POS-Tagging with Transformer-Based Models and Nuanced Evaluation [J]. Int J Performability Eng, 2025, 21(8): 411-421.
