Comparison of Machine Learning and Deep Learning Models for Part-of-Speech Tagging

Aftab Ahmad Khan; Wahab Khan; Muhammad Alamzeb Khan; Khairullah Khan; Fida Muhammad Khan; Atta Ur Rahman; Hazrat Bilal; Islam Md Monirul

doi:10.62762/TACS.2024.493945

Volume 1, Issue 2, ICCK Transactions on Advanced Computing and Systems

ICCK Transactions on Advanced Computing and Systems, Volume 1, Issue 2, 2024: 106-116

Open Access | Research Article | 21 June 2024

Aftab Ahmad Khan 1

Wahab Khan 1

Muhammad Alamzeb Khan 1

Khairullah Khan 1

Fida Muhammad Khan 2 *

Atta Ur Rahman 3

Hazrat Bilal 4 *

Islam Md Monirul 4

1 Department of Computer Science, University of Science and Technology Bannu, Bannu 28100, Pakistan

2 Department of Computer Science, Qurtuba University of Science and Information Technology, Peshawar 25000, Pakistan

3 Interdisciplinary Research Centers for Finance and Digital Economy, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia

4 College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China

* Corresponding Authors: Fida Muhammad Khan, [email protected] ; Hazrat Bilal, [email protected]

Received: 13 March 2024, Accepted: 08 May 2024, Published: 21 June 2024

Abstract

Part-of-speech (POS) tagging is the process of assigning a grammatical category, such as "Noun" or "Verb", to every word in a text corpus. It is widely used in applications such as sentiment analysis and machine translation, and in other linguistic and computational tasks. The unique features of the Pashto language and its limited resources, however, make POS tagging a significant challenge. This study examines POS tagging for Pashto by evaluating six popular machine learning and deep learning techniques. The evaluation uses a curated and annotated dataset of Pashto text, compiled from diverse sources and enriched with POS tags, which provides a reliable foundation for performance analysis. Among the tested algorithms, K-Nearest Neighbor (KNN) and Decision Tree achieved the highest accuracies, 94.19% and 94.34%, respectively. Random Forest and Support Vector Machine (SVM) also delivered competitive results, each exceeding 90% accuracy. Multi-Layer Perceptron (MLP), evaluated with activation functions such as ReLU and Tanh, achieved an accuracy of 87.25%, while Naïve Bayes, tested with variants such as Multinomial NB and Gaussian NB, attained 83.33%. These results show that machine learning methods can capture the grammatical patterns of Pashto text and highlight their potential for overcoming the challenges of Pashto POS tagging.
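As a rough illustration of the kind of token-level classification compared above, the following sketch tags words with a small K-Nearest Neighbor classifier and reports token accuracy. It is a minimal example under stated assumptions, not the authors' pipeline: the English toy sentences, the tag names, the feature set in `token_features`, and the overlap-based similarity are all hypothetical, since the abstract does not describe the features or the implementation used.

```python
from collections import Counter

# Hypothetical toy data (token, tag) pairs; NOT the paper's Pashto corpus.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("bird", "NOUN"), ("sings", "VERB")],
]
TEST = [
    [("a", "DET"), ("dog", "NOUN"), ("sings", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]


def token_features(sentence, i):
    """Build a set of string features for the token at position i.

    The feature set (word, 2-character suffix, neighbouring words) is an
    assumption made for illustration only.
    """
    word = sentence[i][0]
    prev_word = sentence[i - 1][0] if i > 0 else "<s>"
    next_word = sentence[i + 1][0] if i + 1 < len(sentence) else "</s>"
    return {
        f"word={word}",
        f"suffix2={word[-2:]}",
        f"prev={prev_word}",
        f"next={next_word}",
    }


def build_examples(sentences):
    """Flatten tagged sentences into (feature_set, tag) pairs."""
    return [
        (token_features(sent, i), tag)
        for sent in sentences
        for i, (_, tag) in enumerate(sent)
    ]


def knn_predict(train_examples, features, k=3):
    """Majority vote among the k training tokens sharing the most features."""
    ranked = sorted(
        train_examples,
        key=lambda example: len(example[0] & features),
        reverse=True,
    )
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_sentences, test_sentences, k=3):
    """Fraction of test tokens whose predicted tag matches the gold tag."""
    train_examples = build_examples(train_sentences)
    test_examples = build_examples(test_sentences)
    correct = sum(
        knn_predict(train_examples, feats, k) == tag
        for feats, tag in test_examples
    )
    return correct / len(test_examples)


acc = accuracy(TRAIN, TEST)
print(f"token accuracy: {acc:.2%}")
```

The same `build_examples` output could be fed to other classifiers (a decision tree, Naïve Bayes, and so on) to reproduce the style of comparison reported here; accuracy on a toy set like this says nothing about performance on real Pashto text.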

Keywords

machine learning

part of speech tagging

morphological structure

grammatical features

Data Availability Statement

Data will be made available on request.

Funding

This work received no funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cite This Article

APA Style

Khan, A. A., Khan, W., Khan, M. A., Khan, K., Khan, F. M., Rahman, A. U., Bilal, H., & Monirul, I. M. (2024). Comparison of Machine Learning and Deep Learning Models for Part-of-Speech Tagging. ICCK Transactions on Advanced Computing and Systems, 1(2), 106–116. https://doi.org/10.62762/TACS.2024.493945

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Copyright © 2024 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
