Volume 1, Issue 2, ICCK Transactions on Advanced Computing and Systems
ICCK Transactions on Advanced Computing and Systems, Volume 1, Issue 2, 2024: 106-116

Open Access | Research Article | 21 June 2024
Comparison of Machine Learning and Deep Learning Models for Part-of-Speech Tagging
1 Department of Computer Science, University of Science and Technology Bannu, Bannu 28100, Pakistan
2 Department of Computer Science, Qurtuba University of Science and Information Technology, Peshawar 25000, Pakistan
3 Interdisciplinary Research Centers for Finance and Digital Economy, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia
4 College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
* Corresponding Authors: Fida Muhammad Khan, [email protected] ; Hazrat Bilal, [email protected]
Received: 13 March 2024, Accepted: 08 May 2024, Published: 21 June 2024  
Abstract
Part-of-speech (POS) tagging is the process of assigning a grammatical category, such as "Noun" or "Verb", to every word in a text corpus. It is widely used in applications such as sentiment analysis and machine translation, as well as in other linguistic and computational tasks. However, the unique features of the Pashto language and its limited resources make POS tagging a significant challenge. This study addresses POS tagging for Pashto by evaluating six popular machine-learning and deep-learning techniques. The evaluation is based on a curated, annotated dataset of Pashto text, compiled from diverse sources and enriched with POS tags, which provides a reliable foundation for performance analysis. Experimental results demonstrate that machine learning methods are effective at capturing the grammatical patterns of Pashto text. Among the tested algorithms, Decision Tree and K-Nearest Neighbor (KNN) achieved the highest accuracies, 94.34% and 94.19%, respectively. Random Forest and Support Vector Machine (SVM) were also competitive, each exceeding 90% accuracy. Multi-Layer Perceptron (MLP), evaluated with activation functions such as ReLU and Tanh, achieved an accuracy of 87.25%, while Naïve Bayes, tested with variants such as Multinomial NB and Gaussian NB, attained 83.33%. These results highlight the potential of machine learning techniques for overcoming the challenges of Pashto POS tagging.
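
To make the token-classification framing in the abstract concrete, the following is a minimal sketch of a K-Nearest Neighbor tagger scored by accuracy. It is an illustration only and not the authors' implementation: the toy English corpus, the three features per token, the Hamming distance, the value of k, and all function names are assumptions, since the abstract does not describe the features or hyperparameters used for Pashto.

```python
from collections import Counter

# Hypothetical toy corpus (English, for readability); NOT the paper's Pashto dataset.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("bird", "NOUN"), ("sings", "VERB")],
    [("a", "DET"), ("boy", "NOUN"), ("walks", "VERB")],
]
TEST = [
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("girl", "NOUN"), ("jumps", "VERB")],
]


def features(sentence, i):
    """Assumed feature set for token i: the word, its last character, the previous word."""
    word = sentence[i][0]
    prev = sentence[i - 1][0] if i > 0 else "<s>"
    return (word, word[-1:], prev)


def examples(corpus):
    """Flatten tagged sentences into (feature tuple, tag) pairs, one per token."""
    return [(features(s, i), tag) for s in corpus for i, (_, tag) in enumerate(s)]


def knn_predict(train, x, k=3):
    """Majority vote over the k training tokens closest to x.

    Distance is the Hamming distance, i.e. the number of feature positions that differ.
    """
    by_dist = sorted(train, key=lambda ex: sum(a != b for a, b in zip(ex[0], x)))
    votes = Counter(tag for _, tag in by_dist[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test tokens whose predicted tag equals the gold tag."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    train, test = examples(TRAIN), examples(TEST)
    print(f"token-level accuracy: {accuracy(train, test):.2%}")
```

The accuracy computed here is the same kind of token-level figure the abstract reports for each model, but the number it produces on this toy data says nothing about performance on Pashto. Swapping `knn_predict` for another classifier while keeping `examples` and `accuracy` fixed is one simple way to compare models on equal terms.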

Keywords
machine learning
part of speech tagging
morphological structure
grammatical features

Data Availability Statement
Data will be made available on request.

Funding
This work was supported without any funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Khan, A. A., Khan, W., Khan, M. A., Khan, K., Khan, F. M., Rahman, A. U., Bilal, H., & Monirul, I. M. (2024). Comparison of Machine Learning and Deep Learning Models for Part-of-Speech Tagging. ICCK Transactions on Advanced Computing and Systems, 1(2), 106–116. https://doi.org/10.62762/TACS.2024.493945

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Copyright © 2024 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN: pending (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/