Volume 2, Issue 2, ICCK Transactions on Advanced Computing and Systems
ICCK Transactions on Advanced Computing and Systems, Volume 2, Issue 2, 2025: 1-16

Open Access | Research Article | 17 May 2025
Enhancing Sentiment Analysis of Roman Urdu Using Augmentation Techniques and Deep Learning Models
1 Department of Computer Science, University of Science and Technology, Bannu, Khyber Pakhtunkhwa, Pakistan
2 Department of Computer Science and Engineering, Sejong University, Seoul 05006, Republic of Korea
3 Department of Computer Science, Islamia College University, Peshawar, Khyber Pakhtunkhwa, Pakistan
* Corresponding Author: Wahab Khan, [email protected]
Received: 29 February 2025, Accepted: 09 May 2025, Published: 17 May 2025  
Abstract
Roman Urdu sentiment analysis faces significant challenges due to transliteration inconsistencies, informal language usage, and the lack of labeled datasets. This study proposes a novel framework that addresses these challenges by combining advanced data preprocessing techniques with data augmentation strategies such as synonym replacement, back-translation, and random word insertion. These methods increase dataset diversity and thereby improve the models’ ability to generalize. A rich Roman Urdu dataset was collected from diverse sources, including social media platforms (Facebook, Twitter, YouTube), blogs, forums, and e-commerce sites, to capture a wide range of user opinions. Three deep learning models, Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM), were evaluated for sentiment classification. The LSTM model performs best, with an accuracy of 94%, compared with 92% for GRU and 90% for RNN. The LSTM’s ability to capture long-term dependencies and contextual nuances in Roman Urdu text makes it the most effective of the three models for this task and demonstrates a significant improvement over traditional methods.
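
To make two of the augmentation strategies named in the abstract concrete, the following is a minimal Python sketch of synonym replacement and random word insertion. It is not the authors' implementation: the `SYNONYMS` lexicon, the function names, and the parameters `n` and `seed` are hypothetical and chosen only for illustration. Back-translation is left out because it depends on an external translation model that the abstract does not specify.

```python
import random

# Hypothetical toy lexicon of Roman Urdu spelling variants and near-synonyms.
# Illustrative only; it is NOT taken from the paper's dataset.
SYNONYMS = {
    "acha": ["achha", "behtareen"],
    "bura": ["kharab", "bekaar"],
    "bohat": ["bahut", "kaafi"],
}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """Up to n times, insert a synonym of a random known token at a random position."""
    out = list(tokens)
    for _ in range(n):
        known = [tok for tok in out if tok in SYNONYMS]
        if not known:
            break  # nothing in the lexicon to draw a synonym from
        new_word = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), new_word)
    return out


def augment(sentence, n=1, seed=0):
    """Return two augmented variants of a whitespace-tokenized sentence."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    tokens = sentence.lower().split()
    return [
        " ".join(synonym_replacement(tokens, n, rng)),
        " ".join(random_insertion(tokens, n, rng)),
    ]


if __name__ == "__main__":
    for variant in augment("Yeh phone bohat acha hai"):
        print(variant)
```

In a sketch like this, each augmented variant keeps the sentiment label of its source sentence, so the training set grows without further annotation. A realistic lexicon would also need to cover the many spelling variants that Roman Urdu transliteration produces.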

Keywords
Roman Urdu
sentiment analysis
deep learning
data augmentation
text classification
GRU
LSTM
RNN

Data Availability Statement
The dataset used in this study is publicly available at: https://github.com/awais1992/RomanUrdu-Sentiment-Aug. It contains Roman Urdu sentiment-annotated data, which can be accessed and utilized under the terms specified in the repository.

Funding
This work received no funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Khan, M. O., Khan, W., Wang, Y., Rehman, A. U., & Khan, M. A. (2025). Enhancing Sentiment Analysis of Roman Urdu Using Augmentation Techniques and Deep Learning Models. ICCK Transactions on Advanced Computing and Systems, 2(2), 1–16. https://doi.org/10.62762/TACS.2025.190575

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
CC BY Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.