A Hybrid Framework Combining CNN, LSTM, and Transfer Learning for Emotion Recognition

Ketan Sarvakar; Kaushik Rana

doi:10.62762/TMI.2025.572412

Volume 1, Issue 2, ICCK Transactions on Machine Intelligence

ICCK Transactions on Machine Intelligence, Volume 1, Issue 2, 2025: 103-116

Free to Read | Research Article | 26 September 2025

Ketan Sarvakar 1 *

Kaushik Rana 1

1 Gujarat Technological University, Ahmedabad, India

* Corresponding Author: Ketan Sarvakar, [email protected]

Received: 24 August 2025, Accepted: 17 September 2025, Published: 26 September 2025

Abstract

Deep learning has substantially improved facial emotion recognition, an essential element of human-computer interaction. This study evaluates the performance of multiple architectures, including a custom CNN, VGG-16, ResNet-50, and a hybrid CNN-LSTM framework, on the FER2013 and CK+ datasets. Preprocessing involved grayscale conversion, image resizing, and pixel normalization. Experimental results show that ResNet-50 achieved the highest accuracy on FER2013 (76.85%), while the hybrid CNN-LSTM model achieved the highest accuracy on CK+ (92.30%). Precision, recall, and F1-score were also used for evaluation. The findings highlight the trade-off between computational efficiency and recognition accuracy, offering insights for developing robust, real-time emotion recognition systems.
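
The preprocessing steps and evaluation metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `preprocess` and `per_class_metrics`, the 48x48 target size (the native FER2013 resolution), the BT.601 grayscale weights, and the nearest-neighbour resize are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def preprocess(image, size=48):
    """Grayscale conversion, resize, and pixel normalization (illustrative only).

    `size=48` and the grayscale weights are assumptions, not values from the paper.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 3:
        # Luminance-weighted grayscale using ITU-R BT.601 weights (assumed).
        img = img @ np.array([0.299, 0.587, 0.114])
    h, w = img.shape
    # Nearest-neighbour resize: pick one source row/column per target pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img[np.ix_(rows, cols)]
    # Scale 8-bit pixel values into the range [0, 1].
    return resized / 255.0


def per_class_metrics(y_true, y_pred, num_classes):
    """Per-class precision, recall, and F1 from integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class truly occurs
    zeros = np.zeros_like(tp)
    # Classes with no predictions or no samples get a score of 0.
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, actual, out=zeros.copy(), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return precision, recall, f1
```

Under these assumptions, `preprocess` maps an RGB or grayscale array of any resolution to a 48x48 array with values in [0, 1], and `per_class_metrics` returns one precision, recall, and F1 value per emotion class; averaging those arrays gives macro-averaged scores.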

Keywords

face expressions, face emotion recognition, deep learning, VGG-19, ResNet-50, Inception-V3, MobileNet

Data Availability Statement

Data will be made available on request.

Funding

This work was supported without any funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cite This Article

APA Style

Sarvakar, K., & Rana, K. (2025). A Hybrid Framework Combining CNN, LSTM, and Transfer Learning for Emotion Recognition. ICCK Transactions on Machine Intelligence, 1(2), 103–116. https://doi.org/10.62762/TMI.2025.572412

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

ICCK Transactions on Machine Intelligence

ISSN: 3068-7403 (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/
