Abstract
As machine-mediated communication grows, the ability to detect emotional nuances in speech has become a critical competency for intelligent systems. This paper presents a robust Speech Emotion Recognition (SER) framework that integrates a hybrid deep learning architecture with a real-time web-based inference interface. Using the RAVDESS dataset, the proposed pipeline comprises preprocessing, data augmentation, and feature extraction based on Mel-Frequency Cepstral Coefficients (MFCCs), Chroma features, and Mel-spectrograms. The proposed CNN-BiLSTM-Conv1D model was compared against standard machine learning classifiers: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest, and XGBoost. The experimental results indicate that the proposed model substantially outperforms these conventional models, reaching a state-of-the-art classification accuracy of 94%. The model was further evaluated using ROC-AUC curves and per-class performance metrics. It was subsequently deployed through a Flask-based web interface that lets users upload voice inputs and receive real-time emotion predictions. This end-to-end system addresses shortcomings of earlier SER approaches, such as limited temporal modeling and reduced generalization, and demonstrates practical applicability in domains such as mental health monitoring, virtual assistants, and affective computing.
Keywords
speech emotion recognition
deep learning
CNN-BiLSTM
RAVDESS
MFCC
real-time prediction
human-computer interaction
audio processing
web deployment
affective computing
Data Availability Statement
Data will be made available on request.
Funding
This work received no funding.
Conflicts of Interest
The authors declare no conflicts of interest.
Ethical Approval and Consent to Participate
Not applicable.
Cite This Article
APA Style
Tiwari, S., Kumar, D., Mahajan, A., & Sachar, S. (2025). Emotion Detection from Speech Using CNN-BiLSTM with Feature Rich Audio Inputs. ICCK Transactions on Machine Intelligence, 1(2), 80–89. https://doi.org/10.62762/TMI.2025.306750
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.