AUDD: Audio Deepfake Detection Using Paralinguistic Feature Extraction Techniques

Zahoor Ahmed; Gul Sher Ali Khan; Raja Vavekanand

doi:10.62762/JCI.2024.667518

Volume 1, Issue 1, Journal of Computing Intelligence

Journal of Computing Intelligence, Volume 1, Issue 1, 2025: 3-8

Open Access | Research Article | 03 August 2025

Zahoor Ahmed 1

Gul Sher Ali Khan 1

Raja Vavekanand 2 *

1 Balochistan University of Information Technology, Engineering and Management Sciences, Baleli, Quetta 87300, Pakistan

2 Benazir Bhutto Shaheed University Lyari, Karachi 75660, Sindh, Pakistan

* Corresponding Author: Raja Vavekanand, [email protected]

Received: 30 November 2024, Accepted: 27 June 2025, Published: 03 August 2025

Abstract

This work investigates whether paralinguistic feature extraction improves audio deepfake detection. The proposed model extracts paralinguistic features from each audio clip and represents them as a 1024-dimensional embedding vector. These embeddings are the input to a logistic regression model, which performs binary classification to distinguish real from deepfake audio. The model is evaluated on the ASVspoof2019 dataset, which contains both genuine and spoofed audio clips, using Equal Error Rate (EER) and accuracy so that its performance can be compared with state-of-the-art methods. The proposed model achieves an EER of 3.04% and an accuracy of 97.9%, indicating that paralinguistic feature extraction is a promising approach for audio deepfake detection and a useful direction for future research in this area.
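
The classification and evaluation stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paralinguistic feature extractor is omitted, and the hypothetical helper `make_toy_embeddings` generates 8-dimensional Gaussian clusters as a stand-in for the 1024-dimensional embeddings. The label convention (1 = real, 0 = deepfake), the learning rate, and the epoch count are likewise assumptions chosen for illustration. The sketch fits a logistic regression by gradient descent and reports accuracy and EER, where EER is taken at the threshold at which the false acceptance and false rejection rates are closest.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that embedding x is real (label 1, an assumed convention)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_b += err
            for j in range(dim):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def equal_error_rate(scores, labels):
    """EER: the error rate at the threshold where the false acceptance rate
    (deepfake accepted as real) and the false rejection rate (real rejected)
    are closest. Higher score means more likely real."""
    n_real = sum(labels)
    n_fake = len(labels) - n_real
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)) + [float("inf")]:
        # A clip is accepted as real when its score is at least t.
        far = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t) / n_fake
        frr = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t) / n_real
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def make_toy_embeddings(n_per_class, dim, rng):
    """Hypothetical stand-in data: two Gaussian clusters, not real embeddings."""
    X, y = [], []
    for label, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_toy_embeddings(100, 8, rng)
X_test, y_test = make_toy_embeddings(100, 8, rng)
w, b = train_logistic_regression(X_train, y_train)
scores = [predict_proba(w, b, x) for x in X_test]
accuracy = sum((s >= 0.5) == bool(l) for s, l in zip(scores, y_test)) / len(y_test)
print(f"accuracy={accuracy:.3f}  EER={equal_error_rate(scores, y_test):.3f}")
```

Accuracy depends on a single decision threshold (0.5 here), whereas EER summarizes the trade-off between the two error types across all thresholds, which is presumably why the study reports both.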

Keywords

audio, deepfake, paralinguistic, deep learning

Data Availability Statement

Data will be made available on request.

Funding

This work received no funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cite This Article

APA Style

Ahmed, Z., Khan, G. S. A., & Vavekanand, R. (2025). AUDD: Audio Deepfake Detection Using Paralinguistic Feature Extraction Techniques. Journal of Computing Intelligence, 1(1), 3–8. https://doi.org/10.62762/JCI.2024.667518

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
