ICCK Transactions on Machine Intelligence, Volume 2, Issue 2, 2026: 100-105

Free to Read | Research Article | 01 March 2026
Intelligent Deepfake Detector Using Audio-Visual Clues
Barnali Gupta Banik and Shaik Nidha Naziya
1 Mahatma Gandhi Institute of Technology, Hyderabad, Telangana, India
2 Malla Reddy Engineering College for Women, Hyderabad, Telangana, India
* Corresponding Author: Barnali Gupta Banik, [email protected]
ARK: ark:/57805/tmi.2025.601369
Received: 21 September 2025, Accepted: 20 January 2026, Published: 01 March 2026  
Abstract
Deepfake media is proliferating rapidly and causing significant harm: malicious actors now use AI to create fake videos that appear increasingly realistic. Traditional detection tools often fail because they analyze audio or visual signals in isolation. This paper introduces an intelligent deepfake detection system that addresses this limitation through a novel Multi-Modal Dispersion Framework. The system identifies subtle inconsistencies by tracking how lip movements align with speech patterns. By projecting audio and visual features into a shared latent space, the model quantifies the semantic divergence between modalities; a transformer module then captures cross-modal context to detect fine-grained manipulation artifacts. Evaluated on the DFDC and FakeAVCeleb datasets, the system achieves 94.3% accuracy, demonstrating strong potential for real-time deployment. This framework offers a reliable approach to media authentication and contributes to advancing AI safety.
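The core idea in the abstract, projecting per-frame audio and visual features into a shared latent space and scoring their divergence, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the feature dimensions, the random stand-in projection matrices, and the cosine-based divergence measure are all assumptions; the paper's actual Multi-Modal Dispersion Framework and transformer fusion may differ substantially.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy projection matrices standing in for learned encoders
# (audio dim -> latent dim, visual dim -> latent dim).
W_audio = rng.standard_normal((40, 16))    # e.g. 40-dim MFCC-like audio features
W_visual = rng.standard_normal((68, 16))   # e.g. 68 lip-landmark coordinates

def dispersion_score(audio_feats, visual_feats):
    """Mean per-frame cosine divergence between modalities in the shared space.

    audio_feats:  (T, 40) array of frame-level audio features
    visual_feats: (T, 68) array of frame-level visual (lip) features
    Returns a scalar in [0, 2]; higher suggests more cross-modal inconsistency.
    """
    za = audio_feats @ W_audio             # (T, 16) audio embeddings
    zv = visual_feats @ W_visual           # (T, 16) visual embeddings
    za /= np.linalg.norm(za, axis=1, keepdims=True) + 1e-8
    zv /= np.linalg.norm(zv, axis=1, keepdims=True) + 1e-8
    # 1 - cosine similarity, averaged over frames
    return float(np.mean(1.0 - np.sum(za * zv, axis=1)))

T = 30  # number of frames in a clip
audio = rng.standard_normal((T, 40))
visual = rng.standard_normal((T, 68))
score = dispersion_score(audio, visual)
# A real detector would threshold this score, or feed the per-frame
# divergences to a transformer for cross-modal context modeling.
```

With trained encoders, genuine clips would yield low dispersion (lip motion tracks speech) while manipulated clips would drift apart in the shared space.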


Keywords
deepfake detection
multi-modal dispersion
audio-visual clues
cross-modal inconsistency
lip-sync analysis
AI forensics
transformer fusion

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
The authors declare no conflicts of interest.

AI Use Statement
The authors declare that no generative AI was used in the preparation of this manuscript.

Ethical Approval and Consent to Participate
Not applicable.


Cite This Article
APA Style
Banik, B. G., & Naziya, S. N. (2026). Intelligent Deepfake Detector Using Audio-Visual Clues. ICCK Transactions on Machine Intelligence, 2(2), 100–105. https://doi.org/10.62762/TMI.2025.601369
Export Citation
RIS Format
TY  - JOUR
AU  - Banik, Barnali Gupta
AU  - Naziya, Shaik Nidha
PY  - 2026
DA  - 2026/03/01
TI  - Intelligent Deepfake Detector Using Audio-Visual Clues
JO  - ICCK Transactions on Machine Intelligence
T2  - ICCK Transactions on Machine Intelligence
JF  - ICCK Transactions on Machine Intelligence
VL  - 2
IS  - 2
SP  - 100
EP  - 105
DO  - 10.62762/TMI.2025.601369
UR  - https://www.icck.org/article/abs/TMI.2025.601369
KW  - deepfake detection
KW  - multi-modal dispersion
KW  - audio-visual clues
KW  - cross-modal inconsistency
KW  - lip-sync analysis
KW  - AI forensics
KW  - transformer fusion
AB  - Deepfake media is growing rapidly and causing significant harm. Bad actors now use AI to create fake videos that appear increasingly realistic. Traditional detection tools often fail because they analyze audio or visual signals in isolation. This paper introduces an intelligent Deepfake Detection system that addresses this limitation through a novel Multi-Modal Dispersion Framework. The system identifies subtle inconsistencies by tracking how lip movements align with speech patterns. By projecting these features into a shared latent space, the model quantifies the semantic divergence between modalities. A transformer module then captures cross-modal context to detect fine-grained manipulation artifacts. Evaluated on the DFDC and FakeAVCeleb datasets, the system achieves 94.3% accuracy, demonstrating strong potential for real-time deployment. This framework provides a reliable approach to media authentication and contributes to advancing AI safety.
SN  - 3068-7403
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  - 
BibTeX Format
@article{Banik2026Intelligen,
  author = {Barnali Gupta Banik and Shaik Nidha Naziya},
  title = {Intelligent Deepfake Detector Using Audio-Visual Clues},
  journal = {ICCK Transactions on Machine Intelligence},
  year = {2026},
  volume = {2},
  number = {2},
  pages = {100-105},
  doi = {10.62762/TMI.2025.601369},
  url = {https://www.icck.org/article/abs/TMI.2025.601369},
  abstract = {Deepfake media is growing rapidly and causing significant harm. Bad actors now use AI to create fake videos that appear increasingly realistic. Traditional detection tools often fail because they analyze audio or visual signals in isolation. This paper introduces an intelligent Deepfake Detection system that addresses this limitation through a novel Multi-Modal Dispersion Framework. The system identifies subtle inconsistencies by tracking how lip movements align with speech patterns. By projecting these features into a shared latent space, the model quantifies the semantic divergence between modalities. A transformer module then captures cross-modal context to detect fine-grained manipulation artifacts. Evaluated on the DFDC and FakeAVCeleb datasets, the system achieves 94.3\% accuracy, demonstrating strong potential for real-time deployment. This framework provides a reliable approach to media authentication and contributes to advancing AI safety.},
  keywords = {deepfake detection, multi-modal dispersion, audio-visual clues, cross-modal inconsistency, lip-sync analysis, AI forensics, transformer fusion},
  issn = {3068-7403},
  publisher = {Institute of Central Computation and Knowledge}
}

Article Metrics
Citations:
Crossref: 0
Scopus: 0
Web of Science: 0
Article Access Statistics:
Views: 28
PDF Downloads: 6

Publisher's Note
ICCK remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Machine Intelligence

ISSN: 3068-7403 (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/