ICCK Journal of Image Analysis and Processing, Volume 1, Issue 3, 2025: 125-146

Open Access | Research Article | 21 September 2025
Detection and Recognition of Real-Time Violence and Human Actions Recognition in Surveillance using Lightweight MobileNet Model
1 School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
* Corresponding Author: Altaf Hussain, [email protected]
Received: 31 August 2025, Accepted: 18 September 2025, Published: 21 September 2025  
Abstract
Real-time detection of violent behavior through surveillance technologies is increasingly important for public safety. This study tackles the challenge of automatically distinguishing violent from non-violent activities in continuous video streams. Traditional surveillance depends on human monitoring, which is time-consuming and error-prone, highlighting the need for intelligent systems that detect abnormal behaviors accurately with low computational cost. A key difficulty lies in the ambiguity of defining violent actions and the reliance on large annotated datasets, which are costly to produce. Many existing approaches also demand high computational resources, limiting real-time deployment on resource-constrained devices. To overcome these issues, the present work employs the lightweight MobileNet deep learning architecture for violence detection in surveillance videos. MobileNet is well-suited for embedded devices such as Raspberry Pi and Jetson Nano while maintaining competitive accuracy. In Python-based simulations on the Hockey Fight dataset, MobileNet is compared with AlexNet, VGG-16, and GoogleNet. Results show that MobileNet achieved 96.66% accuracy with a loss of 0.1329, outperforming the other models in both accuracy and efficiency. These findings demonstrate MobileNet’s superior balance of precision, computational cost, and real-time feasibility, offering a robust framework for intelligent surveillance in public safety monitoring, crowd management, and anomaly detection.
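
The abstract attributes MobileNet's suitability for embedded devices to its lightweight design. The defining idea in the published MobileNet architecture is the depthwise separable convolution, which factors a standard convolution into a per-channel depthwise filter followed by a 1x1 pointwise mixing layer. The sketch below illustrates the resulting parameter savings; the kernel and channel sizes are illustrative, not values reported in this article:

```python
# Parameter count of a standard 3x3 convolution versus MobileNet's
# depthwise separable factorization (depthwise 3x3 + pointwise 1x1).
# Channel sizes are illustrative examples, not taken from the article.

def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution layer (biases omitted)."""
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution: one k x k depthwise
    filter per input channel, then a 1x1 pointwise layer mixing channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 256, 256)     # 589,824 weights
sep = ds_conv_params(3, 256, 256)  # 67,840 weights
print(f"standard: {std}, separable: {sep}, reduction: {std / sep:.1f}x")
# prints: standard: 589824, separable: 67840, reduction: 8.7x
```

The theoretical reduction factor is 1/c_out + 1/k^2, so for 3x3 kernels the savings approach 9x as the channel count grows, which is consistent with MobileNet trading a small accuracy margin for a much smaller compute budget.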

Graphical Abstract
Detection and Recognition of Real-Time Violence and Human Actions Recognition in Surveillance using Lightweight MobileNet Model

Keywords
real-time violence detection
CCTV surveillance video
convolutional neural networks
VGG-16
GoogLeNet
AlexNet
MobileNet

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
The author declares no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.


Cite This Article
APA Style
Hussain, A. (2025). Detection and Recognition of Real-Time Violence and Human Actions Recognition in Surveillance using Lightweight MobileNet Model. ICCK Journal of Image Analysis and Processing, 1(3), 125–146. https://doi.org/10.62762/JIAP.2025.839123


Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
CC BY Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
ICCK Journal of Image Analysis and Processing

ISSN: 3068-6679 (Online)

Email: [email protected]

All published articles are preserved permanently in Portico:
https://www.portico.org/publishers/icck/