ICCK Journal of Image Analysis and Processing, Volume 1, Issue 3, 2025: 125-146

Open Access | Research Article | 21 September 2025
Detection and Recognition of Real-Time Violence and Human Actions Recognition in Surveillance using Lightweight MobileNet Model
1 School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
* Corresponding Author: Altaf Hussain, [email protected]
Received: 31 August 2025, Accepted: 18 September 2025, Published: 21 September 2025  
Abstract
Real-time detection of violent behavior through surveillance technologies is increasingly important for public safety. This study tackles the challenge of automatically distinguishing violent from non-violent activities in continuous video streams. Traditional surveillance depends on human monitoring, which is time-consuming and error-prone, highlighting the need for intelligent systems that detect abnormal behaviors accurately with low computational cost. A key difficulty lies in the ambiguity of defining violent actions and the reliance on large annotated datasets, which are costly to produce. Many existing approaches also demand high computational resources, limiting real-time deployment on resource-constrained devices. To overcome these issues, the present work employs the lightweight MobileNet deep learning architecture for violence detection in surveillance videos. MobileNet is well-suited for embedded devices such as Raspberry Pi and Jetson Nano while maintaining competitive accuracy. In Python-based simulations on the Hockey Fight dataset, MobileNet is compared with AlexNet, VGG-16, and GoogleNet. Results show that MobileNet achieved 96.66% accuracy with a loss of 0.1329, outperforming the other models in both accuracy and efficiency. These findings demonstrate MobileNet’s superior balance of precision, computational cost, and real-time feasibility, offering a robust framework for intelligent surveillance in public safety monitoring, crowd management, and anomaly detection.
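
The abstract attributes MobileNet's suitability for embedded devices to its lightweight design. The defining idea in the published MobileNet architecture is the depthwise separable convolution, which factors a standard convolution into a per-channel depthwise filter followed by a 1x1 pointwise mixing layer. The sketch below illustrates the resulting parameter savings; the kernel and channel sizes are illustrative, not values reported in this article:

```python
# Parameter count of a standard 3x3 convolution versus MobileNet's
# depthwise separable factorization (depthwise 3x3 + pointwise 1x1).
# Channel sizes are illustrative examples, not taken from the article.

def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution layer (biases omitted)."""
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution: one k x k depthwise
    filter per input channel, then a 1x1 pointwise layer mixing channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 256, 256)     # 589,824 weights
sep = ds_conv_params(3, 256, 256)  # 67,840 weights
print(f"standard: {std}, separable: {sep}, reduction: {std / sep:.1f}x")
# prints: standard: 589824, separable: 67840, reduction: 8.7x
```

The theoretical reduction factor is 1/c_out + 1/k^2, so for 3x3 kernels the savings approach 9x as the channel count grows, which is consistent with MobileNet trading a small accuracy margin for a much smaller compute budget.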

Graphical Abstract
Detection and Recognition of Real-Time Violence and Human Actions Recognition in Surveillance using Lightweight MobileNet Model

Keywords
real-time violence detection
CCTV surveillance video
convolutional neural networks
VGG-16
GoogLeNet
AlexNet
MobileNet

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
The author declares no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.


Cite This Article
APA Style
Hussain, A. (2025). Detection and Recognition of Real-Time Violence and Human Actions Recognition in Surveillance using Lightweight MobileNet Model. ICCK Journal of Image Analysis and Processing, 1(3), 125–146. https://doi.org/10.62762/JIAP.2025.839123


Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
CC BY Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
ICCK Journal of Image Analysis and Processing

ISSN: 3068-6679 (Online)

Email: [email protected]

All published articles are preserved permanently in Portico:
https://www.portico.org/publishers/icck/