Advanced Deep Learning Approaches for Accurate and Efficient Suspicious Behavior Detection in Surveillance Videos

Safdel, Arash; Ghasemi, Jamal; Zendehbad, Seyyed Ali

doi:10.22124/cse.2025.30210.1099

Document Type: Original Article

Authors

Faculty of Engineering & Technology, University of Mazandaran, Babolsar, Iran


Abstract

Detecting violence in videos with Artificial Intelligence (AI) and Deep Learning (DL) remains a difficult research problem for urban security frameworks and video surveillance systems. The proposed model splits video violence detection into two stages to achieve both rapid processing and precise results. In the first stage, a LeNet-5 model operating at 0.8 frames per second filters out non-violent videos. In the second stage, a ResNet-50 model inspects the remaining videos, namely those whose violence probability exceeds 0.4. The system was evaluated on the Real-Life Violence dataset, which contains 1951 videos (1000 violent and 951 non-violent). It achieved 97.03% accuracy, 95.70% recall, 98.46% precision, a 97.06% F1-score, and an AUC of 0.9902. Each frame requires only 20 milliseconds of processing time, which allows real-time use of the system. A comparison with existing methods such as 3D-CNN, ViT, and YOLOv5+TSN shows that the proposed model is superior in both accuracy and speed. By reducing detection errors, the system offers better violence detection and greater operational reliability in real-world applications.
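The two-stage decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `fast_score` and `accurate_score` are hypothetical callables standing in for the LeNet-5 and ResNet-50 models, the 0.4 threshold is assumed to apply to the first-stage output, and `final_threshold` is an assumed value that the abstract does not state. The helper `f1_score` only checks that the reported F1-score is consistent with the reported precision and recall.

```python
from typing import Any, Callable

# From the abstract: videos whose probability exceeds 0.4 go to the second stage.
STAGE1_THRESHOLD = 0.4


def cascade_predict(
    video: Any,
    fast_score: Callable[[Any], float],      # hypothetical stand-in for LeNet-5
    accurate_score: Callable[[Any], float],  # hypothetical stand-in for ResNet-50
    stage1_threshold: float = STAGE1_THRESHOLD,
    final_threshold: float = 0.5,            # assumption: not given in the abstract
) -> bool:
    """Return True if the video is flagged as violent by the two-stage cascade."""
    # Stage 1: cheap filter. Low-probability videos are rejected immediately,
    # so the expensive model is never run on them.
    if fast_score(video) <= stage1_threshold:
        return False
    # Stage 2: the slower, more accurate model makes the final decision.
    return accurate_score(video) >= final_threshold


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Dummy scorers that return fixed probabilities, for illustration only.
    print(cascade_predict("clip", lambda v: 0.2, lambda v: 0.9))
    print(cascade_predict("clip", lambda v: 0.7, lambda v: 0.9))
    # Consistency check on the metrics reported in the abstract.
    print(round(f1_score(0.9846, 0.9570), 4))
```

The point of the cascade is that the second model runs only on videos the first model could not rule out, which is how a design of this kind can trade little accuracy for a large reduction in average processing time.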




Volume 4, Issue 2, September 2024, Pages 203-216
