Document Type: Original Article
Authors
1 Faculty of Engineering & Technology, University of Mazandaran, Babolsar, Iran.
2 Faculty of Engineering & Technology, University of Mazandaran, Babolsar, Iran.
Abstract
Identifying violence in videos is a challenging research problem for Artificial Intelligence (AI) and Deep Learning (DL) systems in urban security frameworks and video surveillance systems. The proposed model splits video violence detection into two stages to achieve both fast processing and accurate results. In the first stage, a LeNet-5 model, operating at 0.8 frames per second, filters out non-violent videos. In the second stage, videos whose violence probability exceeds 0.4 after the first stage are passed to a ResNet-50 model for closer inspection. The system was evaluated on the Real-Life Violence dataset of 1951 videos (1000 violent and 951 non-violent). It achieved 97.03% accuracy, 95.70% recall, 98.46% precision, a 97.06% F1-score, and an AUC of 0.9902. Processing takes only 20 milliseconds per frame, which makes real-time use feasible. A comparison with existing methods, such as 3D-CNN, ViT, and YOLOv5+TSN, shows that the proposed model outperforms them in both accuracy and speed. By reducing detection errors, the system improves violence detection and operational reliability in real-world applications.
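The two-stage decision logic can be sketched as a simple cascade: a cheap model screens every clip, and only clips scoring above 0.4 are sent to the expensive model. This is a minimal sketch, not the authors' implementation. The function name `cascade_predict`, the stub scorers, and the second-stage cut-off of 0.5 are hypothetical assumptions; in the article the two scorers would be the trained LeNet-5 and ResNet-50 networks.

```python
from typing import Any, Callable, Tuple

# Hypothetical interface: a model maps a clip to P(violent) in [0, 1].
Scorer = Callable[[Any], float]

STAGE1_THRESHOLD = 0.4  # from the abstract: escalate when P(violent) > 0.4
STAGE2_THRESHOLD = 0.5  # ASSUMPTION: final cut-off, not stated in the source


def cascade_predict(clip: Any, fast_model: Scorer, heavy_model: Scorer,
                    t1: float = STAGE1_THRESHOLD,
                    t2: float = STAGE2_THRESHOLD) -> Tuple[bool, int]:
    """Return (is_violent, stage_that_decided) for one clip."""
    p_fast = fast_model(clip)      # stage 1: cheap screening (LeNet-5 in the paper)
    if p_fast <= t1:
        return False, 1            # treated as non-violent; heavy model never runs
    p_heavy = heavy_model(clip)    # stage 2: detailed check (ResNet-50 in the paper)
    return p_heavy > t2, 2


if __name__ == "__main__":
    # Stub scorers standing in for trained networks (purely illustrative).
    fast = lambda clip: clip["fast_score"]
    heavy = lambda clip: clip["heavy_score"]
    print(cascade_predict({"fast_score": 0.1, "heavy_score": 0.9}, fast, heavy))  # (False, 1)
    print(cascade_predict({"fast_score": 0.7, "heavy_score": 0.9}, fast, heavy))  # (True, 2)
```

The design choice is that most non-violent footage exits after the cheap first stage, so the expensive model's cost is paid only for the suspicious fraction of clips.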
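As a consistency check, the reported accuracy, recall, precision, and F1-score can all be reproduced from a single confusion matrix, and the 20 ms per-frame latency corresponds to 50 frames per second. The counts below are not given in the article; they are back-calculated under the assumption that all 1951 videos were scored with violent videos as the positive class, so treat them as illustrative.

```python
# Confusion-matrix counts back-calculated from the reported metrics,
# ASSUMING all 1951 videos (1000 violent = positive class) were evaluated.
TP, FN = 957, 43   # 1000 violent videos in total
FP, TN = 15, 936   # 951 non-violent videos in total


def metrics(tp: int, fn: int, fp: int, tn: int):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1


acc, rec, prec, f1 = metrics(TP, FN, FP, TN)
print(f"accuracy={acc:.2%} recall={rec:.2%} precision={prec:.2%} f1={f1:.2%}")
# accuracy=97.03% recall=95.70% precision=98.46% f1=97.06%

# Throughput implied by 20 ms of processing per frame.
fps = 1000 / 20
print(fps)  # 50.0
```

At 50 frames per second the system exceeds the 25 to 30 frames per second typical of surveillance cameras, which is what supports the real-time claim.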
Keywords