ICCK Transactions on Sensing, Communication, and Control, Volume 2, Issue 4, 2025: 276-289

Free to Read | Research Article | 30 December 2025
Dual-Pathway Sensing with Optimized Attention Network for Video Summarization in Surveillance Systems
Taimur Ali Khan, Danish Ali, Zainab Ghazanfar and Bilal Ahmad *
1 Department of IT, Saudi Media Systems, Riyadh 11482, Saudi Arabia
2 Department of Electrical and Computer Engineering, Villanova University, Villanova, PA 19085, United States
3 Department of Software and Artificial Intelligence, Gachon University, Seongnam 13120, South Korea
4 Department of Computer Science, Govt Degree College Lal Qilla Maidan Dir Lower, Pakistan
* Corresponding Author: Bilal Ahmad, [email protected]
ARK: ark:/57805/tscc.2025.308540
Received: 20 October 2025, Accepted: 05 December 2025, Published: 30 December 2025  
Abstract
Video summarization (VS) aims to generate concise representations of long videos by extracting the most informative frames while maintaining essential content. Existing methods struggle to capture multi-scale dependencies and often rely on suboptimal feature representations, limiting their ability to model complex inter-frame relationships. To address these issues, we propose a multi-scale sensing network that incorporates three key innovations to improve VS. First, we introduce multi-scale dilated convolution blocks with progressively increasing dilation rates to capture temporal context at multiple levels, enabling the network to understand both local transitions and long-range dependencies. Second, we develop a Dual-Pathway Efficient Channel Attention (DECA) module that leverages statistics from Global Average Pooling and Global Max Pooling pathways. Third, we present an Optimized Spatial Attention (OSA) module that replaces standard $7\times7$ convolutions with more efficient operations while maintaining spatial dependency modeling. The proposed framework uses EfficientNetB7 as the backbone for robust spatial feature extraction, followed by multi-scale dilated blocks and dual attention mechanisms for detailed feature refinement. Extensive experiments on the TVSum and SumMe benchmark datasets demonstrate the superiority of our method, achieving F1 scores of 63.5% and 53.3%, respectively.
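The abstract describes the three modules only at a high level. As a rough, non-authoritative illustration, the PyTorch sketch below shows one plausible realization: parallel temporal convolutions with increasing dilation rates, an ECA-style dual-pathway channel attention over average- and max-pooled statistics, and a spatial attention that swaps the usual single 7x7 convolution for two stacked 3x3 convolutions. All shapes, kernel sizes, the dilation rates (1, 2, 4), and the stacked-convolution substitution are assumptions for illustration, not the authors' exact design.

import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    """Parallel temporal convolutions with progressively increasing
    dilation rates (assumed 1, 2, 4), fused by a 1x1 convolution."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations])
        self.fuse = nn.Conv1d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):  # x: (batch, channels, frames)
        y = torch.cat([branch(x) for branch in self.branches], dim=1)
        return torch.relu(self.fuse(y)) + x  # residual connection (assumed)

class DECA(nn.Module):
    """Dual-Pathway Efficient Channel Attention: an ECA-style 1D convolution
    across the channel axis, applied to both GAP and GMP statistics."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):  # x: (batch, channels, H, W) per-frame feature map
        avg = x.mean(dim=(2, 3))   # Global Average Pooling -> (B, C)
        mx = x.amax(dim=(2, 3))    # Global Max Pooling     -> (B, C)
        w = self.conv(avg.unsqueeze(1)) + self.conv(mx.unsqueeze(1))  # (B, 1, C)
        return x * torch.sigmoid(w).transpose(1, 2).unsqueeze(-1)

class OSA(nn.Module):
    """Optimized Spatial Attention: two stacked 3x3 convolutions stand in
    for the single 7x7 kernel of standard (CBAM-style) spatial attention."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 2, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(2, 1, kernel_size=3, padding=1))

    def forward(self, x):  # x: (batch, channels, H, W)
        avg = x.mean(dim=1, keepdim=True)  # channel-wise average map
        mx = x.amax(dim=1, keepdim=True)   # channel-wise max map
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

# Toy end-to-end pass: per-frame refinement, then temporal modeling.
# (A real pipeline would feed EfficientNetB7 features, e.g. 2560 channels.)
frames = torch.randn(4, 64, 8, 8)                # 4 frames, toy feature maps
frames = OSA()(DECA()(frames))                   # channel, then spatial attention
seq = frames.mean(dim=(2, 3)).t().unsqueeze(0)   # (1, 64, 4) frame sequence
print(MultiScaleDilatedBlock(64)(seq).shape)     # torch.Size([1, 64, 4])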

Graphical Abstract
Dual-Pathway Sensing with Optimized Attention Network for Video Summarization in Surveillance Systems

Keywords
video summarization
visual intelligence
surveillance systems
dual-pathway
attention network

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.


Cite This Article
APA Style
Khan, T. A., Ali, D., Ghazanfar, Z., & Ahmad, B. (2025). Dual-Pathway Sensing with Optimized Attention Network for Video Summarization in Surveillance Systems. ICCK Transactions on Sensing, Communication, and Control, 2(4), 276–289. https://doi.org/10.62762/TSCC.2025.308540
Export Citation
RIS Format
Compatible with EndNote, Zotero, Mendeley, and other reference managers
TY  - JOUR
AU  - Khan, Taimur Ali
AU  - Ali, Danish
AU  - Ghazanfar, Zainab
AU  - Ahmad, Bilal
PY  - 2025
DA  - 2025/12/30
TI  - Dual-Pathway Sensing with Optimized Attention Network for Video Summarization in Surveillance Systems
JO  - ICCK Transactions on Sensing, Communication, and Control
T2  - ICCK Transactions on Sensing, Communication, and Control
JF  - ICCK Transactions on Sensing, Communication, and Control
VL  - 2
IS  - 4
SP  - 276
EP  - 289
DO  - 10.62762/TSCC.2025.308540
UR  - https://www.icck.org/article/abs/TSCC.2025.308540
KW  - video summarization
KW  - visual intelligence
KW  - surveillance systems
KW  - dual-pathway
KW  - attention network
AB  - Video summarization (VS) aims to generate concise representations of long videos by extracting the most informative frames while maintaining essential content. Existing methods struggle to capture multi-scale dependencies and often rely on suboptimal feature representations, limiting their ability to model complex inter-frame relationships. To address these issues, we propose a multi-scale sensing network that incorporates three key innovations to improve VS. First, we introduce multi-scale dilated convolution blocks with progressively increasing dilation rates to capture temporal context at multiple levels, enabling the network to understand both local transitions and long-range dependencies. Second, we develop a Dual-Pathway Efficient Channel Attention (DECA) module that leverages statistics from Global Average Pooling and Global Max Pooling pathways. Third, we present an Optimized Spatial Attention (OSA) module that replaces standard $7\times7$ convolutions with more efficient operations while maintaining spatial dependency modeling. The proposed framework uses EfficientNetB7 as the backbone for robust spatial feature extraction, followed by multi-scale dilated blocks and dual attention mechanisms for detailed feature refinement. Extensive experiments on the TVSum and SumMe benchmark datasets demonstrate the superiority of our method, achieving F1 scores of 63.5% and 53.3%, respectively.
SN  - 3068-9287
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  - 
BibTeX Format
Compatible with LaTeX, BibTeX, and other reference managers
@article{Khan2025DualPathway,
  author = {Taimur Ali Khan and Danish Ali and Zainab Ghazanfar and Bilal Ahmad},
  title = {Dual-Pathway Sensing with Optimized Attention Network for Video Summarization in Surveillance Systems},
  journal = {ICCK Transactions on Sensing, Communication, and Control},
  year = {2025},
  volume = {2},
  number = {4},
  pages = {276-289},
  doi = {10.62762/TSCC.2025.308540},
  url = {https://www.icck.org/article/abs/TSCC.2025.308540},
  abstract = {Video summarization (VS) aims to generate concise representations of long videos by extracting the most informative frames while maintaining essential content. Existing methods struggle to capture multi-scale dependencies and often rely on suboptimal feature representations, limiting their ability to model complex inter-frame relationships. To address these issues, we propose a multi-scale sensing network that incorporates three key innovations to improve VS. First, we introduce multi-scale dilated convolution blocks with progressively increasing dilation rates to capture temporal context at multiple levels, enabling the network to understand both local transitions and long-range dependencies. Second, we develop a Dual-Pathway Efficient Channel Attention (DECA) module that leverages statistics from Global Average Pooling and Global Max Pooling pathways. Third, we present an Optimized Spatial Attention (OSA) module that replaces standard $7\times7$ convolutions with more efficient operations while maintaining spatial dependency modeling. The proposed framework uses EfficientNetB7 as the backbone for robust spatial feature extraction, followed by multi-scale dilated blocks and dual attention mechanisms for detailed feature refinement. Extensive experiments on the TVSum and SumMe benchmark datasets demonstrate the superiority of our method, achieving F1 scores of 63.5\% and 53.3\%, respectively.},
  keywords = {video summarization, visual intelligence, surveillance systems, dual-pathway, attention network},
  issn = {3068-9287},
  publisher = {Institute of Central Computation and Knowledge}
}

Article Metrics
Citations:
Crossref: 0
Scopus: 0
Web of Science: 0
Article Access Statistics:
Views: 189
PDF Downloads: 30

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Sensing, Communication, and Control

ISSN: 3068-9287 (Online) | ISSN: 3068-9279 (Print)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/