Improved Object Detection Algorithm Based on Multi-scale and Variability Convolutional Neural Networks

Jiaxun Yang; Yilihamujiang Gapar

doi:10.62762/TETAI.2024.115892

Article Information

Published in: ICCK Transactions on Emerging Topics in Artificial Intelligence

Volume/Issue: Volume 1, Issue 1, 2024

Pages: 31-43

Cited by: 6 (Crossref), 12 (Scopus)

Abstract

This paper proposes an improved object detection algorithm based on a dynamically deformable convolutional network (D-DCN) to address the multi-scale and variability challenges in object detection. First, we review traditional object detection methods and the current state of research on improved methods based on multi-scale and variability convolutional neural networks. We then describe the proposed algorithm in detail; it comprises an improved feature pyramid network and a dynamically deformable network. In the improved feature pyramid network, a multi-scale feature fusion mechanism captures target information at different scales. In the dynamically deformable network, dynamic offset calculation and dynamic convolution operations let the model adapt to the shape and pose of the target. To guide model training, we design a combined loss function that accounts for both localization error and classification error. We validate the method with experiments on the KITTI, NEXET, and Caltech datasets. The experimental results show that the improved algorithm achieves higher accuracy and robustness than traditional methods in object detection tasks. Our work provides an effective way to handle the multi-scale and variability challenges in object detection and has practical value and prospects for general application.
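
The abstract does not give the form of the combined loss, so the following is only a minimal sketch of one common way to combine a localization term with a classification term, not the authors' formulation. It assumes a smooth L1 loss on box coordinates and a softmax cross-entropy on class scores, as used in many detectors; the function names, the `beta` threshold, and the `loc_weight` balancing factor are all hypothetical.

```python
import math


def smooth_l1(pred_box, target_box, beta=1.0):
    """Smooth L1 loss summed over box coordinates (assumed localization term)."""
    total = 0.0
    for p, t in zip(pred_box, target_box):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta  # quadratic near zero
        else:
            total += diff - 0.5 * beta         # linear for large errors
    return total


def cross_entropy(logits, target_class):
    """Softmax cross-entropy for one detection (assumed classification term)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_class]


def detection_loss(pred_box, target_box, logits, target_class, loc_weight=1.0):
    """Classification loss plus weighted localization loss (hypothetical weighting)."""
    cls_term = cross_entropy(logits, target_class)
    loc_term = smooth_l1(pred_box, target_box)
    return cls_term + loc_weight * loc_term


if __name__ == "__main__":
    loss = detection_loss(
        pred_box=[0.1, 0.2, 0.9, 0.8],
        target_box=[0.0, 0.2, 1.0, 0.8],
        logits=[2.0, 0.5, -1.0],
        target_class=0,
    )
    print(f"combined loss: {loss:.4f}")
```

A single scalar weight keeps the two terms on a comparable scale; in practice its value would be tuned per dataset.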

Keywords

object detection, feature pyramid network, multi-scale fusion, dynamic convolution, KITTI, Caltech

Data Availability Statement

Data will be made available on request.

Funding

This work received no funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cited By (6)

Xinghan Zhao, Qingpeng Song, Xuejun Wu. Hemo-MPO: A mesh-based physics-informed operator network for hemodynamics prediction. Alexandria Engineering Journal, 2026, 148.
Wangzhong Li, Biao Li, Xuefeng Rao, Xiangmeng Pan, Yujie Zhao. Dual-Scale Parallel Training: Boosting Scale Generalization for Object Detectors Without Inference Overhead. IEEE Access, 2026, 14.
Taoran Gan, Mingze Li, Xueling Wang, Wenjun Zhang, Yihan Ma. Landscape environment space extraction based on YOLOv5 and distance information metrics. International Journal of Architectural Computing, 2025.
Marjan Kia, Soroush Sadeghi, Homayoun Safarpour, Mohammadreza Kamsari, Saeid Jafarzadeh Ghoushchi, Ramin Ranjbarzadeh. Innovative fusion of VGG16, MobileNet, EfficientNet, AlexNet, and ResNet50 for MRI-based brain tumor identification. Iran Journal of Computer Science, 2025, 8(1).
Yanfang Liu, Rui Zhou, Zheng Yao, Jiayu She, Naiming Qi. STDSD: A Spacecraft Target Detection Framework Considering Similarity and Diversity. IEEE Transactions on Automation Science and Engineering, 2025, 22.
Wei Li, Xu Xu, Wei Wang, Junxin Chen. Lightweight Plant Disease Detection With Adaptive Multi‐Scale Model and Relationship‐Based Knowledge Distillation. Expert Systems, 2025, 42(6).

* Citation data provided by Crossref Cited-by.

Cite This Article

APA Style

Yang, J., & Gapar, Y. (2024). Improved Object Detection Algorithm Based on Multi-scale and Variability Convolutional Neural Networks. ICCK Transactions on Emerging Topics in Artificial Intelligence, 1(1), 31-43. https://doi.org/10.62762/TETAI.2024.115892

Export Citation

RIS Format

Compatible with EndNote, Zotero, Mendeley, and other reference managers

TY  - JOUR
AU  - Yang, Jiaxun
AU  - Gapar, Yilihamujiang
PY  - 2024
DA  - 2024/05/21
TI  - Improved Object Detection Algorithm Based on Multi-scale and Variability Convolutional Neural Networks
JO  - ICCK Transactions on Emerging Topics in Artificial Intelligence
T2  - ICCK Transactions on Emerging Topics in Artificial Intelligence
JF  - ICCK Transactions on Emerging Topics in Artificial Intelligence
VL  - 1
IS  - 1
SP  - 31
EP  - 43
DO  - 10.62762/TETAI.2024.115892
UR  - https://www.icck.org/article/abs/TETAI.2024.115892
KW  - object detection
KW  - feature pyramid network
KW  - multi-scale fusion
KW  - dynamic convolution
KW  - KITTI
KW  - Caltech
SN  - 3068-6652
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -

BibTeX Format

Compatible with LaTeX, BibTeX, and other reference managers

@article{Yang2024Improved,
  author = {Jiaxun Yang and Yilihamujiang Gapar},
  title = {Improved Object Detection Algorithm Based on Multi-scale and Variability Convolutional Neural Networks},
  journal = {ICCK Transactions on Emerging Topics in Artificial Intelligence},
  year = {2024},
  volume = {1},
  number = {1},
  pages = {31-43},
  doi = {10.62762/TETAI.2024.115892},
  url = {https://www.icck.org/article/abs/TETAI.2024.115892},
  keywords = {object detection, feature pyramid network, multi-scale fusion, dynamic convolution, KITTI, Caltech},
  issn = {3068-6652},
  publisher = {Institute of Central Computation and Knowledge}
}

Article Metrics

Citations: 6 (Crossref), 12 (Scopus)

Views: 5310

PDF Downloads: 775

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Copyright © 2024 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

ICCK Transactions on Emerging Topics in Artificial Intelligence

ISSN: 3068-6652 (Online)
