Real-Time Object Detection Using a Lightweight Two-Stage Detection Network with Efficient Data Representation

Shaohuang Wang

doi:10.62762/TETAI.2024.320179

Article Information

Published in: ICCK Transactions on Emerging Topics in Artificial Intelligence

Volume/Issue: Volume 1, Issue 1, 2024

Pages: 17-30

Cited by: 5 (Crossref), 10 (Scopus)

Abstract

This paper introduces a novel, fast object detection framework designed for real-time applications such as autonomous driving and robot navigation. Existing methods typically trade accuracy against processing speed. To address this, a hybrid data representation is proposed that combines the computational efficiency of voxelization with the fine-detail capture of direct data processing. The detection framework comprises two main components: a Rapid Region Proposal Network (RPN), which generates high-quality candidate regions, and a Refinement Detection Network (RefinerNet), which analyzes these regions in detail to improve detection accuracy. Several network optimization strategies, including lightweight depthwise separable convolutions and GPU-accelerated parallel inference, are also incorporated to increase processing speed and reduce computational cost. Extensive experiments on the KITTI and NEXET datasets show that the method improves both detection accuracy and real-time processing speed. Compared with existing approaches, it performs strongly across multiple evaluation metrics and, in particular, meets the stringent processing-speed requirements of real-time applications.
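The abstract names two efficiency ideas without giving voxel sizes, point encoders, or layer widths, so the following is only a minimal Python sketch under stated assumptions, not the author's implementation. It assumes the input is a 3-D point cloud of (x, y, z) tuples, as in KITTI's LiDAR scans, and illustrates a "hybrid" representation as voxel grouping (coarse and cheap) that still keeps each point's offset from its voxel centroid (fine detail). The per-voxel point cap mirrors the sampling limit used by voxel-based detectors such as VoxelNet. It also counts the weights of a standard versus a depthwise separable convolution. All function names (`voxelize`, `hybrid_features`, `conv_params`, `separable_conv_params`) and parameter values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxelize(points: List[Point], voxel_size: Point,
             max_points_per_voxel: int = 32) -> Dict[VoxelKey, List[Point]]:
    """Group raw points by integer voxel index, keeping at most
    max_points_per_voxel raw points per voxel (hypothetical cap)."""
    voxels: Dict[VoxelKey, List[Point]] = defaultdict(list)
    for p in points:
        # math.floor (not int) so negative coordinates land in the correct cell
        key = tuple(math.floor(c / s) for c, s in zip(p, voxel_size))
        if len(voxels[key]) < max_points_per_voxel:
            voxels[key].append(p)
    return dict(voxels)


def hybrid_features(voxels: Dict[VoxelKey, List[Point]]):
    """Per voxel: the centroid (coarse voxel-level summary) plus each point's
    offset from it (point-level detail that plain voxel pooling would discard)."""
    features = {}
    for key, pts in voxels.items():
        n = len(pts)
        centroid = tuple(sum(p[i] for p in pts) / n for i in range(3))
        offsets = [tuple(p[i] - centroid[i] for i in range(3)) for p in pts]
        features[key] = (centroid, offsets)
    return features


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k (one filter per input channel) plus pointwise 1 x 1."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    cloud = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.0), (1.2, 0.4, 0.1), (-0.5, 0.2, 0.0)]
    grid = voxelize(cloud, voxel_size=(1.0, 1.0, 1.0))
    for key, (centroid, offsets) in sorted(hybrid_features(grid).items()):
        print(key, centroid, len(offsets))
    print(conv_params(3, 64, 128), separable_conv_params(3, 64, 128))
```

With unit voxels, the sample cloud falls into three cells: (0, 0, 0) holds two points, while (1, 0, 0) and (-1, 0, 0) hold one each. For a 3 × 3 kernel with 64 input and 128 output channels, the separable version needs 8,768 weights instead of 73,728. That is a ratio of 1/C_out + 1/k², roughly 0.12, with biases ignored.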

Keywords

object detection; real-time; refinement; network optimization

Data Availability Statement

Data will be made available on request.

Funding

This work received no funding.

Conflicts of Interest

The author declares no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.


Cite This Article

APA Style

Wang, S. (2024). Real-Time Object Detection Using a Lightweight Two-Stage Detection Network with Efficient Data Representation. ICCK Transactions on Emerging Topics in Artificial Intelligence, 1(1), 17-30. https://doi.org/10.62762/TETAI.2024.320179

RIS Format

TY  - JOUR
AU  - Wang, Shaohuang
PY  - 2024
DA  - 2024/04/20
TI  - Real-Time Object Detection Using a Lightweight Two-Stage Detection Network with Efficient Data Representation
JO  - ICCK Transactions on Emerging Topics in Artificial Intelligence
T2  - ICCK Transactions on Emerging Topics in Artificial Intelligence
JF  - ICCK Transactions on Emerging Topics in Artificial Intelligence
VL  - 1
IS  - 1
SP  - 17
EP  - 30
DO  - 10.62762/TETAI.2024.320179
UR  - https://www.icck.org/article/abs/TETAI.2024.320179
KW  - object detection
KW  - real-time
KW  - refinement
KW  - network optimization
SN  - 3068-6652
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -

BibTeX Format

@article{Wang2024RealTime,
  author = {Shaohuang Wang},
  title = {Real-Time Object Detection Using a Lightweight Two-Stage Detection Network with Efficient Data Representation},
  journal = {ICCK Transactions on Emerging Topics in Artificial Intelligence},
  year = {2024},
  volume = {1},
  number = {1},
  pages = {17--30},
  doi = {10.62762/TETAI.2024.320179},
  url = {https://www.icck.org/article/abs/TETAI.2024.320179},
  keywords = {object detection, real-time, refinement, network optimization},
  issn = {3068-6652},
  publisher = {Institute of Central Computation and Knowledge}
}

Article Metrics

Views: 5777

PDF Downloads: 792

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Copyright © 2024 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

ICCK Transactions on Emerging Topics in Artificial Intelligence

ISSN: 3068-6652 (Online)

Preserved at Portico
