Optimized CNNs for Rapid 3D Point Cloud Object Recognition

Tianyi Lyu; Dian Gu; Peiyuan Chen; Yaoting Jiang; Zhenhong Zhang; Huadong Pang; Li Zhou; Yiping Dong

doi:10.62762/TIOT.2024.758153

Article Information

Published in: ICCK Transactions on Internet of Things

Volume/Issue: Volume 2, Issue 4, 2024

Pages: 83-94

Abstract

This study introduces a method for efficiently detecting objects in 3D point clouds with convolutional neural networks (CNNs). The approach uses a feature-centric voting mechanism to build convolutional layers that exploit the sparsity typical of the input data. We examine the trade-off between accuracy and speed across several network architectures and propose an L1 penalty on filter activations to increase sparsity in intermediate layers. This work is the first to propose combining sparse convolutional layers with L1 regularization for processing large-scale 3D data. We demonstrate the method on the MVTec 3D-AD object detection benchmark. The Vote3Deep models, with just three layers, outperform the previous state of the art among both laser-only and combined laser-vision methods while maintaining competitive processing speeds. These results indicate that the approach can substantially improve detection performance while remaining computationally efficient enough for real-time applications.
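The two ideas named in the abstract, feature-centric voting and an L1 penalty on activations, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a sparse grid stored as a Python dict from cell index to feature vector, a single output filter, no bias term, and a ReLU nonlinearity; the function names `voting_conv3d` and `l1_activation_penalty` and the `strength` parameter are hypothetical. Each occupied cell casts weighted votes into its neighbours, so the cost scales with the number of occupied cells, not with the full grid volume.

```python
import numpy as np

def voting_conv3d(cells, weights, grid_shape):
    """Sparse 3D convolution computed by feature-centric voting.

    cells:      dict mapping (x, y, z) -> feature vector of length C
                (only non-empty cells are stored; an assumed layout)
    weights:    array of shape (K, K, K, C) with K odd, one output filter
    grid_shape: (X, Y, Z) extent of the grid; votes outside are dropped
    Returns a dict of ReLU activations holding only strictly positive cells.
    """
    r = weights.shape[0] // 2
    votes = {}
    for (x, y, z), feat in cells.items():
        # An input at p contributes to the output at p - o with weight w[o],
        # which reproduces a dense zero-padded cross-correlation.
        for ox in range(-r, r + 1):
            for oy in range(-r, r + 1):
                for oz in range(-r, r + 1):
                    target = (x - ox, y - oy, z - oz)
                    if not all(0 <= t < n for t, n in zip(target, grid_shape)):
                        continue
                    w = weights[ox + r, oy + r, oz + r]  # shape (C,)
                    votes[target] = votes.get(target, 0.0) + float(w @ feat)
    # ReLU: cells with non-positive totals are discarded, keeping the output sparse.
    return {p: v for p, v in votes.items() if v > 0.0}

def l1_activation_penalty(activations, strength=1e-3):
    """L1 penalty on a sparse activation map (strength is a hypothetical knob)."""
    return strength * sum(abs(v) for v in activations.values())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cells = {(1, 1, 1): rng.normal(size=2), (3, 2, 0): rng.normal(size=2)}
    weights = rng.normal(size=(3, 3, 3, 2))
    out = voting_conv3d(cells, weights, grid_shape=(5, 4, 3))
    print(len(out), "active output cells; penalty =", l1_activation_penalty(out))
```

During training, a penalty of this kind would be added to the task loss so that the optimizer is encouraged to drive intermediate activations to zero, which in turn reduces the number of votes the next layer has to cast.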

Keywords

object detection; L1 penalty; point cloud; MVTec 3D-AD

Funding

This work received no external funding.

Cite This Article

APA Style

Lyu, T., Gu, D., Chen, P., Jiang, Y., Zhang, Z., Pang, H., Zhou, L., & Dong, Y. (2024). Optimized CNNs for Rapid 3D Point Cloud Object Recognition. ICCK Transactions on Internet of Things, 2(4), 83–94. https://doi.org/10.62762/TIOT.2024.758153

Export Citation

RIS Format

Compatible with EndNote, Zotero, Mendeley, and other reference managers

TY  - JOUR
AU  - Lyu, Tianyi
AU  - Gu, Dian
AU  - Chen, Peiyuan
AU  - Jiang, Yaoting
AU  - Zhang, Zhenhong
AU  - Pang, Huadong
AU  - Zhou, Li
AU  - Dong, Yiping
PY  - 2024
DA  - 2024/12/08
TI  - Optimized CNNs for Rapid 3D Point Cloud Object Recognition
JO  - ICCK Transactions on Internet of Things
T2  - ICCK Transactions on Internet of Things
JF  - ICCK Transactions on Internet of Things
VL  - 2
IS  - 4
SP  - 83
EP  - 94
DO  - 10.62762/TIOT.2024.758153
UR  - https://www.icck.org/article/abs/TIOT.2024.758153
KW  - object detection
KW  - L1 penalty
KW  - point cloud
KW  - MVTec 3D-AD
SN  - pending
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -

BibTeX Format

Compatible with LaTeX, BibTeX, and other reference managers

@article{Lyu2024Optimized,
  author = {Tianyi Lyu and Dian Gu and Peiyuan Chen and Yaoting Jiang and Zhenhong Zhang and Huadong Pang and Li Zhou and Yiping Dong},
  title = {Optimized CNNs for Rapid 3D Point Cloud Object Recognition},
  journal = {ICCK Transactions on Internet of Things},
  year = {2024},
  volume = {2},
  number = {4},
  pages = {83--94},
  doi = {10.62762/TIOT.2024.758153},
  url = {https://www.icck.org/article/abs/TIOT.2024.758153},
  keywords = {object detection, L1 penalty, point cloud, MVTec 3D-AD},
  issn = {pending},
  publisher = {Institute of Central Computation and Knowledge}
}

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Institute of Central Computation and Knowledge (ICCK) or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

ICCK Transactions on Internet of Things

ISSN: pending (Online)

Preserved at
Portico
