Optimized CNNs for Rapid 3D Point Cloud Object Recognition
Abstract
This study presents a method for efficiently detecting objects in 3D point clouds with convolutional neural networks (CNNs). The approach uses a feature-centric voting mechanism to build convolutional layers that exploit the sparsity typical of the input data. We examine the trade-off between accuracy and speed across several network architectures and propose an L1 penalty on filter activations to increase sparsity in intermediate layers. To our knowledge, this is the first work to combine sparse convolutional layers with L1 regularization for large-scale 3D data processing. We demonstrate the method on the MVTec 3D-AD object detection benchmark. The Vote3Deep models, with only three layers, outperform the previous state of the art among both laser-only and combined laser-vision methods while maintaining competitive processing speeds. These results indicate that the approach can substantially improve detection performance while remaining computationally efficient enough for real-time applications.
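The two ideas named in the abstract, convolution by feature-centric voting and an L1 penalty on activations, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The dictionary-based grid, the single-channel features, the non-positive shared bias, the unbounded grid extent, and the names `voting_conv3d` and `l1_penalty` are all assumptions made for illustration.

```python
from itertools import product


def voting_conv3d(cells, weights, bias=0.0):
    """Sparse 3D convolution by feature-centric voting (single channel).

    cells:   dict mapping (x, y, z) -> float, occupied cells only
    weights: dict mapping offset (dx, dy, dz) -> float voting weight
    bias:    shared bias, assumed non-positive so that empty cells stay
             at zero after the ReLU and never need to be stored
    Returns a dict holding only the cells with a positive activation.
    """
    if bias > 0:
        raise ValueError("bias must be non-positive to preserve sparsity")
    votes = {}
    # Work is proportional to (occupied cells) x (filter size),
    # not to the full volume of the grid.
    for (x, y, z), value in cells.items():
        for (dx, dy, dz), w in weights.items():
            target = (x + dx, y + dy, z + dz)
            votes[target] = votes.get(target, 0.0) + w * value
    # ReLU: keep only cells whose summed votes plus bias are positive.
    return {c: v + bias for c, v in votes.items() if v + bias > 0.0}


def l1_penalty(layer_outputs, strength):
    """L1 penalty on activations, summed over a list of sparse layer outputs."""
    return strength * sum(abs(v) for out in layer_outputs for v in out.values())


# Hypothetical input: two occupied cells in an otherwise empty grid.
cells = {(0, 0, 0): 2.0, (5, 5, 5): 1.0}
# Hypothetical 3x3x3 filter with a uniform voting weight of 0.5.
weights = {offset: 0.5 for offset in product((-1, 0, 1), repeat=3)}

sparse_out = voting_conv3d(cells, weights)
print(len(sparse_out))  # 54: each occupied cell votes into 27 neighbours
print(l1_penalty([sparse_out], strength=0.01))
```

In this sketch the penalty would be added to the training loss, so that minimizing it pushes intermediate activations toward zero and leaves fewer cells to cast votes in the next layer.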
Keywords
object detection, L1 penalty, point cloud, MVTec 3D-AD
Cite This Article
TY  - JOUR
AU  - Lyu, Tianyi
AU  - Gu, Dian
AU  - Chen, Peiyuan
AU  - Jiang, Yaoting
AU  - Zhang, Zhenhong
AU  - Pang, Huadong
AU  - Zhou, Li
AU  - Dong, Yiping
PY  - 2024
DA  - 2024/12/08
TI  - Optimized CNNs for Rapid 3D Point Cloud Object Recognition
JO  - ICCK Transactions on Internet of Things
T2  - ICCK Transactions on Internet of Things
JF  - ICCK Transactions on Internet of Things
VL  - 2
IS  - 4
SP  - 83
EP  - 94
DO  - 10.62762/TIOT.2024.758153
UR  - https://www.icck.org/article/abs/TIOT.2024.758153
KW  - object detection
KW  - L1 penalty
KW  - point cloud
KW  - MVTec 3D-AD
AB  - This study introduces a method for efficiently detecting objects within 3D point clouds using convolutional neural networks (CNNs). Our approach adopts a unique feature-centric voting mechanism to construct convolutional layers that capitalize on the typical sparsity observed in input data. We explore the trade-off between accuracy and speed across diverse network architectures and advocate for integrating an L1 penalty on filter activations to augment sparsity within intermediate layers. This research pioneers the proposal of sparse convolutional layers combined with L1 regularization to effectively handle large-scale 3D data processing. Our method’s efficacy is demonstrated on the MVTec 3D-AD object detection benchmark. The Vote3Deep models, with just three layers, outperform the previous state-of-the-art in both laser-only approaches and combined laser-vision methods. Additionally, they maintain competitive processing speeds. This underscores our approach’s capability to substantially enhance detection performance while ensuring computational efficiency suitable for real-time applications.
SN  - pending
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -
@article{Lyu2024Optimized,
author = {Tianyi Lyu and Dian Gu and Peiyuan Chen and Yaoting Jiang and Zhenhong Zhang and Huadong Pang and Li Zhou and Yiping Dong},
title = {Optimized CNNs for Rapid 3D Point Cloud Object Recognition},
journal = {ICCK Transactions on Internet of Things},
year = {2024},
volume = {2},
number = {4},
pages = {83-94},
doi = {10.62762/TIOT.2024.758153},
url = {https://www.icck.org/article/abs/TIOT.2024.758153},
abstract = {This study introduces a method for efficiently detecting objects within 3D point clouds using convolutional neural networks (CNNs). Our approach adopts a unique feature-centric voting mechanism to construct convolutional layers that capitalize on the typical sparsity observed in input data. We explore the trade-off between accuracy and speed across diverse network architectures and advocate for integrating an L1 penalty on filter activations to augment sparsity within intermediate layers. This research pioneers the proposal of sparse convolutional layers combined with L1 regularization to effectively handle large-scale 3D data processing. Our method’s efficacy is demonstrated on the MVTec 3D-AD object detection benchmark. The Vote3Deep models, with just three layers, outperform the previous state-of-the-art in both laser-only approaches and combined laser-vision methods. Additionally, they maintain competitive processing speeds. This underscores our approach’s capability to substantially enhance detection performance while ensuring computational efficiency suitable for real-time applications.},
keywords = {object detection, L1 penalty, point cloud, MVTec 3D-AD},
issn = {pending},
publisher = {Institute of Central Computation and Knowledge}
}
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.