ICCK Transactions on Intelligent Systematics, Volume 2, Issue 3, 2025
Academic Editor
Seifedine Kadry, Noroff University College, Norway
ICCK Transactions on Intelligent Systematics, Volume 2, Issue 3, 2025: 190-202

Free to Read | Research Article | 25 August 2025
DT-NeRF: A Diffusion and Transformer-Based Optimization Approach for Neural Radiance Fields in 3D Reconstruction
1 College of Computer Sciences, Northeastern University, Cupertino 95014, CA, United States
2 Department of Electrical Engineering and Computer Science, University of California, Irvine, Moreno Valley 92555, CA, United States
3 Desautels Faculty of Management, McGill University, Montréal 27708, Canada
4 Department of Mathematics, Northeastern University, San Jose 95131, CA, United States
* Corresponding Author: Runlong Li, [email protected]
Received: 06 June 2025, Accepted: 05 July 2025, Published: 25 August 2025  
Abstract
This paper proposes DT-NeRF, a diffusion- and Transformer-optimized Neural Radiance Field method aimed at enhancing detail recovery and multi-view consistency in 3D scene reconstruction. By combining diffusion models with Transformers, DT-NeRF effectively restores fine detail under sparse viewpoints and maintains high accuracy in geometrically complex scenes. Experimental results demonstrate that DT-NeRF significantly outperforms traditional NeRF and other state-of-the-art methods on the Matterport3D and ShapeNet datasets, particularly on metrics such as PSNR, SSIM, Chamfer Distance, and Fidelity. Ablation experiments further confirm the critical role of the diffusion and Transformer modules: removing either leads to a measurable drop in performance. The design of DT-NeRF demonstrates the synergy between the two modules and provides an efficient, accurate solution for 3D scene reconstruction. Future research may focus on further optimizing the model and exploring more advanced generative models and network architectures to improve performance in large-scale dynamic scenes.
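The evaluation metrics named in the abstract are standard and easy to reproduce; the paper's own evaluation code is not shown here, but a minimal NumPy sketch of two of them, PSNR for rendered images and a symmetric squared-distance Chamfer Distance for reconstructed point sets, could look as follows (function names and the specific Chamfer variant are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def psnr(img_a: np.ndarray, img_b: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3)."""
    # Pairwise squared distances via broadcasting, shape (N, M).
    d2 = np.sum((pts_a[:, None, :] - pts_b[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbor distance in both directions.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Toy sanity check: identical inputs give infinite PSNR and zero Chamfer distance.
img = np.random.rand(16, 16, 3)
pts = np.random.rand(100, 3)
print(psnr(img, img))              # inf
print(chamfer_distance(pts, pts))  # 0.0
```

The brute-force pairwise distance matrix is fine for small point sets; evaluations on large reconstructions typically use a k-d tree for the nearest-neighbor queries instead.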

Graphical Abstract
DT-NeRF: A Diffusion and Transformer-Based Optimization Approach for Neural Radiance Fields in 3D Reconstruction

Keywords
diffusion model
NeRF
3D reconstruction
detail recovery
transformer network

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.


Cite This Article
APA Style
Liu, B., Li, R., Zhou, L., & Zhou, Y. (2025). DT-NeRF: A Diffusion and Transformer-Based Optimization Approach for Neural Radiance Fields in 3D Reconstruction. ICCK Transactions on Intelligent Systematics, 2(3), 190–202. https://doi.org/10.62762/TIS.2025.874668

Article Metrics
Citations: Crossref: 0 | Scopus: 0 | Web of Science: 0
Article Access Statistics: Views: 66 | PDF Downloads: 23

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Intelligent Systematics

ISSN: 3068-5079 (Online) | ISSN: 3069-003X (Print)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/