Next-Generation Computing Systems and Technologies, Volume 1, Issue 2, 2025: 102-112

Open Access | Review Article | 22 December 2025
A Comprehensive Review of Diffusion Models, Gaussian Splatting and Their Integration in Augmented and Virtual Reality
1 Department of Computer Science & Engineering, NIST University, Berhampur 761008, India
* Corresponding Author: Santosh Kumar Kar, [email protected]
ARK: ark:/57805/ngcst.2025.477710
Received: 05 September 2025, Accepted: 15 December 2025, Published: 22 December 2025  
Abstract
Recent progress in text-to-3D generation has substantially advanced artificial intelligence (AI) applications in augmented and virtual reality (AR/VR) environments. Techniques introduced in 2024-2025, including diffusion models, Gaussian splatting, and physics-aware models, have improved text-to-3D generation in visual fidelity, semantic coherence, and efficiency. Models such as Turbo3D, Dive3D, and Instant3D accelerate 3D generation by streamlining the diffusion process, while frameworks such as LayoutDreamer, PhiP-G, and CompGS focus on producing well-organized, structured scenes. Methods such as DreamReward and CoherenDream use human feedback and multimodal signals to align 3D outputs with user expectations. Despite these advances, major challenges remain: current text-to-3D methods demand substantial computing power, which hinders large-scale deployment and real-time AR/VR use; multi-view inconsistencies and the absence of standard benchmarks make fair comparison across methods difficult; and without integrating text, physics, and spatial reasoning, generated scenes look less realistic and natural object interaction is hard to achieve. This review examines the latest advances in text-to-3D generation, analyzing how these methods are designed, optimized, and adapted to different application domains. It also identifies promising research directions, including faster and more compact 3D generation methods, physics-aware rendering, human-in-the-loop guidance during generation, and common benchmarks for fair model evaluation. The study aims to clarify the current progress, key innovations, and open challenges of AI-driven 3D content creation for AR/VR.
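To ground the techniques the abstract surveys, the sketch below illustrates score distillation sampling (SDS), the optimization loop through which methods such as DreamGaussian [21] and GaussianDreamer [23] use a frozen 2D diffusion prior to shape a 3D representation. It is a minimal toy under stated assumptions, not the implementation of any reviewed system: `render` is a hypothetical stand-in for a differentiable Gaussian splatting rasterizer, and `predict_noise` for a pretrained text-conditioned diffusion model.

```python
# Minimal SDS sketch (illustrative only); `render` and `predict_noise`
# are hypothetical stand-ins, not components of any reviewed method.
import torch

torch.manual_seed(0)

N_PARAMS = 256      # flattened toy "3D" parameters (positions, colors, ...)
IMG = 3 * 32 * 32   # flattened rendered image

# Fixed random linear map standing in for a differentiable Gaussian
# splatting rasterizer (real systems rasterize 3D Gaussians to pixels).
W_render = torch.randn(N_PARAMS, IMG) * 0.05

def render(params: torch.Tensor) -> torch.Tensor:
    """Toy differentiable renderer: 3D parameters -> image."""
    return torch.tanh(params @ W_render)

def predict_noise(noisy_img: torch.Tensor, t: int) -> torch.Tensor:
    """Stand-in for a frozen text-conditioned diffusion model's noise
    prediction; a real prior encodes what the prompt should look like."""
    return 0.9 * noisy_img

params = torch.nn.Parameter(0.1 * torch.randn(N_PARAMS))
opt = torch.optim.Adam([params], lr=1e-2)

for step in range(200):
    img = render(params)                      # differentiable rendering
    t = int(torch.randint(1, 1000, (1,)))     # random diffusion timestep
    alpha = 1.0 - t / 1000.0                  # toy noise schedule
    noise = torch.randn_like(img)
    noisy = alpha ** 0.5 * img + (1.0 - alpha) ** 0.5 * noise
    eps_hat = predict_noise(noisy, t)
    # SDS trick: treat (eps_hat - noise) as the gradient of an implicit
    # loss with respect to the image and backpropagate it through the
    # renderer only, skipping the diffusion model's Jacobian.
    grad_img = (eps_hat - noise).detach()
    opt.zero_grad()
    img.backward(gradient=grad_img)
    opt.step()
```

A production pipeline repeats this loop over many sampled camera views; accelerated methods such as Turbo3D [11] aim to avoid this costly per-asset optimization altogether.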

Graphical Abstract
A Comprehensive Review of Diffusion Models, Gaussian Splatting and Their Integration in Augmented and Virtual Reality

Keywords
text-to-3D generation
diffusion models
Gaussian splatting
augmented and virtual reality (AR/VR)
human-in-the-loop optimization

Data Availability Statement
The implementation code for the text-to-3D generation algorithm is available at: https://github.com/ujalesh-1/AR-VR-implemented-Code/blob/main/AR_VR_ReV_1.ipynb (accessed on 21 December 2025).

Funding
This work received no external funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.

References
  1. Do, K., & Hua, B. S. (2025). Text-to-3D generation using Jensen-Shannon score distillation. arXiv preprint arXiv:2503.10660.
  2. Yan, R., Chen, Y., & Wang, X. (2025). Consistent flow distillation for text-to-3D generation. arXiv preprint arXiv:2501.05445.
  3. Ma, Z., Liang, X., Wu, R., Zhu, X., Lei, Z., & Zhang, L. (2025). Progressive rendering distillation: Adapting Stable Diffusion for instant text-to-mesh generation without 3D data. In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 11036-11050).
  4. Behravan, M., & Gračanin, D. (2025, March). From voices to worlds: Developing an AI-powered framework for 3D object generation in augmented reality. In 2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) (pp. 150-155). IEEE.
  5. Li, Q., Wang, C., He, Z., & Peng, Y. (2025). PhiP-G: Physics-guided text-to-3D compositional scene generation. arXiv preprint arXiv:2502.00708.
  6. Laguna, S., Garcia-Garcia, A., Rakotosaona, M. J., Moschoglou, S., Helminger, L., & Orts-Escolano, S. (2025). Text to 3D object generation for scalable room assembly. arXiv preprint arXiv:2504.09328.
  7. Bai, W., Li, Y., Chen, W., Luo, W., & Sun, H. (2025). Dive3D: Diverse distillation-based text-to-3D generation via score implicit matching. arXiv preprint arXiv:2506.13594.
  8. Qin, Y., Xu, Z., & Liu, Y. (2025). Apply hierarchical-chain-of-generation to complex attributes text-to-3D generation. In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 18521-18530).
  9. Zhu, J., Chen, Z., Wang, G., Xie, X., & Zhou, Y. (2025). SegmentDreamer: Towards high-fidelity text-to-3D synthesis with segmented consistency trajectory distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 15864-15874).
  10. Behravan, M. (2025). Generative AI framework for 3D object generation in augmented reality. arXiv preprint arXiv:2502.15869.
  11. Hu, H., Yin, T., Luan, F., Hu, Y., Tan, H., Xu, Z., ... & Zhang, K. (2025). Turbo3D: Ultra-fast text-to-3D generation. In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 23668-23678).
  12. Ge, C., Xu, C., Ji, Y., Peng, C., Tomizuka, M., Luo, P., ... & Zhan, W. (2025). CompGS: Unleashing 2D compositionality for compositional text-to-3D via dynamically optimizing 3D Gaussians. In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 18509-18520).
  13. Zhou, Y., He, Z., Li, Q., & Wang, C. (2025). LayoutDreamer: Physics-guided layout for text-to-3D compositional scene generation. arXiv preprint arXiv:2502.01949.
  14. Jiang, C., Zeng, Y., & Yeung, D. Y. (2025). CoherenDream: Boosting holistic text coherence in 3D generation via multimodal large language models feedback. arXiv preprint arXiv:2504.19860.
  15. Zang, Y., Han, Y., Ding, C., Zhang, J., & Chen, T. (2024). Magic3DSketch: Create colorful 3D models from sketch-based 3D modeling guided by text and language-image pre-training. arXiv preprint arXiv:2407.19225.
  16. Ye, J., Liu, F., Li, Q., Wang, Z., Wang, Y., Wang, X., ... & Zhu, J. (2024, September). DreamReward: Text-to-3D generation with human preference. In European Conference on Computer Vision (pp. 259-276). Cham: Springer Nature Switzerland.
  17. Jiang, C. (2024). A survey on text-to-3D contents generation in the wild. arXiv preprint arXiv:2405.09431.
  18. Li, H., Tian, Y., Wang, Y., Liao, Y., Wang, L., Wang, Y., & Zhou, P. Y. (2024). Text-to-3D generation by 2D editing. arXiv preprint arXiv:2412.05929.
  19. Yang, Y., Shao, J., Li, X., Shen, Y., Geiger, A., & Liao, Y. (2025). Prometheus: 3D-aware latent diffusion models for feed-forward text-to-3D scene generation. In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 2857-2869).
  20. Kompanowski, H., & Hua, B. S. (2025, March). Dream-in-Style: Text-to-3D generation using stylized score distillation. In 2025 International Conference on 3D Vision (3DV) (pp. 915-925). IEEE.
  21. Tang, J., Ren, J., Zhou, H., Liu, Z., & Zeng, G. (2023). DreamGaussian: Generative Gaussian splatting for efficient 3D content creation. arXiv preprint arXiv:2309.16653.
  22. Chen, D. Z., Siddiqui, Y., Lee, H. Y., Tulyakov, S., & Nießner, M. (2023, October). Text2Tex: Text-driven texture synthesis via diffusion models. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (pp. 18512-18522). IEEE.
  23. Yi, T., Fang, J., Wang, J., Wu, G., Xie, L., Zhang, X., ... & Wang, X. (2024, June). GaussianDreamer: Fast generation from text to 3D Gaussians by bridging 2D and 3D diffusion models. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 6796-6807). IEEE.
  24. Wang, C., Peng, H. Y., Liu, Y. T., Gu, J., & Hu, S. M. (2025). Diffusion models for 3D generation: A survey. Computational Visual Media, 11(1), 1-28.
  25. Yang, H., Chen, Y., Pan, Y., Yao, T., Chen, Z., Wu, Z., ... & Mei, T. (2024, September). DreamMesh: Jointly manipulating and texturing triangle meshes for text-to-3D generation. In European Conference on Computer Vision (pp. 162-178). Cham: Springer Nature Switzerland.
  26. Chen, Y., Pan, Y., Yang, H., Yao, T., & Mei, T. (2024, June). VP3D: Unleashing 2D visual prompt for text-to-3D generation. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 4896-4905). IEEE.
  27. Li, Z., Chen, Y., Zhao, L., & Liu, P. (2025, March). Controllable text-to-3D generation via surface-aligned Gaussian splatting. In 2025 International Conference on 3D Vision (3DV) (pp. 1113-1123). IEEE.
  28. Chen, C., Yang, X., Yang, F., Feng, C., Fu, Z., Foo, C. S., ... & Liu, F. (2024, June). Sculpt3D: Multi-view consistent text-to-3D generation with sparse 3D prior. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 10228-10237). IEEE.
  29. Zhang, Y., Zhang, M., Wu, T., Wang, T., Wetzstein, G., Lin, D., & Liu, Z. (2025). 3DGen-Bench: Comprehensive benchmark suite for 3D generative models. arXiv preprint arXiv:2503.21745.
  30. Bono, G. (2024). Text-to-building: Experiments with AI-generated 3D geometry for building design and structure generation. Architectural Intelligence, 3(1), 24.

Cite This Article
APA Style
Kar, S. K., Subudhi, B. U., Mishra, B. K., & Panda, B. (2025). A Comprehensive Review of Diffusion Models, Gaussian Splatting and Their Integration in Augmented and Virtual Reality. Next-Generation Computing Systems and Technologies, 1(2), 102–112. https://doi.org/10.62762/NGCST.2025.477710
Export Citation
RIS Format
Compatible with EndNote, Zotero, Mendeley, and other reference managers
TY  - JOUR
AU  - Kar, Santosh Kumar
AU  - Subudhi, B. Ujalesh
AU  - Mishra, Brojo Kishore
AU  - Panda, Bandhan
PY  - 2025
DA  - 2025/12/22
TI  - A Comprehensive Review of Diffusion Models, Gaussian Splatting and Their Integration in Augmented and Virtual Reality
JO  - Next-Generation Computing Systems and Technologies
T2  - Next-Generation Computing Systems and Technologies
JF  - Next-Generation Computing Systems and Technologies
VL  - 1
IS  - 2
SP  - 102
EP  - 112
DO  - 10.62762/NGCST.2025.477710
UR  - https://www.icck.org/article/abs/NGCST.2025.477710
KW  - text-to-3D generation
KW  - diffusion models
KW  - Gaussian splatting
KW  - augmented and virtual reality (AR/VR)
KW  - human-in-the-loop optimization
AB  - Recent progress in text-to-3D generation has substantially advanced artificial intelligence (AI) applications in augmented and virtual reality (AR/VR) environments. Techniques introduced in 2024-2025, including diffusion models, Gaussian splatting, and physics-aware models, have improved text-to-3D generation in visual fidelity, semantic coherence, and efficiency. Models such as Turbo3D, Dive3D, and Instant3D accelerate 3D generation by streamlining the diffusion process, while frameworks such as LayoutDreamer, PhiP-G, and CompGS focus on producing well-organized, structured scenes. Methods such as DreamReward and CoherenDream use human feedback and multimodal signals to align 3D outputs with user expectations. Despite these advances, major challenges remain: current text-to-3D methods demand substantial computing power, which hinders large-scale deployment and real-time AR/VR use; multi-view inconsistencies and the absence of standard benchmarks make fair comparison across methods difficult; and without integrating text, physics, and spatial reasoning, generated scenes look less realistic and natural object interaction is hard to achieve. This review examines the latest advances in text-to-3D generation, analyzing how these methods are designed, optimized, and adapted to different application domains. It also identifies promising research directions, including faster and more compact 3D generation methods, physics-aware rendering, human-in-the-loop guidance during generation, and common benchmarks for fair model evaluation. The study aims to clarify the current progress, key innovations, and open challenges of AI-driven 3D content creation for AR/VR.
SN  - 3070-3328
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  - 
BibTeX Format
Compatible with LaTeX, BibTeX, and other reference managers
@article{Kar2025A,
  author = {Santosh Kumar Kar and B. Ujalesh Subudhi and Brojo Kishore Mishra and Bandhan Panda},
  title = {A Comprehensive Review of Diffusion Models, Gaussian Splatting and Their Integration in Augmented and Virtual Reality},
  journal = {Next-Generation Computing Systems and Technologies},
  year = {2025},
  volume = {1},
  number = {2},
  pages = {102-112},
  doi = {10.62762/NGCST.2025.477710},
  url = {https://www.icck.org/article/abs/NGCST.2025.477710},
  abstract = {Recent progress in text-to-3D generation has substantially advanced artificial intelligence (AI) applications in augmented and virtual reality (AR/VR) environments. Techniques introduced in 2024-2025, including diffusion models, Gaussian splatting, and physics-aware models, have improved text-to-3D generation in visual fidelity, semantic coherence, and efficiency. Models such as Turbo3D, Dive3D, and Instant3D accelerate 3D generation by streamlining the diffusion process, while frameworks such as LayoutDreamer, PhiP-G, and CompGS focus on producing well-organized, structured scenes. Methods such as DreamReward and CoherenDream use human feedback and multimodal signals to align 3D outputs with user expectations. Despite these advances, major challenges remain: current text-to-3D methods demand substantial computing power, which hinders large-scale deployment and real-time AR/VR use; multi-view inconsistencies and the absence of standard benchmarks make fair comparison across methods difficult; and without integrating text, physics, and spatial reasoning, generated scenes look less realistic and natural object interaction is hard to achieve. This review examines the latest advances in text-to-3D generation, analyzing how these methods are designed, optimized, and adapted to different application domains. It also identifies promising research directions, including faster and more compact 3D generation methods, physics-aware rendering, human-in-the-loop guidance during generation, and common benchmarks for fair model evaluation. The study aims to clarify the current progress, key innovations, and open challenges of AI-driven 3D content creation for AR/VR.},
  keywords = {text-to-3D generation, diffusion models, Gaussian splatting, augmented and virtual reality (AR/VR), human-in-the-loop optimization},
  issn = {3070-3328},
  publisher = {Institute of Central Computation and Knowledge}
}

Article Metrics
Citations: Crossref: 0 | Scopus: 0 | Web of Science: 0
Article Access Statistics: Views: 1117 | PDF Downloads: 166

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
CC BY Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
Next-Generation Computing Systems and Technologies

ISSN: 3070-3328 (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/