This work presents a novel approach to large-scale 3D environment reconstruction using generative AI, building on previous research on Spatio-Temporal Diffusion architectures and SDF_MIP representations. We introduce the Neural-Clipmap algorithm, which is designed to improve scalability and efficiency by dynamically refining scene detail through a hierarchical structure. We also explore the use of Score Distillation Sampling (SDS), an inpainting diffusion model, and Gaussian Splatting for image-driven 3D generation, addressing challenges such as multi-view consistency and color correction. These advances contribute to more accessible, higher-quality 3D content creation and open new possibilities for XR applications.