GeoPython 2023

Introduction to depth-based stable diffusion for Blender using Dream Textures
2023-03-06, 11:45–12:15, Auditorium

Generative models have moved from text to images with Stable Diffusion, and even to 3D with NVIDIA's GET3D. I'll present the newest findings and some possible applications in BIM and geo-visualisation.


Stable Diffusion

I present an existing technique applied to relevant fields.
Stable Diffusion builds on several recently developed techniques such as auto-encoders, generative adversarial networks and natural language processing. All of these fields have made significant advances in recent years, opening up applications in many domains including 3D models and maps.
The pipeline starts with a text encoder that tokenizes the prompt with the CLIP tokenizer and turns it into embeddings that condition the later stages; generation itself starts from a latent consisting mostly of noise.
A U-Net together with a scheduler then iteratively denoises this latent until it encodes a coherent image.
The image decoder finally turns that latent representation into actual pixel values.
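
For illustration, here is a minimal sketch of such a pipeline using the Hugging Face diffusers library; the model id, prompt and parameters are examples, not the exact setup shown in the talk.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a Stable Diffusion checkpoint (illustrative model id).
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",
        torch_dtype=torch.float16,
    ).to("cuda")

    # Internally: pipe.text_encoder (CLIP) embeds the prompt,
    # pipe.unet + pipe.scheduler iteratively denoise a random latent,
    # and pipe.vae decodes the final latent into pixels.
    image = pipe(
        "a seamless cobblestone texture, top-down view",
        num_inference_steps=30,
        guidance_scale=7.5,
    ).images[0]
    image.save("texture.png")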

Dream Textures

Using the stable-diffusion-2-depth model from Hugging Face, Carson Katri implemented a Blender add-on that allows depth-based image generation on any 3D surface from a simple prompt. On top of that, techniques like in- and outpainting are used to stitch surfaces together, change existing textures and extend existing ones.
Compared to stable-diffusion-2, this model has an additional input channel for depth. Dream Textures feeds it the viewport distance of the surfaces in Blender; alternatively, depth can be estimated from an existing image using MiDaS (dpt_hybrid). This is a first step towards photogrammetry, i.e. model generation from images.
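
As a rough sketch of what happens under the hood, the same depth-conditioned model can be driven directly through diffusers; file names and parameters below are placeholders, and Dream Textures wraps a comparable pipeline inside Blender.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionDepth2ImgPipeline

    pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-depth",
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("viewport_render.png")  # e.g. a Blender viewport capture

    # If no depth_map is supplied, the pipeline estimates depth from the
    # image with MiDaS (DPT); Dream Textures supplies Blender's own depth instead.
    result = pipe(
        prompt="mossy medieval stone wall, photorealistic",
        image=init_image,
        strength=0.7,
    ).images[0]
    result.save("depth_textured.png")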

Application

Some examples are shown that display the workflow and the possibilities for rapid prototyping and texturing of 3D (IFC) models and landscapes. The plugin also offers the ability to fine-tune existing or generated textures towards an input phrase.
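
This texture refinement is essentially image-to-image generation; a minimal sketch with diffusers follows, where paths, prompt and the strength value are illustrative.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",
        torch_dtype=torch.float16,
    ).to("cuda")

    base_texture = Image.open("facade_texture.png").convert("RGB")

    # Lower strength keeps more of the input texture; higher strength
    # follows the prompt more closely.
    refined = pipe(
        prompt="aged brick facade with ivy, seamless texture",
        image=base_texture,
        strength=0.4,
        guidance_scale=7.0,
    ).images[0]
    refined.save("facade_texture_refined.png")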
Strengths of this method include deployment speed, minimal effort and adaptability. As is customary with new AI art, the images might not fit the textures, may contain various artifacts, can require multiple generation attempts before a satisfying image is found, and demand quite some hardware resources on top of those already needed by Blender itself.

Disclaimer

All models/weights are for research purposes only.