Real Time Diffusion in TouchDesigner
With the intention of creating a unique and memorable experience for someone in an exhibit, I began prototyping a system using real-time AI animation. The pipeline uses an accelerated Stable Diffusion model called Stream Diffusion, which runs inside TouchDesigner. A frame rate of around 15 fps can be achieved because the model batches its sequential denoising steps into single operations. A smoothing interpolation is also applied to reduce choppiness. In essence, a webcam video input and a text prompt are combined to create the following real-time animations.
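The smoothing step can be illustrated with a short sketch (hypothetical, not the actual TouchDesigner network): each new diffusion output is linearly blended with the previously displayed frame, which damps the abrupt frame-to-frame changes that cause visible choppiness.

```python
import numpy as np

def smooth_frame(prev_frame: np.ndarray, new_frame: np.ndarray,
                 alpha: float = 0.5) -> np.ndarray:
    """Linearly interpolate between the previously displayed frame and the
    latest diffusion output. Lower alpha = smoother but laggier motion."""
    return (1.0 - alpha) * prev_frame + alpha * new_frame

# Example: blend a dark previous frame toward a bright new one
prev = np.zeros((2, 2, 3))         # previous displayed frame (black)
new = np.ones((2, 2, 3)) * 255.0   # latest diffusion output (white)
blended = smooth_frame(prev, new, alpha=0.5)
print(blended[0, 0, 0])  # 127.5
```

In practice each blended result would feed back in as `prev_frame` for the next frame, giving an exponential moving average over the animation.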
ControlNet
To introduce more control, extra conditions were set by passing the threshold and edge nodes into the ControlNet model. The goal is to expose the most prominent variables (the number of denoise steps, the ControlNet weight and the noise seed) and assign them to a MIDI controller. In an exhibit, you could therefore have a keyboard, a simple MIDI controller and a camera. The user could then type their own text prompt, strike any pose they wish and control the video output as they desire, all in real time. I feel this would have a long-lasting, memorable and deeply personal impact on viewers, as they can create their own experience.
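The parameter mapping can be sketched as follows (the CC assignments and value ranges here are illustrative assumptions, not the actual network): a controller's 7-bit CC values (0 to 127) are rescaled to each exposed variable's useful range.

```python
def cc_to_range(cc_value: int, lo: float, hi: float) -> float:
    """Rescale a 7-bit MIDI CC value (0-127) to a parameter range."""
    return lo + (cc_value / 127.0) * (hi - lo)

# Hypothetical knob positions mapped to the three exposed variables
knobs = {
    "denoise_steps": int(round(cc_to_range(64, 1, 4))),         # few steps = faster
    "controlnet_weight": round(cc_to_range(96, 0.0, 1.0), 2),   # edge-map influence
    "noise_seed": int(cc_to_range(32, 0, 2**31 - 1)),           # reseed the noise
}
print(knobs["denoise_steps"])  # 3
```

In TouchDesigner the same rescaling is typically done with a MIDI In CHOP feeding a Math CHOP, but the arithmetic is identical.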
Architectural Prototyping Tool
By arranging a set of small 3D-printed building blocks into a basic structure, the pipeline enables the creation of architectural designs. The intention was to empower clients, or people in an exhibition, to create their own miniature buildings with their own hands in physical space. They could then use text prompts and the previously mentioned control logic to craft their own architectural visions. This approach bridges the physical and digital realms, fostering a deeper connection for users by freeing them from the confines of a digital screen. It also serves as an endless source of inspiration at the earlier stages of design. As the building blocks are similar to Lego bricks, I feel the exhibit would be highly engaging for children and could spark a keen interest in architecture. At the same time, as the blocks have a sleek and contemporary style, adults will also find them intriguing to play with, making the exhibit appealing to a wide spectrum of people.
The clips above and below specify a Saudi Arabian aesthetic and show how effective it is to change the number of denoise steps and the noise seed. At the time of recording, an appropriate tripod wasn't available, which is why the footage is quite unstable. In a final exhibition setup, the output would be far more stable, as the camera would be securely mounted in a fixed position.
The final clip showcases a more abstract text prompt, specifying an M.C. Escher-inspired output. As Escher was a graphic artist rather than an architect, most of his architectural prints depict interiors, so considerably more prompt engineering was required for Stream Diffusion to generate the intended imagery. Adjectives such as 'mind-bending' and 'distorted' were added to the text prompt to yield more appropriate results. This point needs careful consideration, as some text prompts are far more successful than others. In an exhibition setting, adequate instructions would be needed to guide people towards their desired output.
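The prompt adjustment described above can be sketched as a small helper (the function name and adjective list are illustrative, not part of the actual pipeline): style adjectives are prepended to the visitor's base prompt before it is sent to Stream Diffusion.

```python
def build_prompt(base: str, adjectives: list[str]) -> str:
    """Prepend style adjectives to a base text prompt."""
    return ", ".join(adjectives + [base])

# The Escher example: steer the model away from plain interiors
prompt = build_prompt(
    "M.C. Escher style architecture, impossible staircases",
    ["mind-bending", "distorted"],
)
print(prompt)
# mind-bending, distorted, M.C. Escher style architecture, impossible staircases
```

In an exhibit, a curated adjective list like this could be applied automatically, so visitors only need to type the subject of their prompt.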