Probing Latent Space Interactions with Real-time Generative Audio Models Through a Physical Controller

Domenico Stefani; Francesco Dal Rì; Luca Turchet

oral
Presence: in person
Duration: 13
Type: medium
Session: Toolkits and Code

Abstract:

Recent advances have made deep generative models practical for live music performance, and their latent spaces are increasingly exposed as sites of musical interaction. However, the use of latent spaces as control substrates remains under-explored as a first-class design problem. We propose a tangible, reconfigurable controller, conceived as a design probe for investigating the tradeoff between exploration and reproducibility affordances across different mapping and visual feedback strategies. The system is demonstrated through three use cases, one for each of three representative models for real-time synthesis of audio or symbolic music: RAVE, for neural synthesis with high-dimensional opaque latents; MT-GEN_DDSP, which exploits a known latent space with labeled timbre clusters; and GrooveTransformer, for semantically structured rhythm pattern generation. Each use case employs distinct sensor-to-latent mappings and visual feedback approaches tailored to the model’s characteristics, ranging from 3D latent traversal with interaction heatmaps, to semantic dimension mapping with timbre cluster visualization, to descriptor-guided space navigation. By treating latent spaces as explicit interaction surfaces rather than implementation details, this work contributes to ongoing discussions about controllability, legibility, and appropriation in machine learning-based musical instruments.
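
To make the idea of a sensor-to-latent mapping with an interaction heatmap more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes three sensor readings normalized to [0, 1], a linear mapping onto a latent range of [-3, 3] per dimension, and a coarse grid that counts how often each latent region is visited. The names `sensors_to_latent` and `InteractionHeatmap`, the latent range, and the bin count are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def sensors_to_latent(
    sensors: Sequence[float], low: float = -3.0, high: float = 3.0
) -> List[float]:
    """Linearly map sensor readings in [0, 1] to latent coordinates in [low, high].

    The range [-3, 3] is an assumed default, not a value taken from the paper.
    """
    # Clamp out-of-range readings so the latent point stays inside the box.
    clamped = [min(1.0, max(0.0, s)) for s in sensors]
    return [low + c * (high - low) for c in clamped]


class InteractionHeatmap:
    """Counts visits to cells of a regular grid laid over the latent box (hypothetical helper)."""

    def __init__(self, bins: int = 8, low: float = -3.0, high: float = 3.0) -> None:
        self.bins = bins
        self.low = low
        self.high = high
        self.counts: Dict[Tuple[int, ...], int] = {}

    def cell(self, z: Sequence[float]) -> Tuple[int, ...]:
        """Return the grid cell index of latent point z, one index per dimension."""
        span = self.high - self.low
        indices = []
        for value in z:
            i = int((value - self.low) / span * self.bins)
            # Keep the upper boundary (value == high) inside the last bin.
            indices.append(min(self.bins - 1, max(0, i)))
        return tuple(indices)

    def visit(self, z: Sequence[float]) -> Tuple[int, ...]:
        """Record one visit at latent point z and return the cell that was incremented."""
        c = self.cell(z)
        self.counts[c] = self.counts.get(c, 0) + 1
        return c


# Example: three sensor readings become one 3D latent point and one heatmap visit.
z = sensors_to_latent([0.0, 0.5, 1.0])
heatmap = InteractionHeatmap()
visited_cell = heatmap.visit(z)
```

In a sketch like this, the visit counts could drive a visual display of which latent regions a performer has already explored, which is one way to think about the exploration versus reproducibility tradeoff the abstract mentions. A real system would pass the latent point to the generative model's decoder, a step omitted here.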