Cross-Modal Sig2Sig Machine Translation with Deep Generative Modeling for NIME Design

Lucy Strauss; Prashanth Thattai Ravikumar; Matthew Yee-King

Image credit: Lucy Strauss; Prashanth Thattai Ravikumar; Matthew Yee-King

Abstract:

NIME researchers frequently work with sensor signals that are difficult to interpret, such as signals from movement sensors and bioelectric sensors. Although deep generative models (DGMs) are increasingly prevalent in NIME, there are few NIME-specific approaches for building and evaluating DGMs of such signals.

Our research focuses on cross-modal Sig2Sig machine translation, a sensor-to-sound mapping task that uses DGMs. We present the Muscle-Listening Machine Learning Model for Live Music (MLMLMLM), a novel DGM intended for use within an interactive music system. MLMLMLM is trained on a bespoke time-aligned dataset of audio and electromyographic (EMG) signals, and its architecture combines a decoder-only Transformer with two RVQ-VAEs.
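
As background on the "RVQ" in RVQ-VAE, the following is a minimal sketch of residual vector quantization in general, not the MLMLMLM implementation. It assumes tiny hand-written codebooks, whereas a trained RVQ-VAE learns its codebooks jointly with an encoder and decoder. The function names and the example values are hypothetical. Each stage picks the code nearest to what the previous stages left unexplained, so one vector becomes a short list of discrete indices.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vector))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(vector, codebooks):
    """Greedy residual quantization: one index per codebook stage."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        indices.append(idx)
        # Subtract the chosen code; the next stage quantizes what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected code from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks for 2-D vectors (illustration only).
example_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.5, -0.5]],   # finer stage, applied to the residual
]
example_indices = rvq_encode([1.4, 0.6], example_codebooks)
example_reconstruction = rvq_decode(example_indices, example_codebooks)
```

In this toy case the input [1.4, 0.6] is approximated by summing one code from each stage. Discrete indices of this kind are the sort of token sequence that sequence models such as decoder-only Transformers commonly operate on.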

We position the technical work of designing bespoke DGM architectures as a NIME practice in its own right and employ a Technical Practice Research (TPR) approach to document the process of building MLMLMLM. Through our TPR process, a new method emerged for evaluating DGMs of low-interpretability signals.

The contributions of this research are twofold: 1) a novel DGM architecture for EMG-conditioned sequence generation of audio signals; 2) a method for more effectively developing and evaluating DGMs of multi-channel time-domain signals with low interpretability.