Beyond Direct Geometry: Spring-Mass Control of Tongue Articulation for Vocal Synthesis

Debasish Mohapatra; Ziyi Xia; Sidney Fels

Abstract:

Human speech production relies on tightly coupled neuromuscular control of articulators and the aeroacoustic properties of the vocal tract. Vocal synthesizers employing direct geometric control of articulatory positions often struggle to generate smooth nonlinear trajectories between target vowels, as required for diphthong synthesis. We propose a biomechanically inspired control approach using a lightweight spring–mass–damper framework coupled to an acoustic wave solver, in which spring forces are parameterized to generate target tongue shapes. This physics-based interface enables synthesis through an input modality analogous to natural muscle activation. We conducted a pilot study comparing the proposed physics-based controller with a conventional geometry-driven controller on identical trajectory-generation tasks, subsequently coupling both to a vocal synthesizer. The pilot study served to refine the experimental design and verify that the system captures meaningful differences between the two controllers. Results revealed large, observable differences in the ability of each controller to generate nonlinear articulatory trajectories, both quantitatively and qualitatively. These findings support a planned controlled user study with a larger and more diverse participant pool, aimed at providing statistically valid assessments of the proposed controller’s effectiveness for smooth trajectory generation.
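To make the contrast between the two control styles concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes a single point mass standing in for one tongue control point, with hypothetical parameter values (`k`, `c`, `m`, `dt`) chosen only for illustration. The physics-based path is obtained by moving the spring's rest position to the target and integrating the spring–mass–damper equation; the geometric path is a straight linear interpolation between the same endpoints.

```python
def simulate_spring_mass_damper(x0, target, k=40.0, c=14.0, m=1.0,
                                dt=0.001, steps=3000):
    """Integrate m*x'' = -k*(x - target) - c*x' with semi-implicit Euler.

    All parameter values are illustrative assumptions, not values from
    the paper. With these defaults the system is overdamped, so the mass
    approaches the target without oscillating.
    """
    x, v = x0, 0.0
    trajectory = [x]
    for _ in range(steps):
        a = (-k * (x - target) - c * v) / m  # spring force plus damping
        v += a * dt                          # update velocity first
        x += v * dt                          # then position (semi-implicit)
        trajectory.append(x)
    return trajectory


def linear_interpolation(x0, target, steps=3000):
    """Direct geometric control: a straight-line path between two positions."""
    return [x0 + (target - x0) * i / steps for i in range(steps + 1)]


if __name__ == "__main__":
    physics_path = simulate_spring_mass_damper(0.0, 1.0)
    geometric_path = linear_interpolation(0.0, 1.0)
    # Compare the two paths at a few sample times (dt = 0.001 s per step).
    for i in (0, 250, 500, 1000, 3000):
        print(f"t={i * 0.001:.2f}s  physics={physics_path[i]:.4f}  "
              f"geometric={geometric_path[i]:.4f}")
```

In this toy setting the spring-driven path accelerates from rest and settles gradually near the target, whereas the interpolated path moves at constant speed. A full tongue model would involve many coupled masses and an acoustic solver, which this sketch does not attempt to represent.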