VOCAL: Vowel and Consonant Layering for Expressive Animator-Centric Singing Animation
SIGGRAPH Asia 2022


JALI Research Inc., University of Toronto

VOCAL is a vowel-consonant layered approach to expressive singing animation: input audio and lyrics (a) are processed to produce a phonetic alignment (b). We define Melodic accent (Ma) and Pitch sensitivity (Ps) parameters that can be configured to capture a range of singing styles (c). We detect and modify vowels that are sung differently from their transcription (d) and generate vowel animation curves that carry the melody, layered with consonant curves for lyrical clarity and rhythmic emphasis (e). Our output is an audio-driven lower-face animation (f).


Singing and speaking are two fundamental forms of human communication. While both activities have distinct uses, from a modeling perspective, speaking can be seen as a subset of singing. We present VOCAL, a system that automatically generates expressive, animator-centric lower face animation from singing audio input. Articulatory phonetics and voice instruction ascribe different roles to vowels (projecting melody and volume) and consonants (lyrical clarity and rhythmic emphasis). Our approach directly uses these insights to define axes for Melodic accent and Pitch sensitivity (Ma-Ps), which, together with the Ja-Li axes for Jaw and Lip contribution, define a 4D space into which a variety of singing styles can be readily embedded. We train a network to map audio features of a sung signal to the dynamic visual contributions of Ma-Ps-Ja-Li. The viseme animation curves are then computed from the aligned lyrics and the 4D vocal space. Vowels are processed first, dilated from their spoken behavior to bleed into each other based on melodic accent (Ma), with pitch sensitivity (Ps) modeling vibrato. Consonant curves are then layered in, weighted inversely with Ma. We evaluate the impact of our algorithmic parameters, compare against prior art on spoken and sung performance, and provide a qualitative comparison to video references for a gallery of singing animations.
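
The layering order described above (vowels first, dilated by Ma; consonants second, weighted inversely with Ma) can be illustrated with a small sketch. This is not the paper's implementation: the `Phone` record, the linear dilation rule, and the constants `max_extend` and `floor` are hypothetical choices made only for illustration, and the Ps vibrato term, the Ja-Li axes, and the learned network are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    """One aligned phone (hypothetical record, not the paper's data format)."""
    label: str
    start: float  # seconds
    end: float    # seconds
    is_vowel: bool


def clamp01(x: float) -> float:
    return min(1.0, max(0.0, x))


def dilate_vowel(phone: Phone, ma: float, max_extend: float = 0.125):
    """Widen a vowel interval in proportion to melodic accent ma in [0, 1].

    The linear rule and max_extend (seconds) are illustrative assumptions.
    """
    pad = clamp01(ma) * max_extend
    return max(0.0, phone.start - pad), phone.end + pad


def consonant_weight(ma: float, floor: float = 0.25) -> float:
    """Consonant amplitude that decreases as ma increases.

    'Weighted inversely with Ma' is modeled here as a linear falloff from
    1.0 down to an assumed floor; the actual weighting may differ.
    """
    return 1.0 - (1.0 - floor) * clamp01(ma)


def layer(phones, ma: float):
    """Return (label, start, end, weight) tuples: vowels first, then consonants."""
    curves = []
    for p in phones:  # pass 1: vowels carry the melody
        if p.is_vowel:
            start, end = dilate_vowel(p, ma)
            curves.append((p.label, start, end, 1.0))
    for p in phones:  # pass 2: consonants layered on top
        if not p.is_vowel:
            curves.append((p.label, p.start, p.end, consonant_weight(ma)))
    return curves
```

With `ma = 0` the sketch leaves the spoken timing and consonant strength untouched; as `ma` approaches 1, vowel intervals widen toward their neighbors and consonants are attenuated, which is the qualitative behavior the abstract describes.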



Supplemental Video