The wealth of information that we extract from the faintest of facial expressions imposes high expectations on the science and art of facial animation. While the advent of high-resolution performance capture has greatly improved the realism of facial animation for film and games, the generality of procedural approaches warrants a prominent place in facial animation workflows. We present a system that, given an input audio soundtrack and speech transcript, automatically generates expressive lip-synchronized facial animation that is amenable to further artistic refinement, and that is comparable with both performance capture and professional animator output. Because of the wide variation in the use of musculature to produce sound, the mapping from phonemes to a visual manifestation as visemes is inherently many-valued. We draw from psycholinguistics to capture this variation using two visually distinct anatomical actions: Jaw and Lip, where sound is primarily controlled by jaw articulation and lower-face muscles, respectively. We describe the construction of a transferable template JALI 3D facial rig, built upon the popular facial muscle action unit representation FACS. We show that acoustic properties in a speech signal map naturally to the dynamic degree of jaw and lip in visual speech. We demonstrate an array of compelling animation clips, and compare against performance capture and existing procedural animation.
Download our Example Facial Rig for Maya 2016
Download the Paper
Special thanks are due to our actors Patrice Goodman and Adrien Yearwood. We have benefited considerably from discussions with Gérard Bailly and Dominic Massaro. The financial support of the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation for Innovation, and the Ontario Research Fund, is gratefully acknowledged.