MobileHCI 2017 Course

Speech-Based Interaction

Outline and learning objectives

  • How Automatic Speech Recognition (ASR) and Speech Synthesis (or Text-To-Speech – TTS) work, and why these are such computationally difficult problems

  • Where are ASR and TTS used in current commercial interactive applications

  • What are the usability issues surrounding speech-based interaction systems, particularly in mobile and pervasive computing

  • What are the challenges in enabling speech as a modality for mobile interaction

  • What is the current state-of-the-art in ASR and TTS research

  • What are the differences between the commercial ASR systems' accuracy claims and the needs of mobile interactive applications

  • What are the difficulties in evaluating the quality of TTS systems, particularly from a usability and user perspective

  • What opportunities exist for HCI researchers in terms of enhancing systems' interactivity by enabling speech

Recent updates for 2017

A new sub-topic, developed for the presentations of this course at CHI 2016 and 2017, is interactive speech-based applications centred on language translation, language learning support, and interaction across multiple languages. This will be updated and expanded for the MobileHCI 2017 tutorial. Recent advances in Deep Neural Networks have dramatically improved the accuracy of speech recognition systems; however, this requires powerful computational resources that are not available to all developers, and we will engage the audience in analysing the implications of this for the design of interactive systems. Additionally, even the most capable recognition servers continue to struggle when acoustic, language, or interaction conditions are adverse, resulting in large variations in speech recognition accuracy. This is particularly relevant for home-based smart personal assistants such as Amazon Echo, where unexpected interaction contexts (e.g. loud music) can negatively affect performance and thus user experience. The 2017 course materials will include further discussion and analysis of such examples.
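
Such accuracy variations are conventionally quantified as Word Error Rate (WER): the number of word-level substitutions, insertions, and deletions needed to turn the recognizer's hypothesis into the reference transcript, divided by the reference length. The following minimal Python sketch (illustrative only, not part of the course materials; the example transcripts are invented) shows how WER can be computed to compare the same utterance recognised under quiet and adverse conditions:

    # Minimal WER sketch: edit distance between reference and hypothesis word sequences.
    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref = reference.split()
        hyp = hypothesis.split()
        # d[i][j] = minimum edits to turn hyp[:j] into ref[:i]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i          # reference words with no counterpart: deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j          # hypothesis words with no counterpart: insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # match / substitution
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    # Hypothetical transcripts of the same command in a quiet vs. a noisy room.
    print(word_error_rate("turn the living room lights off",
                          "turn the living room lights off"))   # 0.0
    print(word_error_rate("turn the living room lights off",
                          "turn the living groom light of"))    # 0.5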

Hands-on activities

The course includes three interactive, hands-on activities. The first activity will engage participants in proposing design alternatives for the error-handling interaction of a smartphone's voice-based search assistant, based on an empirical assessment of the types of ASR errors exhibited (e.g. acoustic, language, semantic). For the second activity, participants will evaluate the quality of the synthetic speech output typically employed in mobile speech interfaces and propose alternative evaluation methods that better reflect the mobile user experience. NEW FOR 2017: The third activity will centre on uncovering the speech processing errors of a home-based personal assistant and on designing interactions that maintain a positive user experience in the face of unexpected variations in speech processing accuracy.
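
As background to the second activity, synthetic speech quality is conventionally reported as a Mean Opinion Score (MOS), the average of listener ratings on a 1 (bad) to 5 (excellent) scale. The short Python sketch below (illustrative only; the voice names and ratings are hypothetical) shows this aggregation, and hints at one limitation participants may critique: a bare MOS from a small listening test says little about the mobile user experience.

    # Aggregating hypothetical listener ratings into a Mean Opinion Score (MOS).
    import statistics

    ratings = {
        "voice_a": [4, 4, 5, 3, 4, 4, 5, 4],
        "voice_b": [3, 2, 4, 3, 3, 2, 3, 3],
    }

    for voice, scores in ratings.items():
        mos = statistics.mean(scores)
        # Standard error shows how unstable a small listening test can be.
        sem = statistics.stdev(scores) / len(scores) ** 0.5
        print(f"{voice}: MOS = {mos:.2f} ± {1.96 * sem:.2f} (approx. 95% interval)")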