Record Once, Post Everywhere: Automatic Shortening of Audio Stories for Social Media

Bryan Wang, Zeyu Jin, Gautham J. Mysore. In the Proceedings of the ACM Symposium on User Interface Software and Technology (UIST ’22).

Following the prevalence of short-form video, short-form voice content has emerged on social media platforms such as Twitter and Facebook. A challenge creators face is hard constraints on content length. If the initial recording is too long, they must either re-record or edit their content; both are time-consuming, and the latter, where supported, can have a learning curve. Moreover, creators need to manually create multiple versions to publish content on platforms with different length constraints. To simplify this process, we present ROPE (Record Once, Post Everywhere). Creators record voice content once, and our system automatically shortens it to every length limit by removing parts of the recording for each target. We formulate this as a combinatorial optimization problem and propose a novel algorithm that automatically selects optimal sentence combinations from the original content to comply with each length constraint. Creators can customize the algorithmically shortened content by specifying sentences to include or exclude; the system then uses these constraints to recompute and provide a new version.
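The sentence-selection step can be viewed as a 0/1 knapsack over sentences: keep a subset whose total duration fits the platform limit while maximizing how much of the story is retained. A minimal sketch follows; the (duration, importance) scoring and the dynamic program below are illustrative assumptions, not ROPE's actual formulation.

```python
def shorten(sentences, limit):
    """Pick a subset of sentences whose total duration fits within
    `limit`, maximizing total importance (0/1 knapsack over sentences).
    `sentences` is a list of (duration, importance) pairs.
    Returns (best_importance, chosen_sentence_indices)."""
    # best maps a total duration -> (importance, chosen sentence indices)
    best = {0: (0, [])}
    for i, (dur, score) in enumerate(sentences):
        for used, (total, picked) in list(best.items()):
            t = used + dur
            if t <= limit and total + score > best.get(t, (-1, None))[0]:
                best[t] = (total + score, picked + [i])
    return max(best.values(), key=lambda v: v[0])

# Hypothetical recording: four sentences as (seconds, importance) pairs.
print(shorten([(10, 5), (20, 9), (15, 6), (5, 2)], limit=30))  # (14, [0, 1])
```

In this sketch, user-specified include/exclude constraints could be honored by forcing or removing sentences before running the selection, and running the same routine once per platform limit yields one version per target.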

    Details coming soon...

Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning

Bryan Wang, Gang Li, Xin Zhou, Zhourong Chen, Tovi Grossman, Yang Li. In the Proceedings of the ACM Symposium on User Interface Software and Technology (UIST ’21).

Mobile User Interface Summarization generates succinct language descriptions of mobile screens, conveying the important contents and functionalities of the screen, which can be useful for many language-based application scenarios. We present Screen2Words, a novel screen summarization approach that automatically encapsulates essential information of a UI screen into a coherent language phrase. Summarizing mobile screens requires a holistic understanding of the multi-modal data of mobile UIs, including text, image, and structure, as well as UI semantics, motivating our multi-modal learning approach. We collected and analyzed a large-scale screen summarization dataset annotated by human workers. Our dataset contains more than 112k language summaries across ~22k unique UI screens. We then experimented with a set of deep models with different configurations. Our evaluation of these models with both automatic accuracy metrics and human ratings shows that our approach can generate high-quality summaries for mobile screens. We demonstrate potential use cases of Screen2Words and open-source our dataset and model to lay the foundations for further bridging language and user interfaces.

Soloist: Generating Mixed-Initiative Tutorials from Existing Guitar Instructional Videos Through Audio Processing

Bryan Wang, Mengyu Yang, Tovi Grossman. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI '21).

We designed Soloist, a mixed-initiative learning framework that automatically generates customizable curriculums from off-the-shelf guitar video lessons. Soloist takes raw videos as input and leverages deep-learning based audio processing to extract musical information. This back-end processing is used to provide an interactive visualization to support effective video navigation and real-time feedback on the user's performance, creating a guided learning experience. We demonstrate the capabilities and specific use-cases of Soloist within the domain of learning electric guitar solos using instructional YouTube videos. A remote user study, conducted to gather feedback from guitar players, shows encouraging results as the users unanimously preferred learning with Soloist over unconverted instructional videos.

BlyncSync: Enabling Multimodal Smartwatch Gestures with Synchronous Touch and Blink

Bryan Wang, Tovi Grossman. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI '20).

We present BlyncSync, a novel multi-modal gesture set that leverages the synchronicity of touch and blink events to augment the input vocabulary of smartwatches with a rapid gesture, while at the same time, offers a solution to the false activation problem of blink-based input. BlyncSync contributes the concept of a mutual delimiter, where two modalities are used to jointly delimit the intention of each other's input. A study shows that BlyncSync is 33% faster than using a baseline input delimiter (physical smartwatch button), with only 150ms in overhead cost compared to traditional touch events. Furthermore, our data indicates that the gesture can be tuned to elicit a true positive rate of 97% and a false positive rate of 1.68%.
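The mutual-delimiter idea can be sketched as a temporal-coincidence check: a touch only registers as a BlyncSync gesture if a blink lands within a short window of it, and vice versa. The sketch below is illustrative; the 150 ms window is an assumption for the example, not the paper's tuned threshold.

```python
def detect_gestures(touch_times, blink_times, window_ms=150):
    """Return the touch timestamps (ms) that coincide with a blink.
    Each modality delimits the other: a touch without a nearby blink is
    ignored (filtering accidental touches), and a blink without a nearby
    touch is ignored (filtering false blink activations)."""
    return [t for t in touch_times
            if any(abs(t - b) <= window_ms for b in blink_times)]

# The touch at 100 ms has a blink 80 ms away -> gesture; the one at 500 ms does not.
print(detect_gestures([100, 500], [180, 900]))  # [100]
```

Tightening or widening the window trades off the true positive rate against false activations, which is the tuning the study above reports.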

PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network

Bryan Wang, Yi-Hsuan Yang. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI ’19). Oral Presentation (acceptance rate: 6.4%)

We propose PerformanceNet, a deep convolutional model that learns in an end-to-end manner the score-to-audio mapping between a symbolic representation of music, the piano roll, and an audio representation of music, the spectrogram. The model consists of two subnets: the ContourNet, which uses a U-Net structure to learn the correspondence between piano rolls and spectrograms and to produce an initial result; and the TextureNet, which further uses a multi-band residual network to refine the result by adding the spectral texture of overtones and timbre.

ActiveErgo: Automatic and Personalized Ergonomics using Self-actuating Furniture

Yu-Chian Wu, Te-Yen Wu, Paul Taele, Bryan Wang, Jun-You Liu, Po-En Lai, Pin-Sung Ku, Mike Y. Chen. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI ’18).

We present ActiveErgo, the first active approach to improving ergonomics by combining sensing and actuation of motorized furniture. Our prototype system uses Kinect for skeletal sensing and monitoring to determine the ideal furniture positions for each user, then uses a combination of automatic adjustment and live feedback to adjust the computer monitor, desk, and chair positions.

CircuitSense: Automatic Sensing of Physical Circuits and Generation of Virtual Circuits to Support Software Tools

Te-Yen Wu, Bryan Wang, Jiun-Yu Lee, Hao-Ping Shen, Yu-Chian Wu, Yu-An Chen, Pin-Sung Ku, Ming-Wei Hsu, Yu-Chih Lin, Mike Y. Chen. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST ’17).

We present CircuitSense, a system that automatically recognizes the wires and electronic components placed on breadboards. It uses a combination of passive sensing and active probing to detect and generate the corresponding circuit representation in software in real-time. It also dramatically simplifies the sharing of circuit designs with online communities.

CircuitStack: Supporting Rapid Prototyping and Evolution of Electronic Circuits

Chiuan Wang, Hsuan-Ming Yeh, Bryan Wang, Te-Yen Wu, Hsin-Ruey Tsai, Rong-Hao Liang, Yi-Ping Hung, Mike Y. Chen. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST ’16). Best Talk Award

We present CircuitStack, a system that combines the flexibility of breadboarding with the correctness of printed circuits, for enabling rapid and extensible circuit construction. This hybrid system enables circuit reconfigurability, component reusability, and high efficiency at the early stage of prototyping development.