The National Science Foundation (NSF) awarded Dr. Berrak Sisman $563,693 through a Faculty Early Career Development (CAREER) award for the project titled “CAREER: What is in a Voice?: Scientific and Machine Learning Advancement for Voice Conversion.” The primary objective of the project is to create new deep learning algorithms that improve voice conversion models so that they capture the traits and emotions of individual speech. The project addresses theoretical and practical issues raised by previous research on voice conversion models. These include investigating speaker identity and emotion representations for robust voice conversion with self-supervision; investigating voice conversion solutions for difficult situations, such as noisy environments, emotional speakers, and limited training data, to improve the expressiveness and naturalness of the converted speech; and investigating new deep learning techniques for detecting synthetic voices, along with joint training strategies to improve voice conversion performance and evaluation.
The objectives of this project will be achieved by exploring shared representations and gaining a better understanding of how speaker traits and emotions can be successfully changed. The project’s outcomes could have a significant impact on society by a) enabling precise and expressive voice-based applications and b) applying the same approaches to identify whether speech is synthetic or natural, which helps prevent spoofing. The results from Dr. Sisman’s project will also contribute to the field of Computer Science.