Gender-ambiguous speech synthesis in Basque through speaker embedding manipulation

Published 24-09-2024
Xabier Sarasola, Ander Corral, Igor Leturia, Iñigo Morcillo

Abstract

There is growing interest in text-to-speech (TTS) systems with gender-ambiguous voices, partly because of their potential to avoid gender biases and stereotypes in voice assistants and smart speakers. In this paper we present and evaluate novel methods that apply voice morphing techniques to speaker embeddings in order to obtain neural gender-ambiguous TTS voices for the Basque language. The speaker embeddings are obtained by training a multi-speaker Tacotron 2 model. We compare systems with and without speaker embedding normalization with a scaling parameter, and we apply the methods both to the average embeddings of each gender and to the embeddings of real voices. The results show that the presented methods yield gender-ambiguous voices of acceptable, albeit improvable, quality.
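
As a rough illustration of what morphing between speaker embeddings can look like, the sketch below averages the embeddings of each gender and then interpolates between the two averages, optionally rescaling the result. The abstract does not specify the exact operations used in the paper, so everything here is an assumption made for illustration: linear interpolation as the morphing step, L2 rescaling as the form of normalization, the names `mean_embedding`, `morph`, `alpha` and `scale`, and the toy three-dimensional vectors.

```python
import math


def mean_embedding(embeddings):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def l2_norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def morph(emb_a, emb_b, alpha=0.5, scale=None):
    """Linearly interpolate between two speaker embeddings.

    alpha=0 returns emb_a and alpha=1 returns emb_b. If scale is given,
    the mixed vector is rescaled to have L2 norm equal to scale
    (an assumed form of normalization, not taken from the paper).
    """
    mixed = [(1 - alpha) * a + alpha * b for a, b in zip(emb_a, emb_b)]
    if scale is not None:
        norm = l2_norm(mixed)
        if norm > 0:
            mixed = [scale * x / norm for x in mixed]
    return mixed


# Toy 3-dimensional embeddings, made up for illustration only.
female = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
male = [[-2.0, 4.0, 0.0], [-2.0, 0.0, 0.0]]

avg_f = mean_embedding(female)
avg_m = mean_embedding(male)

# Midpoint between the two gender averages, without and with rescaling.
ambiguous = morph(avg_f, avg_m, alpha=0.5)
ambiguous_scaled = morph(avg_f, avg_m, alpha=0.5, scale=2.0)
```

In a multi-speaker TTS model, a vector such as `ambiguous` would be supplied in place of a trained speaker embedding at synthesis time. The same `morph` call could also take the embeddings of two real voices instead of the gender averages.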

Keywords

speech synthesis, gender-ambiguous voice, speaker embeddings, voice morphing, Basque

Section
Special Issue