Xiaomi recently unveiled its latest implementation of advanced algorithms and self-developed audio technology in the field of accessibility. Spontaneous-style text-to-speech technology, developed by Xiaomi AI Lab, is used to generate a unique and personalized voice for a user with a speech disability.
The user can now communicate with others in their own ‘voice’ instead of a generic electronic one. As part of the “Own My Voice” pre-research project led by the Xiaomi Technical Committee, this successful attempt demonstrates Xiaomi’s commitment to “technology for good” and the fulfillment of its mission to “allow everyone in the world to lead a better life through innovative technology”.
Why did Xiaomi launch this project?
Xiaomi cares about people and strives to meet their needs through technological innovation. The company found that many users with a speech disability want a unique voice of their own for everyday communication, so it set up the “Own My Voice” project team and invited a user with a speech disorder to be the voice recipient. “We are excited to explore the multiple values that technological innovation brings us, such as responding to users’ demands for their identity and building their identity,” said Zhu Xi, Head of the Technology Committee on “Technology for Good” at Xiaomi.
How did Xiaomi implement the project?
To generate the most appropriate and personalized voice for the recipient, the team recruited more than 200 volunteers within Xiaomi to donate their voices. They used a voice matching algorithm to compare the characteristics of the donated voices with those of the recipient’s voice. In this way, they found the most suitable voice to serve as the primary reference for the recipient. With personalization and privacy protection in mind, the selected real voice was then altered through intricate acoustic modifications to create a new and original voice.
They then used spontaneous-style text-to-speech technology to train the AI model, so that the new voice gradually acquired a natural rhythm and tone capable of truly expressing human emotion.
The “Own My Voice” project combines a variety of advanced algorithms with Xiaomi’s self-developed speech technology to ensure the privacy, security and high fidelity of the synthesized voice, offering a new approach to personalized speech synthesis for users with speech disorders.
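The announcement does not say how the voice matching algorithm works. As a purely illustrative sketch, one common way to compare voices is to represent each one as a numeric feature vector and rank candidates by cosine similarity. Everything below is an assumption made for illustration and not Xiaomi's method: the function names, the donor identifiers and the toy three-number vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def rank_donors(recipient, donors):
    """Return (donor_id, score) pairs, most similar donor first."""
    scored = [
        (donor_id, cosine_similarity(recipient, features))
        for donor_id, features in donors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: real systems would use much longer vectors
# extracted from recordings; these numbers are made up.
recipient_features = [0.9, 0.1, 0.4]
donor_features = {
    "donor_a": [0.1, 0.9, 0.2],
    "donor_b": [0.8, 0.2, 0.5],
    "donor_c": [0.5, 0.5, 0.5],
}

ranking = rank_donors(recipient_features, donor_features)
best_donor = ranking[0][0]
print(best_donor)
```

In this toy example the top-ranked donor would play the role of the primary reference voice. The later steps described above (acoustic modification and text-to-speech training) are outside the scope of the sketch.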
What is the significance of the project?
The backbone of this project is a group of speech technology experts from Xiaomi AI Lab. Since 2017, they have published 37 papers on speech in the proceedings of major international conferences such as the International Conference on Acoustics, Speech and Signal Processing (ICASSP). The success of “Own My Voice” rests mainly on the spontaneous-style text-to-speech technology they developed.
Spontaneous-style text-to-speech technology makes the synthesized voice resemble a real human voice in tone, pauses, speed and other characteristics, replacing the monotonous and unnatural feel of an electronic voice with a more natural one. The technology is already used in many smart devices equipped with Xiaoai, Xiaomi’s AI voice assistant. The “Own My Voice” project demonstrates that spontaneous-style text-to-speech technology can also be widely adopted in the areas of accessibility and user experience improvement.
Zhu Xi added, “If we notice and address the needs of minorities at an early stage, the process of technology deployment can be greatly shortened. This allows the advantages of new technologies to become accessible to users with disabilities without delay.”
Going forward, Xiaomi will keep collecting feedback from voice recipients and will continue to study the feasibility of this project on a larger scale. Xiaomi will continue to enable accessibility with the latest technologies, and strive to meet people’s diverse needs through technological innovation.