
Developed a Japanese speech synthesis model using multilingual speech data


NABLAS Co., Ltd. Press release: October 9, 2024
Japanese speech synthesis model developed using multilingual speech data: speech synthesis is possible with just a few seconds of audio data and Japanese text.
[Press release image: https://prcdn.freetls.fastly.net/release_image/38634/86/38634-86-caa2668adc9e11aacad71dd40f1603fc-1920×1080.jpg]

NABLAS Co., Ltd. (Headquarters: Hongo, Bunkyo-ku, Tokyo; Representative Director and Director: Kotaro Nakayama; hereinafter the "Company"), which operates an AI research institute, has developed a Japanese text-to-speech (TTS) model that preserves the voice quality of multilingual speakers. The model can synthesize fluent Japanese speech in the voice of a speaker of another language from just a few seconds of that speaker's audio, regardless of its language. The technology is expected to find applications in a wide range of fields, including interpretation, support for people with speech difficulties, and the multilingual localization of entertainment works such as films and videos.

▼Audio samples can be heard here.
https://www.nablas.com/post/voice-synthesis-202410

■Development background and overview
In recent years, the use of speech synthesis has spread rapidly, and it is now applied in more and more situations, such as automated voice guidance, audiobook narration, and video dubbing. Conventional speech synthesis, however, requires voice actors or announcers to record predetermined sentences, and a voice model reproducing their voice quality must then be built from several minutes of recorded audio. In addition, Japanese speech synthesis has required a voice model created from a Japanese speaker, and it has been difficult to synthesize fluent Japanese speech from voice models of other languages.

To address these issues, the Company has built a speech synthesis model that can read out arbitrary Japanese text from only a few seconds of spoken audio in any language, such as English, Chinese, or Korean. The model is based on the architecture of the sound generation model "SoundStorm" developed by Google, combined with a Japanese-compatible speech generation model developed by the Company, enabling near-instant synthesis of Japanese speech. A conceptual sketch of the inference flow follows the link below. The release on the Japanese SoundStorm model is available here:
https://www.nablas.com/post/japanese-voice-synthesis
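The press release does not publish any code or API, so the following is only a minimal, hypothetical Python sketch of the zero-shot inference flow it describes: a few seconds of reference audio in any language plus Japanese text go in, and Japanese speech in the same voice comes out. All names here (ZeroShotJapaneseTTS, SynthesisRequest, encode_reference) are invented stubs, not NABLAS's or Google's actual interfaces.

```python
# Hypothetical sketch of a zero-shot Japanese TTS inference flow.
# Neither NABLAS's model nor Google's SoundStorm is publicly released,
# so every class and function below is an illustrative stub.
from dataclasses import dataclass

import numpy as np


@dataclass
class SynthesisRequest:
    reference_audio: np.ndarray  # a few seconds of the speaker's voice, any language
    sample_rate: int             # e.g. 24_000 Hz
    japanese_text: str           # text to be read out in Japanese


class ZeroShotJapaneseTTS:
    """Stub for a SoundStorm-style two-stage pipeline:
    1) encode the reference audio into speaker/acoustic tokens,
    2) generate Japanese speech tokens conditioned on the text and the
       reference tokens, then decode them back to a waveform."""

    def encode_reference(self, audio: np.ndarray, sample_rate: int) -> np.ndarray:
        # Placeholder: a real system would run a neural audio codec here.
        return np.zeros(16, dtype=np.float32)

    def synthesize(self, request: SynthesisRequest) -> np.ndarray:
        speaker_tokens = self.encode_reference(
            request.reference_audio, request.sample_rate
        )
        # Placeholder: a real system would generate codec tokens in parallel
        # (as SoundStorm does) conditioned on the Japanese text and the
        # speaker tokens, then decode them to audio. Here we return silence.
        _ = speaker_tokens
        duration_s = max(1, len(request.japanese_text) // 10)
        return np.zeros(request.sample_rate * duration_s, dtype=np.float32)


if __name__ == "__main__":
    tts = ZeroShotJapaneseTTS()
    req = SynthesisRequest(
        reference_audio=np.random.randn(24_000 * 3).astype(np.float32),  # ~3 s clip
        sample_rate=24_000,
        japanese_text="こんにちは。本日は音声合成のデモです。",
    )
    waveform = tts.synthesize(req)
    print(f"Synthesized {waveform.shape[0] / req.sample_rate:.1f} s of audio (stub).")
```

The two-step shape of the stub mirrors the approach SoundStorm is known for: a neural codec represents the voice as discrete tokens, and a parallel token generator fills in the target speech before the codec decodes it back to audio. How NABLAS conditions this on Japanese text is not detailed in the release.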
[Press release image: https://prcdn.freetls.fastly.net/release_image/38634/86/38634-86-5bffc417a90118a7fd5ac62ffe850e90-1920×1080.jpg]

■Example uses of this model
・Support for people with speech difficulties
For people who need support when speaking, their own or any other voice data can be used to turn typed text into speech, helping to remove barriers to spoken communication.
・Language learning and interpretation
In multilingual learning and interpretation, speech can be synthesized instantly from the speaker's own voice data. By entering the translated content as text, an interpreter can deliver it in Japanese with the same voice quality as the original speaker.
・Entertainment
When dubbing content into Japanese, the source audio makes it possible to produce the Japanese dub in the same speaker's voice. This can reduce production costs for media, games, and audiobooks and broaden the range of creative work.

■Future outlook
Speech synthesis is expected to be used in a wide variety of fields. Beyond the Japanese-compatible speech synthesis model built from multilingual speech data, the Company will continue to develop technologies such as voice conversion and even faster real-time dialogue translation, areas where the use of speech synthesis and speech generation is becoming increasingly active in Japan. The Company will also continue working to prevent misuse of these generative AI technologies and to develop detection technologies.

■Contact
For inquiries about NABLAS speech synthesis, please use the form below.
https://www.nablas.com/contact

■About NABLAS Co., Ltd.
NABLAS is a University of Tokyo venture, an AI talent education and development institution, and a comprehensive AI research institute that provides solutions built on cutting-edge AI technology, in particular deep learning. In its AI talent development business, the Company delivers AI education content developed at the University of Tokyo and continuously updated by the Company, together with the iLect System learning environment. In its AI consulting and R&D business, the Company provides technical consulting on the introduction, research, and development of AI technology, as well as hands-on services such as AI deployment and development tailored to each client's situation. As a company that puts AI technology into social practice in many forms and continues to explore and create technologies and services for a better future, NABLAS will keep developing technologies and services that support the next generation, under the mission "Discover the gradients, Towards the future."

■Company profile
Company name: NABLAS Co., Ltd.
Representative: Kotaro Nakayama, Representative Director and Director
Head office: 1F Hongo Tsuna Building, 6-17-9 Hongo, Bunkyo-ku, Tokyo
Established: March 2017
Business: AI talent development, consulting, research and development
URL: https://nablas.com

