Developed an ultra-high-speed Japanese speech generation model *NABLAS Co., Ltd.*
Press release: August 13, 2024
*Realizes Japanese speech generation from just a few seconds of data*
NABLAS Co., Ltd. (Headquarters: Hongo, Bunkyo-ku, Tokyo; Representative Director and Director: Kotaro Nakayama; hereinafter the “Company”) has developed an ultra-high-speed speech generation model for Japanese, based on the architecture of “SoundStorm”, a speech generation model developed by Google. The model can instantly generate Japanese speech from just a few seconds of audio data. We trained it on our proprietary Japanese dataset and achieved natural-sounding Japanese speech generation. Going forward, this technology is expected to be applied in a wide range of fields, including support for people with speech difficulties in healthcare, real-time tone adjustment of emotionally charged voices in customer support, and speech generation in entertainment.
▼Samples of the generated audio can be heard here.
https://www.nablas.com/post/japanese-voice-synthesis
* ■About “SoundStorm”*
SoundStorm is a state-of-the-art speech generation model developed by Google. It performs dramatically better than conventional speech generation models and delivers high-speed, high-quality generation, producing realistic-sounding speech in just 0.5 seconds from approximately 3 seconds of original voice data. Because it can generate speech in real time, it is expected to be used not only for simple speech generation but also for text reading, dialogue systems, and similar applications. Its main characteristics are as follows:
・Realistic audio can be generated from approximately 3 seconds of audio data.
・Approximately 30 seconds of audio can be generated in just 0.5 seconds.
・From a few seconds of dialogue audio, it can generate realistic dialogue that faithfully imitates the intonation and vocal characteristics of each speaker.
▼Details
https://google-research.github.io/seanet/soundstorm/examples/
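To put the figure of 30 seconds of audio in 0.5 seconds in perspective, it can be expressed as a real-time factor (RTF): generation time divided by the duration of the audio produced. The RTF framing is our addition rather than a figure from the release, and the calculation assumes both numbers describe the same generation run.

```latex
\mathrm{RTF} = \frac{t_{\text{generation}}}{t_{\text{audio}}}
             = \frac{0.5\,\mathrm{s}}{30\,\mathrm{s}} \approx 0.017,
\qquad
\frac{1}{\mathrm{RTF}} = \frac{30\,\mathrm{s}}{0.5\,\mathrm{s}} = 60
```

An RTF well below 1 means audio is produced faster than it plays back; here, roughly 60 times faster than real time.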
* ■About the Japanese-compatible model*
SoundStorm is currently developed with English as its base language and does not support Japanese speech generation. We therefore developed a model that supports Japanese (dialogue generation is not supported). Given several seconds of speaker A’s voice and a recording of speaker B’s voice containing the content to be spoken, the model generates speaker B’s utterance in speaker A’s voice with just 0.5 seconds of processing. By applying this technology, we expect it to be used in a wide range of fields in the future, including healthcare, entertainment, media, and customer support.
- Anticipated use cases of the Japanese speech generation model -
・Support for people with speech difficulties
For people who need support with speaking, the model can output what they want to say in a corrected voice, using their own voice data or any chosen voice, helping to remove barriers to speaking.
・Reducing the mental burden in customer support
By outputting an emotionally charged voice in a calmer tone, the model can reduce the mental burden on the staff receiving it.
・Utilization in the entertainment field
In content distribution on media and social networking services (SNS), the ability to output any voice in real time can reduce content production costs and expand the scope of creative activities.
* 1. Maintains the performance of SoundStorm*
We built our model on the architecture of the Conformer used inside SoundStorm (a Google-developed model that can capture both the overall and the local context of an input sequence at the same time). This allowed us to create a Japanese-compatible model that maintains SoundStorm’s speech generation quality and speed.
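The release describes the Conformer only as capturing overall and local context at the same time. As a rough illustration of that idea, and not of NABLAS’s or Google’s actual implementation, the sketch below pairs the two mixing operations a Conformer block is built around: self-attention, where every frame can draw on every other frame (global context), and a depthwise convolution, where each frame sees only its immediate neighbours (local context). It assumes NumPy, uses identity projections and random inputs, and omits the feed-forward modules, normalization, gating, and learned weights of a real Conformer; all function names are hypothetical.

```python
import numpy as np


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention with identity
    query/key/value projections (a simplification). Each output frame is
    a weighted mix of ALL input frames, i.e. global context."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                            # (T, T) frame-to-frame similarity
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over all frames
    return weights @ x


def depthwise_conv(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Depthwise 1-D convolution along time with zero 'same' padding.
    kernel has shape (k, d) with odd k: each channel has its own filter,
    and each output frame sees only k neighbouring frames, i.e. local context."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * kernel).sum(axis=0) for t in range(x.shape[0])])


def conformer_style_block(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Hypothetical, heavily simplified block: global mixing, then local
    mixing, each wrapped in a residual connection."""
    x = x + self_attention(x)
    x = x + depthwise_conv(x, kernel)
    return x


# Usage: random "frames" stand in for audio token embeddings.
rng = np.random.default_rng(0)
T, d = 16, 4
x = rng.standard_normal((T, d))
kernel = rng.standard_normal((3, d))

y = conformer_style_block(x, kernel)
print(y.shape)  # (16, 4)

x_changed = x.copy()
x_changed[-1] += 1.0  # perturb only the last frame

# Local: with k = 3, frame 0 depends only on frames 0 and 1, so it is unchanged.
print(np.allclose(depthwise_conv(x, kernel)[0], depthwise_conv(x_changed, kernel)[0]))  # True
# Global: frame 0 attends to every frame, so perturbing the last frame changes it.
print(np.allclose(self_attention(x)[0], self_attention(x_changed)[0]))  # False
```

Stacking both operations lets a single block combine sequence-wide dependencies with fine-grained local detail, which is the property the release attributes to the Conformer.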
2. Audio quality and similarity of generated audio exceed those of SoundStorm
For the audio codec, which affects the quality of the output audio, we adopted a codec suited to generating Japanese speech. As a result, our model slightly outperformed SoundStorm in audio quality (unnaturalness, noise, etc.) and in the similarity score of the generated audio.
3. Speech generation model specialized in Japanese
The newly developed model is a Japanese-specific speech generation model trained only on a Japanese speech dataset that we processed in-house. To build the dataset, we processed data from a Japanese speech corpus to remove background noise and other non-speech sounds so that it contained only human voices, which enabled higher-quality Japanese speech generation.
* ■Future outlook*
Speech generation technology is expected to be used in a wide range of fields. Beyond this Japanese-compatible speech generation model, we will continue developing technologies that broaden the use of speech generation in Japan, including voice conversion, text reading, and real-time dialogue translation. We will also continue working to prevent misuse of these generation technologies and to develop technologies for detecting it.
* ■Inquiries*
For inquiries regarding NABLAS speech generation, please feel free to contact us using the form below.
https://www.nablas.com/contact
* ■About NABLAS Co., Ltd.*
NABLAS is a University of Tokyo venture that serves both as an AI human resource education and development institution and as an AI research institute providing solutions based on cutting-edge AI technology, especially deep learning. In our AI human resource development business, we offer AI education content developed at the University of Tokyo, together with content updated by our company, as an AI human resource development service alongside our learning environment, the iLect System. In our AI consulting/R&D business, we provide technical consulting on the introduction, research, and development of AI technology, as well as technical services such as AI introduction and development tailored to each client’s situation. As social uncertainty grows and the future becomes increasingly difficult to predict, we remain committed to our mission: “Discover the gradients, towards the future.”
* ■Company profile*
Company name: NABLAS Co., Ltd.
Representative: Representative Director and Director Kotaro Nakayama
Head Office: Hongo Tsuna Building 1F, 6-17-9 Hongo, Bunkyo-ku, Tokyo
Established: March 2017
Business content: AI human resource development business / consulting / research and development
URL: https://nablas.com
Contact: pr@nablas.com (Public Relations Desk)