
Fujitsu Limited Releases Large-Scale Language Model “Fugaku-LLM” Trained on Supercomputer “Fugaku”

(As this is a joint announcement by seven parties, the same information may be released in duplicate. Thank you for your understanding.)
Fujitsu Limited
Release of large-scale language model “Fugaku-LLM” trained on supercomputer “Fugaku”
Excellent Japanese language ability, with expected applications in research and business
Main points
・Release of a large-scale language model with excellent Japanese language ability, developed with Japanese computing technology
・Distributed parallel training that takes full advantage of the performance of the supercomputer “Fugaku”
・Expected to lead to innovative research and business such as “AI for Science,” which applies AI-based models to scientific research
【Overview】
A research team led by Professor Rio Yokota of the Global Scientific Information and Computing Center, Tokyo Institute of Technology; Associate Professor Keisuke Sakaguchi of the Graduate School of Information Sciences, Tohoku University; Koichi Shirahata, Senior Project Director at Fujitsu Limited’s Artificial Intelligence Laboratory; Team Leader Mohamed Wahib of the RIKEN Center for Computational Science; Associate Professor Koji Nishiguchi of the Graduate School of Engineering, Nagoya University; Shota Sasaki, Research Scientist at the AI Lab, AI Business Headquarters, CyberAgent, Inc.; and Hiroyuki Kojima, CEO of Kotoba Technologies, Inc., released “Fugaku-LLM” on May 10, 2024: a large-scale language model (Note 1) with excellent Japanese language ability trained on the supercomputer “Fugaku.”
In this work, the team ported a deep learning framework to Fugaku, optimized the performance of the Transformer (Note 2) on Fugaku, and developed and applied a distributed parallel training method tailored to Fugaku. As a result, the computation speed for training large-scale language models was increased sixfold (see Reference 1). Furthermore, by optimizing collective communication on the Tofu Interconnect D (Note 3) for Fugaku, communication speed was increased threefold (see Reference 2). These improvements made it possible to train large-scale language models within a realistic amount of time using Fugaku’s CPUs.
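To illustrate the idea behind combining several forms of parallelism, the following is a minimal sketch, not the team’s actual configuration (the parallelism degrees used for Fugaku-LLM are not stated in this release), of how tensor-, pipeline-, and data-parallel degrees factor the total number of workers in Megatron-DeepSpeed-style training; the example degrees are purely hypothetical.

```python
# Minimal sketch of how three parallelism axes factor the worker count.
# The degrees below are illustrative only, not those used for Fugaku-LLM.

def parallel_layout(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> dict:
    """Return the data-parallel degree implied by the other two axes.

    In Megatron-DeepSpeed-style training, each worker belongs to one
    tensor-parallel group, one pipeline-parallel group, and one
    data-parallel group, so the three degrees must multiply to world_size.
    """
    if world_size % (tensor_parallel * pipeline_parallel) != 0:
        raise ValueError("tensor * pipeline degree must divide the world size")
    return {
        "tensor_parallel": tensor_parallel,      # splits each weight matrix across workers
        "pipeline_parallel": pipeline_parallel,  # splits the layer stack into stages
        "data_parallel": world_size // (tensor_parallel * pipeline_parallel),  # model replicas over data shards
    }

if __name__ == "__main__":
    # Example: 13,824 workers (one per Fugaku node used) with hypothetical degrees.
    print(parallel_layout(world_size=13_824, tensor_parallel=4, pipeline_parallel=24))
```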
Fugaku-LLM has 13 billion parameters and generally outperforms the 7-billion-parameter models (Note 4) commonly developed in Japan, while remaining easy to handle in today’s computing environments. It was trained on original Japanese data collected by CyberAgent together with English data, ensuring transparency and safety while delivering excellent Japanese language performance. Among open models trained from scratch on original, domestically sourced data, it achieved the highest score on the Japanese benchmark Japanese MT-Bench (Note 5), with particularly strong results on humanities and social science tasks.
Fugaku-LLM is publicly available through GitHub (Note 6) and Hugging Face (Note 7) and can be used for research and commercial purposes under the terms of its license.
Going forward, as many researchers and engineers improve the base model and take part in new applied research, more efficient training methods are expected to emerge, along with applications such as linking scientific simulation with generative AI and social simulation of virtual communities using thousands of AIs, leading to next-generation innovative research and business.
(Public links)
Model: https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B
Source code: https://github.com/Fugaku-LLM/DeepSpeedFugaku
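For readers who want to try the released model, the following is a minimal usage sketch with the Hugging Face transformers library. The precision, device placement, and prompt are assumptions rather than settings given in this release; consult the model card for the recommended usage.

```python
# Hedged sketch: loads the published checkpoint with standard transformers APIs.
# Precision, device placement, and the prompt below are assumptions, not
# settings from this announcement; see the Hugging Face model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Fugaku-LLM/Fugaku-LLM-13B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # 13B weights; reduced precision keeps memory manageable
    device_map="auto",           # requires the `accelerate` package
)

prompt = "スーパーコンピュータ「富岳」とは何ですか。"  # "What is the supercomputer Fugaku?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```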
【Background】
In recent years, large-scale language models (LLMs) have been developed intensively, led by the United States, bringing major changes to research and development, the economy and society, and security. Countries other than the United States are also investing vast human and computational resources to develop their own LLMs. In Japan, Fugaku, the flagship system among Japan’s supercomputers, is highly anticipated as a computational resource for AI research, and an environment for large-scale distributed parallel training on Fugaku needed to be established.
Against this background, Tokyo Institute of Technology, Tohoku University, Fujitsu, and RIKEN began joint research and development of large-scale language models in May 2023, and were joined by Nagoya University, CyberAgent, and Kotoba Technologies in August 2023.
[Role of each institution/company]
Tokyo Institute of Technology: Overall coordination; parallelization and communication acceleration of large-scale language models (optimization of communication performance by combining three types of parallelization, acceleration of collective communication on the Tofu Interconnect D)
Tohoku University: Collection of training data, selection of the learning model
Fujitsu: Acceleration of computation and communication (acceleration of collective communication on the Tofu Interconnect D, performance optimization of pipeline parallelism), pre-training and post-training fine-tuning
RIKEN: Distributed parallelization and communication acceleration of large-scale language models (acceleration of collective communication on the Tofu Interconnect D)
Nagoya University: Study of methods for applying Fugaku-LLM to 3D shape generation AI
CyberAgent: Provision of training data
Kotoba Technologies: Porting of deep learning frameworks to Fugaku
Figure 1: The RIKEN supercomputer “Fugaku”
【Research Results】
1. Significantly improved computational performance for large-scale language model training on Fugaku
In this research, using Fugaku, the team increased the computation speed for training large-scale language models sixfold and the communication speed threefold compared with existing technology. To speed up computation, the deep learning framework Megatron-DeepSpeed was ported to Fugaku and the dense matrix multiplication library used by the Transformer was accelerated, optimizing Transformer performance on Fugaku. To speed up communication, communication performance was optimized for Fugaku by combining three types of parallelization, and collective communication on the Tofu Interconnect D was accelerated.
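As a rough illustration of the collective communication being accelerated, the sketch below shows the per-step gradient all-reduce that plain data-parallel training performs, written with torch.distributed on CPU. It is a generic example of the operation, not the team’s Tofu-optimized implementation.

```python
# Generic illustration of the all-reduce used in data-parallel training.
# This is NOT the Tofu-optimized implementation from this work; it only
# shows which collective operation is on the critical path each step.
import torch
import torch.distributed as dist

def allreduce_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all data-parallel ranks (one all-reduce per tensor)."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

if __name__ == "__main__":
    # Launch with e.g. `torchrun --nproc_per_node=4 allreduce_demo.py`.
    dist.init_process_group(backend="gloo")  # CPU backend; Fugaku uses an MPI-based stack
    model = torch.nn.Linear(8, 8)
    model(torch.randn(2, 8)).sum().backward()
    allreduce_gradients(model)
    dist.destroy_process_group()
```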
GPUs (Note 8) are typically used to train large-scale language models, but GPUs are in short supply worldwide and it is difficult to procure large numbers of the latest devices. Under these circumstances, training a large-scale language model on Fugaku, whose central processing units are domestically produced CPUs manufactured by Fujitsu rather than GPUs, is a significant achievement both as a demonstration of Japanese semiconductor technology and from the perspective of economic security.
Additionally, the knowledge gained through this project can be applied to the design of next-generation computing infrastructure after Fugaku and will help strengthen Japan’s advantage in the AI field.
2. A large-scale language model with 13 billion parameters that ensures transparency and safety, is easy to use, and has excellent Japanese language performance.
Many large-scale language models were developed by Japanese companies in 2023, most of them with around 7 billion parameters. Since the performance of large-scale language models generally improves as the number of parameters increases, the 13-billion-parameter Fugaku-LLM developed here can be considered a high-performance model. Even larger models have been developed overseas, but large-scale language models also require large computational resources to use, so models with too many parameters are difficult to deploy. For the computing environments of 2024, Fugaku-LLM’s 13 billion parameters strike a good balance between performance and usability.
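As a back-of-the-envelope illustration of why 13 billion parameters remain manageable, the sketch below estimates the memory needed just to hold the weights at common numeric precisions. The byte sizes are standard, but the resulting figures are rough estimates of our own, not numbers from this release.

```python
# Rough estimate of weight-only memory for a 13B-parameter model.
# Activations, optimizer state, and KV cache are deliberately ignored.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory in GiB to store only the model weights at the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

for dtype in BYTES_PER_PARAM:
    print(f"13B weights in {dtype}: {weight_memory_gib(13e9, dtype):.1f} GiB")
# Roughly 48 GiB (fp32), 24 GiB (fp16/bf16), 12 GiB (int8): within reach of a
# single high-memory accelerator or a small server, unlike much larger models.
```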
Additionally, many models that can handle Japanese are built by continual learning (Note 9), in which open models developed overseas are further trained on Japanese data. In contrast, the newly developed Fugaku-LLM was trained from scratch on the team’s own data, so the entire training process can be accounted for, making it superior in terms of transparency and safety.
The model was trained on approximately 400 billion tokens using 13,824 compute nodes of Fugaku; roughly 60% of the training data was Japanese, combined with English, mathematics, and code data. Rather than continually training on Japanese a model built in another language, Fugaku-LLM learned most of its information in Japanese from the start. It achieved an average score of 5.5 on the Japanese MT-Bench, the highest performance among open models trained on original, domestically sourced data. In particular, it recorded a high benchmark score of 9.18 on humanities and social science tasks, and it is expected to hold natural conversations that reflect characteristics of Japanese such as honorific language.
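As a toy illustration of mixing training sources by weight, the sketch below samples documents from a hypothetical corpus mixture. Only the roughly 60% Japanese share comes from this release; the split among English, mathematics, and code is assumed purely for illustration.

```python
# Toy weighted sampling over training sources. Only the ~60% Japanese share
# is from the announcement; the other weights are illustrative assumptions.
import random

MIXTURE = {"japanese": 0.60, "english": 0.25, "math": 0.075, "code": 0.075}

def sample_source(rng: random.Random) -> str:
    """Pick which corpus the next training document is drawn from."""
    r, cumulative = rng.random(), 0.0
    for source, weight in MIXTURE.items():
        cumulative += weight
        if r < cumulative:
            return source
    return source  # guard against floating-point rounding at the boundary

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in MIXTURE}
    for _ in range(10_000):
        counts[sample_source(rng)] += 1
    print(counts)  # counts are roughly proportional to the mixture weights
```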
Figure 2: Fugaku-LLM demo
【Future Developments】
The research results from this initiative are being released through GitHub and Hugging Face so that researchers and engineers can use them to develop large-scale language models. Anyone may use them for research and commercial purposes under the conditions specified in the license. In addition, from May 10, 2024, Fujitsu will offer Fugaku-LLM through the Fujitsu Research Portal, where users can try Fujitsu’s cutting-edge technology free of charge. As many researchers and engineers use the published model to improve the base model and take part in new applied research, efficient training methods and language models are expected to emerge, leading to next-generation innovative research and business results such as “AI for Science,” which links scientific simulation with generative AI and applies AI models to automate the scientific research cycle, and social simulation of virtual communities using thousands of AIs.
[Additional note]
These results were obtained through the Fugaku policy-response project “Development of Distributed Parallel Training Methods for Large-Scale Language Models on Fugaku” (project number: hp230254).
【Glossary】
Note 1
Large-scale language model: A model of how likely a given piece of text is to occur; given a context (such as a question), it can predict the text (response) that follows.
Note 2
Transformer: A neural network architecture for transforming word sequences, and the one most commonly used in current large-scale language models. It is a deep learning model introduced in the paper “Attention Is All You Need,” published by Google in June 2017, and is used mainly in the field of natural language processing.
Note 3
Tofu Interconnect D: A high-speed network with a six-dimensional torus topology used to connect the nodes of Fugaku. “Tofu” is short for “Torus fusion,” and the “D” stands for high density.
Note 4
Parameter: One of the indicators that represents the scale of neural networks such as large-scale language models. The more parameters there are, the higher the performance of the model, but the more data is required for training.
Note 5
Japanese MT-Bench: A Japanese benchmark test provided by Stability AI
Note 6
GitHub: A platform used around the world to publish open source software. https://github.com/
Note 7
Hugging Face: A platform used around the world to publish machine learning models and datasets. https://huggingface.co/
Note 8
GPU: Originally developed as an accelerator for graphics rendering; in recent years it has also been widely used to speed up deep learning.
Note 9
Continual learning: A method of further training a large-scale language model that has already been trained, used to adapt language models to different languages and domains.
[References]
1. COOL Chips 27 (April 17-19, 2024), “Implementation of Batch Matrix Multiplication for Large Language Model Training on A64FX CPUs”
2. 193rd HPC Research Conference (March 18-19, 2024), “Acceleration of All-reduce Communication in Large-Scale Machine Learning on Fugaku”
[Inquiries regarding this matter]
Tokyo Institute of Technology
Professor Rio Yokota, Global Scientific Information and Computing Center
E-mail: rioyokota@gsic.titech.ac.jp TEL: 03-5734-2121 FAX: 03-5734-3276
Tohoku University
Associate Professor Keisuke Sakaguchi, Graduate School of Information Sciences
E-mail: keisuke.sakaguchi@tohoku.ac.jp TEL: 022-795-7091
Fujitsu Limited
Fujitsu Contact Line (General Counter)
0120-933-200 (toll free)
Reception hours: 9:00-12:00 and 13:00-17:30 (excluding Saturdays, Sundays, holidays, and Fujitsu-designated holidays)
Contact form: https://contactline.jp.fujitsu.com/customform/csque04802/873532/
RIKEN
Kobe Office Computational Science Research Promotion Office E-mail: r-ccs-koho@ml.riken.jp
Nagoya University
Associate Professor Koji Nishiguchi, Graduate School of Engineering
E-mail: koji.nishiguchi@gmail.com TEL: 052-789-2736
Kotoba Technologies, Inc.
Representative Director Hiroyuki Kojima E-mail: nkojima@kotoba.tech
Product prices, specifications, service content, etc. stated in this press release are current as of the date of announcement and are subject to change without prior notice.
More details about this release:
https://prtimes.jp/main/html/rd/p/000000295.000093942.html