Rakuten Group, Inc. has unveiled two new AI models: Rakuten AI 2.0, the company's first Japanese large language model (LLM) built on a Mixture of Experts (MoE)*1 architecture, and Rakuten AI 2.0 mini, the company's first small language model (SLM). Both models will be released to the open-source community by Spring 2025 to empower companies and professionals developing AI applications.
Rakuten AI 2.0 is an 8x7B MoE foundation model*2 based on the Rakuten AI 7B model released in March 2024. The MoE model comprises eight 7-billion-parameter models, each acting as a separate expert. A router sends each token to the two most relevant experts, and the experts and router are continually trained together on vast amounts of high-quality Japanese and English language data.
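For illustration, the following is a minimal sketch of top-2 token routing in a Mixture of Experts layer. The module names, dimensions and routing details here are assumptions made for the sketch, not a description of Rakuten's actual implementation.

```python
# Minimal sketch of top-2 token routing in a Mixture of Experts layer.
# Dimensions and module names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, hidden_size=4096, ffn_size=14336, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router is a small linear layer scoring each token against every expert.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, hidden_size)
        scores = self.router(x)                            # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # top-2 experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the chosen experts process each token; the rest stay inactive.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```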
Rakuten AI 2.0 mini is a 1.5 billion parameter foundation model and the first SLM developed by the company. The model was trained from scratch on extensive Japanese and English language datasets curated and cleaned through an in-house multi-stage data filtering and annotation process, ensuring high performance and accuracy in text generation tasks.
"At Rakuten, we see AI as a catalyst to augment human creativity and drive greater efficiency. Earlier this year, we launched a 7B Japanese LLM to accelerate AI solutions for local research and development," commented Ting Cai, Chief AI & Data Officer of Rakuten Group. "Our new cutting-edge Japanese LLM and pioneering SLM set new standards in efficiency, thanks to high-quality Japanese language data and innovative algorithms and engineering. These breakthroughs mark a significant step in our mission to empower Japan’s businesses and professionals to create AI applications that truly benefit users."
High efficiency with advanced architecture
Rakuten AI 2.0 employs a sophisticated Mixture of Experts architecture that dynamically selects the most relevant experts for each input token, optimizing computational efficiency and performance. The model offers performance comparable to 8x larger dense models while consuming approximately 4x less computation during inference*3.
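As a back-of-the-envelope reading of that figure, assuming inference compute scales roughly with the number of active parameters and ignoring the shared (non-expert) layers, two active experts out of eight imply about a quarter of the dense model's per-token compute:

```python
# Back-of-the-envelope check of the "~4x less inference compute" figure,
# under the simplifying assumption that compute scales with active parameters
# and that shared (non-expert) layers are negligible.
total_experts = 8
active_experts = 2          # top-2 routing
expert_params = 7e9         # ~7B parameters per expert (illustrative)

dense_equivalent = total_experts * expert_params   # a fully dense 8x7B model
moe_active = active_experts * expert_params        # only 2 experts run per token

print(f"Active / total parameter ratio: {moe_active / dense_equivalent:.2f}")   # 0.25
print(f"Approximate compute reduction:  {dense_equivalent / moe_active:.0f}x")  # 4x
```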
Increased performance
Rakuten evaluated the models with the LM Evaluation Harness*4 to measure Japanese and English capabilities. The leaderboard evaluates language models on a wide range of natural language processing and understanding tasks that reflect the characteristics of the target language. Rakuten AI 2.0's average Japanese score across eight tasks improved to 72.29, from 62.93 for Rakuten AI 7B, the open LLM Rakuten released in March 2024.
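For reference, an evaluation of this kind can in principle be run with the harness's Python API. The sketch below is illustrative only: the model identifier and task name are placeholders rather than confirmed values, and the authoritative task list is the one in the commit and README linked in footnote *4.

```python
# Hedged sketch of an evaluation run with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Model id and task name below are placeholders.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                    # Hugging Face transformers backend
    model_args="pretrained=Rakuten/RakutenAI-7B",  # placeholder model identifier
    tasks=["ja_leaderboard_jcommonsenseqa"],       # placeholder task name
    batch_size=8,
)
print(results["results"])
```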
SLM: Compact and efficient for practical applications
Rakuten AI 2.0 mini is compact enough to be deployed on mobile devices and used on-premises without the need to send data to remote servers. Compared with larger models used for generic applications, SLMs can be used for specific applications that need to be optimized for privacy, low latency or cost efficiency.
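As a minimal sketch of such on-device or on-premises use, assuming the model is published on Hugging Face under an identifier like the placeholder below, a 1.5 billion parameter model can be loaded and run entirely on local hardware:

```python
# Minimal sketch of running a small (~1.5B parameter) model locally with
# Hugging Face transformers; no data leaves the device.
# The model identifier is a placeholder, not a confirmed release name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rakuten/RakutenAI-2.0-mini"  # placeholder identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "楽天グループの主なサービスを簡潔に説明してください。"  # "Briefly describe Rakuten Group's main services."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```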
"We are thrilled to unveil the latest stars of the Rakuten LLM family. By leveraging the MoE architecture, our team has made a major breakthrough in boosting performance at much lower cost compared to traditional models. This MoE model sets a new benchmark for Japanese language models, delivering unmatched versatility and efficiency,” said Lee Xiong, General Manager of AI Engineering at Rakuten Group. “Meanwhile, Rakuten AI 2.0 mini is a powerhouse in a compact form, poised to revolutionize edge-based SLMs. Our dedicated team has poured their hearts into these innovations, and we can't wait to elevate Japanese businesses’ experience with the power of AI."
In March 2024, Rakuten released Rakuten AI 7B, a 7 billion parameter Japanese language model, in both foundation and instruct versions. Training was carried out on an in-house multi-node GPU cluster engineered by Rakuten to enable the rapid and scalable training of models on large, complex datasets.
Rakuten is continuously pushing the boundaries of innovation to develop best-in-class LLMs for R&D and deliver best-in-class AI services to its customers. By developing models in-house, Rakuten can build up its knowledge and expertise and create models optimized to support the Rakuten Ecosystem. By making the models open, Rakuten aims to contribute to the open-source community and accelerate the development of local AI applications and Japanese language LLMs.
As new breakthroughs in AI trigger transformations across industries, Rakuten’s AI-nization initiative aims to implement AI in every aspect of its business to drive further growth. Rakuten is committed to making AI a force for good that augments humanity, drives productivity and fosters prosperity.
*1 Mixture of Experts (MoE) is an AI model architecture in which the model is divided into multiple sub-models, known as experts. During inference and training, only a subset of the experts is activated to process each input.
*2 Foundation models are models that have been pre-trained on vast amounts of data and can then be fine-tuned for specific tasks or applications.
*3 Calculation based on active experts and total experts ratio in MoE LLM architecture: https://arxiv.org/abs/1701.06538
*4 Results of evaluation tests carried out on LM Evaluation Harness during October - December 2024. Using default task definitions from the following commit: https://github.com/EleutherAI/lm-evaluation-harness/commit/26f607f5432e1d09c55b25488c43523e7ecde657
The tasks considered for Japanese evaluations are listed here: https://github.com/EleutherAI/lm-evaluation-harness/blob/26f607f5432e1d09c55b25488c43523e7ecde657/lm_eval/tasks/japanese_leaderboard/README.md
The tasks considered for English evaluations are listed here: https://huggingface.co/docs/leaderboards/en/open_llm_leaderboard/archive
https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/README.md