Will 2025 be the breakthrough year for SLMs?

  • Large language models (LLMs) have been the foundation of generative AI’s initial success
  • Now the GenAI sector is starting to mature
  • Small language models (SLMs) are set to make an impact on the AI world in 2025, according to the brainboxes at the MIT Technology Review 

For the past five years, what is popularly described as generative AI (GenAI) has driven the worldwide uptake and scaling of large language models (LLMs). In essence, these are snapshots of various sets and subsets of knowledge (be it publicly available or private and copyrighted) that have been trawled and scraped from anywhere and everywhere: it’s all grist to the mill for the AI companies.

However, LLMs are not part of the internet and they cannot learn: they are periodically retrained, ingesting new data along with the old cud that has been chewed on previously. Unsurprisingly, their outputs can degrade and become increasingly inaccurate as they do so. LLMs learn to mimic patterns of human communication but have no sense of context, and they treat each task presented to them as a new, isolated, one-off interaction. They don’t exploit new data in real time, nor do they automatically retain information from previous interactions – they simply predict from their pre-trained knowledge banks. That’s why developers are working on ways to simulate long-term learning in LLMs, and also why agentic AI is set to become such a big deal.

Much effort is also being expended on small language models (SLMs). They have been around for a while – many were launched during 2024 – but have yet to make a major impact on the AI sector. That seems likely to change during the course of this year, especially as agentic AI takes hold in the enterprise sector, since SLMs can act as the foundation for AI agents.

The latest edition of the MIT Technology Review features 10 breakthrough technologies for 2025, one of which is SLMs. According to the report, they are more efficient than LLMs because they are trained on smaller, more focused datasets, and they have particular potential in vertical, industry-specific applications for sectors such as telecom.

What’s more, SLMs don’t consume vast amounts of energy and, on the focused tasks they are built for, can perform at least as well as, and often better than, LLMs. That’s why the likes of OpenAI, Google DeepMind, Microsoft and Anthropic, along with a plethora of startups, are developing SLMs.

As a recent blog published by IBM states, SLM parameters (internal variables, such as weights and biases, that a model learns during training) range from a few million to a few billion, while LLMs have hundreds of billions or even trillions of parameters. These parameters influence how a machine-learning model behaves and performs. SLMs don’t sprawl like LLMs do. They require less memory and computing power and are touted as ideal solutions for “resource-constrained environments, such as edge devices and mobile apps, or even for scenarios where AI inferencing – when a model generates a response to a user’s query – must be done offline without a data network,” according to IBM. 
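The memory gap that IBM describes is easy to quantify from parameter counts alone. Here is a back-of-envelope sketch (an illustration, not any vendor’s sizing tool) that assumes 16-bit weights, i.e. two bytes per parameter, and ignores runtime overheads such as activations:

```python
def model_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough memory footprint of a model's weights in gigabytes.

    Assumes every parameter is stored at the given precision
    (2 bytes = fp16/bf16) and ignores runtime overheads such as
    activations and the KV cache.
    """
    return num_params * bytes_per_param / 1e9

# A 3-billion-parameter SLM vs. a 175-billion-parameter LLM:
slm_gb = model_memory_gb(3e9)    # 6 GB - within reach of a single device
llm_gb = model_memory_gb(175e9)  # 350 GB - needs a multi-GPU server
```

Crude as it is, the arithmetic shows why SLMs are touted for edge devices and mobile apps: a few-billion-parameter model can fit in the memory of a single machine, while a frontier-scale LLM cannot.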

Of course, SLMs are built on the development of LLMs and use the same neural network architecture, the ‘transformer model’; compression techniques, such as knowledge distillation, pruning and quantisation, are then applied to shape an SLM that is smaller but still highly accurate. Such a smaller, focused AI model, handling a specific and predictable set of requests, does not need to draw on the entire contents of the internet – and that is set to be of great benefit to businesses.

Many big technology companies are now working on the provision of SLMs based on their LLM solutions. For example, OpenAI offers the GPT-4o mini while Google DeepMind has Gemini Nano. Anthropic’s Claude 3 comes in three sizes – Opus, the midsize Sonnet, and the tiny Haiku – and Microsoft is deeply committed to its own SLM family, Phi-3. 

Meanwhile, AI startup Writer, which describes itself as “a full-stack generative AI platform”, claims that its latest language model is a lean machine as powerful as the largest, flabbiest and best-known GenAI models while being constructed on just 5% of the number of their parameters. 

As to the future, according to IBM, one possibility is the emergence of hybrid AI, with SLMs running on premises and accessing LLMs hosted in the public cloud when access to internet-wide data is required. Also, intelligent routing will make the distribution of AI workloads considerably more efficient via routing modules that can “accept queries, evaluate them and choose the most appropriate model to direct queries to. Small language models can handle basic requests, while large language models can tackle more complicated ones.”
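The routing idea can be sketched in a few lines of code. The scoring heuristic and the model names below are purely illustrative placeholders, not any vendor’s actual API – real routers would use a classifier or a lightweight model to judge query complexity:

```python
def route_query(query: str, complexity_threshold: int = 20) -> str:
    """Direct a query to a small or large model based on a crude
    complexity heuristic (here, simply word count).

    The model names are hypothetical placeholders for illustration.
    """
    complexity = len(query.split())
    if complexity <= complexity_threshold:
        return "slm-on-premises"   # basic request: keep it local and cheap
    return "llm-public-cloud"      # complicated request: escalate

# A short, routine query stays on the local small model:
destination = route_query("What is my remaining data allowance?")
```

In a production system the threshold test would be replaced by a learned evaluator, but the division of labour is the same: cheap local inference for routine traffic, cloud-hosted heft only when needed.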

Add all of this together, and you can understand why the folks at MIT are saying, in reference to SLMs: “Move over dinosaurs. The future belongs to smaller, nimbler beasts.”

Martyn Warwick, Editor in Chief, TelecomTV
