Recent advancements in AI have been propelled by large language models (LLMs) containing billions to trillions of parameters. Parameters, the internal values a machine learning model learns during training and adjusts during fine-tuning, have played a key role in the development of generative AI. As parameter counts have grown, models like ChatGPT have become able to generate human-like content that was unimaginable just a few years ago. Parameters are sometimes loosely referred to as “features” or “feature counts,” although the terms are not interchangeable.
While it’s tempting to equate the power of AI models with their parameter count, similar to how we think of horsepower in cars, more parameters aren’t always better. An increase in parameters can lead to additional computational overhead and even problems like overfitting.
There are various ways to increase the number of parameters in AI models, but not all approaches yield the same improvements. For example, Google’s Switch Transformers scaled to trillions of parameters, yet some of Google’s smaller models outperformed them in certain use cases. Parameter count alone is therefore an incomplete yardstick, and other metrics should be considered when evaluating AI models.
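Switch Transformers reach their parameter counts with a sparse mixture-of-experts design: each token is routed to a single “expert” feed-forward network, so most of the layer’s parameters sit idle for any given token. The sketch below counts total versus per-token active parameters for one such layer. The layer sizes and expert counts are hypothetical, bias terms are omitted for simplicity, and this is an illustration of the idea rather than Google’s implementation.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights of one feed-forward block: d_model -> d_ff -> d_model (biases omitted)."""
    return 2 * d_model * d_ff


def switch_layer_params(d_model: int, d_ff: int, num_experts: int) -> tuple[int, int]:
    """Return (total, active-per-token) parameters for a top-1 routed expert layer."""
    router = d_model * num_experts          # linear router: one logit per expert
    expert = ffn_params(d_model, d_ff)
    total = num_experts * expert + router   # every expert must be stored
    active = expert + router                # top-1 routing: each token uses one expert
    return total, active


# Hypothetical layer sizes, chosen only for illustration
D_MODEL, D_FF = 768, 3072
dense = ffn_params(D_MODEL, D_FF)
for experts in (1, 8, 64):
    total, active = switch_layer_params(D_MODEL, D_FF, experts)
    print(f"{experts:>2} experts: total={total:,}, active per token={active:,} "
          f"(dense FFN={dense:,})")
```

Adding experts multiplies the total parameter count while the work done per token stays roughly flat, which is one reason headline parameter counts are hard to compare across architectures.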
The exact relationship between parameter count and intelligence is still debated. John Blankenbaker, principal data scientist at SSA & Company, notes that larger models tend to replicate their training data more accurately, but the belief that more parameters inherently lead to greater intelligence is often wishful thinking. He points out that while these models may sound knowledgeable, they don’t actually possess true understanding.
One challenge is a common misunderstanding of what a parameter is. It’s not a word, feature, or unit of data but rather a numeric value inside the model’s computation. Each parameter adjusts how the model processes inputs, much like turning a knob in a complex machine. Unlike the parameters of simpler models such as linear regression, which have a clear interpretation, the parameters of an LLM are opaque and offer no insight on their own.
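For contrast, here is a minimal sketch of a model whose parameters do carry meaning: an ordinary least-squares line fit with just two parameters. The “hours studied vs. exam score” data is hypothetical and chosen so the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [53, 56, 59, 62, 65]

intercept, slope = fit_line(hours, scores)
# Each parameter has a direct, human-readable meaning
print(f"Baseline score with zero hours of study: {intercept}")
print(f"Points gained per extra hour of study: {slope}")
```

An LLM has billions of such numbers, and none of them individually maps to a concept like “points per hour,” which is why they reveal little when inspected one at a time.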
Christine Livingston, managing director at Protiviti, explains that parameters act as weights that allow flexibility in the model. However, more parameters can lead to overfitting, where the model performs well on training data but struggles with new information.
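Overfitting is easy to reproduce at toy scale. The sketch below is an illustration only, using polynomial curve fitting as a stand-in for a neural network: it fits noisy points from a straight line with a 2-parameter line and with a 10-parameter polynomial that passes through every training point. The data and the alternating “noise” are hypothetical and deterministic.

```python
def true_fn(x):
    return 2.0 * x + 1.0


xs_train = list(range(10))
# Deterministic "noise": +0.5 on even x, -0.5 on odd x
ys_train = [true_fn(x) + (0.5 if x % 2 == 0 else -0.5) for x in xs_train]
# Unseen points halfway between the training points, without noise
xs_test = [x + 0.5 for x in range(9)]
ys_test = [true_fn(x) for x in xs_test]


def fit_line(xs, ys):
    """Least-squares line: 2 parameters."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point: len(xs) parameters."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(xs_train, ys_train)
poly = fit_interpolant(xs_train, ys_train)
for name, model in [("line (2 params)", line), ("polynomial (10 params)", poly)]:
    print(f"{name}: train MSE={mse(model, xs_train, ys_train):.3f}, "
          f"test MSE={mse(model, xs_test, ys_test):.3f}")
```

The polynomial memorizes the training data, noise included, and swings wildly between points, while the simpler line captures the underlying trend and does far better on the unseen points.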
Adnan Masood, chief AI architect at UST, highlights that parameters influence precision, accuracy, and data management needs. However, due to the size of LLMs, it’s impractical to focus on individual parameters. Instead, developers assess models based on their intended purpose, performance metrics, and ethical considerations. Understanding the data sources and pre-processing steps becomes critical in evaluating the model’s transparency.
It’s important to differentiate between parameters, tokens, and words. A parameter is not a word; it’s a value learned during training. Tokens are the units of text a model actually reads: often fragments of words, but sometimes whole words or punctuation. LLMs are trained on sequences of tokens, and each token is mapped to an embedding, a vector of numbers that the model processes.
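A toy example makes the three counts concrete. The sketch below assumes a tiny hypothetical vocabulary and a simple greedy longest-match tokenizer (real LLM tokenizers, such as byte-pair encoding, learn much larger vocabularies from data). One word becomes three tokens, and the embedding table that turns tokens into vectors is itself made of learned parameters.

```python
import random

# Hypothetical toy vocabulary: a few subwords plus single characters as a fallback
VOCAB = ["un", "believ", "able", "the", "cat"] + list("abcdefghijklmnopqrstuvwxyz ")
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
EMBED_DIM = 4  # hypothetical; real models use thousands of dimensions

# The embedding table is part of the model's parameters: one vector per vocabulary entry.
# Random values stand in for what training would learn.
rng = random.Random(0)
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def tokenize(text):
    """Greedy longest-match tokenization against VOCAB."""
    tokens, i = [], 0
    while i < len(text):
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in TOKEN_TO_ID:
                tokens.append(piece)
                i = end
                break
        else:
            raise ValueError(f"no token matches {text[i:]!r}")
    return tokens


def embed(tokens):
    """Look up the embedding vector for each token."""
    return [embedding_table[TOKEN_TO_ID[t]] for t in tokens]


text = "unbelievable"
tokens = tokenize(text)
vectors = embed(tokens)
n_params = len(embedding_table) * EMBED_DIM

print("words:", len(text.split()))
print("tokens:", tokens)
print("embedding parameters:", n_params)
```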
The number of parameters influences a model’s complexity and capacity to learn. More parameters often lead to better performance, but they also increase computational demands. Larger models can be harder to train and operate, leading to slower response times and higher costs. In some cases, smaller models are preferred for domain-specific tasks because they generalize better and are easier to fine-tune.
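Some back-of-the-envelope arithmetic shows how parameter count drives cost. The sketch below estimates memory for the weights alone at common numeric precisions and uses the widely cited rule of thumb of about 2 floating-point operations per parameter per generated token for a dense transformer. It ignores activations, caches, and training-time optimizer state, and the model sizes are hypothetical.

```python
# Bytes needed to store one parameter at common numeric precisions
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


def forward_flops_per_token(n_params: float) -> float:
    """Rule-of-thumb estimate for a dense transformer: ~2 FLOPs per parameter per token."""
    return 2 * n_params


# Hypothetical model sizes, for illustration only
for n in (1e9, 7e9, 70e9):
    mem = {p: weight_memory_gb(n, p) for p in BYTES_PER_PARAM}
    print(f"{n / 1e9:.0f}B params: fp32 {mem['fp32']:.0f} GB, fp16 {mem['fp16']:.0f} GB, "
          f"int8 {mem['int8']:.0f} GB, ~{forward_flops_per_token(n):.1e} FLOPs/token")
```

Even at 8-bit precision, a 70-billion-parameter model needs about 70 GB just to hold its weights, which is far beyond what most phones or embedded devices can offer.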
Transformer-based models like GPT-4 are widely reported to dwarf previous generations in parameter count, although OpenAI has not disclosed GPT-4’s size. For edge applications, where compute and memory are limited, smaller models are preferred because they are more efficient and easier to adapt.
Fine-tuning large models for specific domains remains a challenge, often requiring extensive oversight to avoid problems like overfitting. There is also growing recognition that parameter count alone is not the best way to measure a model’s performance. Alternatives include Stanford’s HELM (Holistic Evaluation of Language Models), which assesses models across multiple dimensions such as accuracy, fairness, bias, and efficiency, and benchmarks such as GLUE and SuperGLUE, which measure performance across a suite of language-understanding tasks.
Three trends are shaping how we think about parameters. First, AI developers are improving model performance without necessarily increasing parameters. A study of 231 models between 2012 and 2023 found that the compute required for LLMs to reach a given level of performance has halved roughly every eight months, outpacing Moore’s Law. Second, new neural network approaches like Kolmogorov-Arnold Networks (KANs) show promise, achieving results comparable to traditional models on certain tasks with far fewer parameters. Lastly, agentic AI frameworks like Salesforce’s Agentforce offer a new architecture in which domain-specific AI agents can outperform larger general-purpose models.
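To see why an eight-month halving outpaces Moore’s Law, compare the two compounding rates. This sketch assumes the commonly quoted Moore’s Law pace of a doubling roughly every two years and treats both trends as smooth exponentials, which is a simplification.

```python
def efficiency_gain(months: float, period_months: float) -> float:
    """Cumulative improvement factor if a quantity doubles (or halves) every period_months."""
    return 2 ** (months / period_months)


for years in (1, 2, 4):
    months = 12 * years
    algorithmic = efficiency_gain(months, 8)   # compute requirement halves every ~8 months
    hardware = efficiency_gain(months, 24)     # Moore's Law: doubling every ~24 months (assumed)
    print(f"{years} yr: algorithmic gain x{algorithmic:.1f}, Moore's Law gain x{hardware:.1f}")
```

Over four years, an eight-month halving compounds to a 64x reduction in required compute, compared with roughly a 4x gain from a two-year doubling.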
As AI continues to evolve, it’s clear that while parameter count is an important consideration, it’s just one of many factors in evaluating a model’s overall capabilities.
To stay on the cutting edge of artificial intelligence, contact Tectonic today.