Why It’s Time to Rethink “Large” and “Small” for Language Models

Initial post on LinkedIn:

https://www.linkedin.com/posts/lukenorris_why-its-time-to-rethink-large-and-small-activity-7260312867311124481-lWh-?utm_source=share&utm_medium=member_desktop

The terms “large” and “small” are quickly losing relevance in the world of language models. Not long ago, models like LLaMA 70B were considered “large.” Now they fall somewhere between “medium” and “small” compared to models like Qwen 72B or the recently released LLaMA 405B, and soon we’ll see trillion-parameter, so-called giants on the horizon. In enterprise contexts, these distinctions become even more misleading, especially as memory and processing capabilities continue to advance at unprecedented speed.

Here’s the real shift: Moore’s Law, now supercharged by accelerated compute, is pushing memory and performance capabilities forward at an exponential rate. This isn’t incremental growth; it’s a leap in the infrastructure available to run these models seamlessly. A quantized version of a recent 405B-parameter model, for example, can now run efficiently on a single modern 8-way server, a far cry from the thousands of servers traditionally associated with training these “large” models. For roughly $300k to $500k, an enterprise can deploy this powerhouse to handle inference workloads. Think about the impact: it’s the equivalent of having 100+ PhDs working 24/7 for three to five years, with instant productivity gains.
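The claim that a quantized 405B model fits on one 8-way server is easy to sanity-check with back-of-envelope arithmetic. The sketch below is illustrative, not a vendor spec: the 20% overhead factor for KV cache and activations is an assumption, and real deployments vary with batch size, context length, and quantization scheme.

```python
def model_memory_gb(params_billion: float, bits_per_param: float,
                    overhead: float = 1.2) -> float:
    """Rough inference-memory estimate: weight storage at the given
    precision, plus ~20% overhead for KV cache and activations
    (an illustrative assumption, not a measured figure)."""
    weight_bytes = params_billion * 1e9 * bits_per_param / 8
    return weight_bytes * overhead / 1e9

# A 405B-parameter model at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_memory_gb(405, bits):,.0f} GB")

# An 8-way server with 80 GB accelerators offers ~640 GB of device
# memory, so the 16-bit weights don't fit on one box, while the
# 8-bit and 4-bit quantized variants do.
```

Under these assumptions, 16-bit weights need roughly 970 GB, 8-bit about 490 GB, and 4-bit about 240 GB, which is why quantization is the step that brings a 405B model within reach of a single server.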

This also changes the narrative around model interactivity and performance requirements. Many enterprise applications don’t need real-time, interactive models. Agentic, non-interactive services can operate in the background, managing workflows, performing data processing, and handling knowledge-intensive tasks at human timescales or even slower. In these scenarios, the power and scope of a larger model, even a somewhat slower one, are far more advantageous than smaller, faster alternatives that may lack depth.

As memory and compute capabilities continue to evolve, the line between what’s “large” and “small” will only become blurrier. The real value lies in choosing models that can support complex, enterprise-level workflows, not in sticking to outdated labels based on size alone. This shift from “large” and “small” to “what drives the most impact” is the future of AI in the enterprise. The capacity to revolutionize workflows, augment human knowledge, and sustain continuous productivity is what truly matters now, transcending any notion of size.
