There is no doubt that one center of the AI universe (at least in the western world) is the NVIDIA ecosystem. The combination of graphics processing units (GPUs) with machine learning and other “intelligent” inference models has led us to the rapidly changing world we know today.
I spent three days at the GTC conference in San Jose, California, this week, and I was blown away by the excitement of both the presenters and the attendees. I thought I’d share some observations about why that is, and about what the NVIDIA community brings to the table.
The first isn’t so much a lesson as it is the affirmation of a common pattern that is essential to technology innovation. Whenever we see a major paradigm shift (client-server, the Internet, cloud computing, and so on), bright minds see opportunity. The AI revolution is no different in this regard, yet it will have a greater impact than almost anything that came before it.
We’ll talk about many of the reasons in a second, but what NVIDIA GTC demonstrated in one place is the insane blast radius AI is creating across technology markets.
Walking around the show floor was an eye opener. Application platforms (can you really call them “developer platforms” anymore?), operations platforms, AI training, models, inference engines, hardware, and more were represented by companies ranging from the usual titans of the industry to tiny one- or two-person startups.
NVIDIA itself announced new hardware targeting what it sees as changes in the way enterprises will work every day (and in the way services like cloud computing will support that change). Scale was everywhere, with some companies focusing on computational density while others focused on extensibility and flexibility. All (including NVIDIA) spent significant time touting solutions to the problems that come with that scale, namely power and cooling.
But software was well represented, as well. I saw the first genuine attempts at “do everything” platforms (which I think are unlikely to dominate the market), which handle prompts, source code control, deployment, operations, monitoring, and so on. I also saw some very impressive targeted solutions for vertical industries, research categories, and common corporate functions. Of course, Kamiwaza was there alongside our partners at HPE. Our smart cities story, combined with the increasing need to efficiently and securely turn enterprise data into just the context needed, was playing well to attendees on the expo floor.
One thing about NVIDIA GTC is that, despite the many software-development-focused companies present, it is still primarily an infrastructure conference. But, unlike at, say, the peak of the cloud computing hype cycle, the infrastructure story is far more than a slight shift in operational parameters to suit a new user experience.
AI is driving a number of outright revolutions in infrastructure design. New CPUs (including one from NVIDIA), new form factors, new cooling infrastructure, and more are forcing data centers to rethink entire subsystems. Smart routing of inference (including Kamiwaza’s Inference Mesh), new data storage concepts, new memory concepts, and more are changing the role software infrastructure plays in how hardware gets used. And, of course, agents are changing the way software gets things done for individuals and businesses alike.
The growth of agentic computing is having a big impact on where infrastructure dollars are being spent. The early excitement around GPUs was centered on training AI models. (And Bitcoin mining, but I digress.) This year, NVIDIA CEO Jensen Huang spent significant time talking about the shift of cycles from training to inference. Tokens are the new unit of compute productivity, according to Huang, and I tend to agree.
In my experience, when a major player dedicates an entire keynote to a shift like this, it means that the shift is so far along, it's basically a done deal. I see evidence of this everywhere. While I write this, Kamiwaza engineers are having an energetic discussion about personal hardware, and the need to buy more “tokens-per-second.” I spent several minutes talking to a pharmaceutical executive about the work they are doing to predict token consumption in the next several years, and the struggle they are having equating that to business outcomes.
At this point, I would argue that every enterprise is probably focusing AI hardware budgets on buying “tokens-per-second” for inference. The power of models is lost if their intelligence cannot be used at scale.
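To make the “tokens-per-second” framing concrete, here is a minimal back-of-envelope sketch of the kind of arithmetic that conversation implied: turning expected request volume into a peak inference throughput target. Every number and name in it is an illustrative assumption of mine, not a figure from GTC or from any vendor.

```python
# Hypothetical back-of-envelope sketch: translating expected inference demand
# into a "tokens-per-second" capacity target. All numbers are illustrative
# assumptions, not real workload or vendor figures.

def required_tokens_per_second(daily_requests: int,
                               avg_tokens_per_request: int,
                               peak_to_average_ratio: float = 3.0) -> float:
    """Estimate the peak token throughput an inference fleet must sustain."""
    seconds_per_day = 24 * 60 * 60
    average_tps = daily_requests * avg_tokens_per_request / seconds_per_day
    return average_tps * peak_to_average_ratio

# Example: 2 million agent/chat requests per day, ~1,500 tokens each
# (prompt plus completion), sized for a 3x peak over the daily average.
print(f"{required_tokens_per_second(2_000_000, 1_500):,.0f} tokens/sec at peak")
```

The arithmetic is the easy part; as that pharmaceutical executive pointed out, the struggle is connecting the resulting number to business outcomes.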
The last thing I’ll point out is pretty basic, but I will argue it supports a theory I shared in an earlier post: despite the presence of AI application platform vendors at the show, I ran into very few attendees who were actually creating applications with AI (except, of course, for the vendors themselves). Most were concerned with the best way to build a base infrastructure for scaling AI usage over the next several years.
This is a big deal, because it continues to support the idea that the IT market divides between “capacity delivery” and “capacity consumption.” In the cloud era, capacity was defined as compute, network, storage, and an increasing number of services to support shared needs like databases and messaging systems. In the AI era, “capacity” refers to intelligence. To inference. To the ability to create value from inference.
While I was amazed at what I saw and learned at GTC, I'll also be looking for the places where practitioners building end-user outcomes are congregating. The major cloud conferences, open source gatherings, and conferences dedicated to new platform approaches are all on my radar.
I would love to hear which are your favorites in the comments below. Maybe I’ll even see you there.
I write to learn, so feel free to comment below with questions—or challenges. I look forward to hearing from you.