
AI Compute Has Options: CPUs, GPUs, and Custom Silicon

Why can’t Big Tech just build their own chips and end NVIDIA’s dominance? It’s a fair question I hear a lot. 

Companies like Microsoft, Google, Amazon – even Apple – pour billions into R&D. In fact, they are building custom chips: Microsoft has its “Maia” AI accelerator for Azure, Google has been iterating its TPUs (Tensor Processing Units) for years, Amazon offers Trainium and Inferentia chips in AWS, and Apple’s M-series chips have transformed its Macs and iPhones. 

Yet NVIDIA (along with partners like TSMC and ASML) still leads in cutting-edge AI computing. Why? Because designing a chip is only half the battle – manufacturing at scale (TSMC’s specialty) and pushing the performance envelope year after year (NVIDIA’s specialty) are incredibly hard. 

Big Tech isn’t really trying to kill NVIDIA; they’re building in-house chips mainly to lower their own costs and optimize performance for their specific platforms. For raw AI horsepower that anyone can use, NVIDIA (and to an extent, AMD) continue to set the pace.

Broadcom’s $10B order signals a multi-silicon era

OpenAI is reportedly collaborating with Broadcom to produce in-house AI chips by 2026, signaling a shift away from total reliance on NVIDIA GPUs. For nearly a decade, the playbook for AI was predictable: if you needed to train a state-of-the-art model or serve up AI at scale, you bought NVIDIA GPUs, paid the steep “Jensen premium” (NVIDIA’s CEO Jensen Huang isn’t shy about those margins), and built your software around CUDA.

GPUs were essentially the only game in town, and everyone from research labs to startups made do with whatever cards they could get. Now, a surprise $10 billion order for custom AI chips from Broadcom – widely believed to be OpenAI – has cracked the aura of invincibility around NVIDIA. 

Broadcom’s CEO Hock Tan confirmed a massive new customer order for AI processors shipping in 2026 (he didn’t name OpenAI, but everyone suspects it) and even noted that these bespoke chips are meant to be an alternative to expensive off-the-shelf GPUs. Broadcom’s roster of AI silicon clients already included Google, Meta, and ByteDance, and now it looks like OpenAI is joining that list. 

In response, NVIDIA itself is adapting – despite Jensen once dismissing custom ASICs as “side-street solutions” that would mostly get canceled if they couldn’t outperform his GPUs, the company is reportedly hiring a custom-chip team in Taiwan and rolling out technologies like NVLink Fusion to integrate its GPUs with other specialized chips.

So, are we headed for a truly diversified AI chip market, or is this just another cycle where GPUs adapt and retain the crown? Let’s break down where things stand, and how CPUs, GPUs, and new custom silicon each fit into the picture.

The Era of Choice in AI Hardware

GPUs earned their dominance by being versatile. Whether it’s a new model architecture, a novel training trick, or a different neural network altogether, chances are the same GPU can run it with just a software update. This adaptability made GPUs the default for AI research and development.
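
To make that concrete, here’s a minimal PyTorch sketch (the model is a toy placeholder, not a real workload) where the exact same code runs on a CPU or an NVIDIA GPU simply by switching the device string:

```python
import torch
import torch.nn as nn

# Pick whatever accelerator is available; the rest of the code doesn't change.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.GELU(),
    nn.Linear(1024, 512),
).to(device)

x = torch.randn(8, 512, device=device)
with torch.no_grad():
    y = model(x)

print(f"Ran on {device}, output shape: {tuple(y.shape)}")
```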

NVIDIA cultivated a whole ecosystem (CUDA, libraries, tools) that allowed AI folks to iterate quickly. That flexibility still matters a lot. But once an AI workload evolves from wild research into a product with real users and service level commitments, we’ve learned that specialization can pay off. In other words, after the exploratory phase, it starts to make sense to ask: Can we do this more efficiently with something other than a “general-purpose” GPU?

We’re entering an era of choice now, where there isn’t a one-size-fits-all processor for AI. The largest tech platforms illustrate this with their own stack designs. 

  • Microsoft’s Maia chip, for example, is tuned for Azure’s needs – it doesn’t have to beat NVIDIA’s flagship GPU on every benchmark, as long as it improves cost and performance for the specific Azure workloads Microsoft cares about. 
  • Google’s TPUs power many of Google’s own services and are tightly woven into Google Cloud; they excel at targeted tasks (like massive matrix multiplications for training Transformers) and deliver great efficiency for those jobs.
  • AWS uses its Trainium and Inferentia chips to give customers cheaper options for training and inference in the cloud, integrating them into services like SageMaker so it’s easy to adopt without rewriting your whole codebase (see the sketch right after this list). 
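
As a hedged sketch of that last point, here’s roughly what launching a training job on a Trainium-backed instance can look like with the SageMaker Python SDK. The script name, IAM role, S3 path, and version strings are placeholders of mine; check the current AWS Neuron and SageMaker docs for supported combinations.

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",           # your existing PyTorch training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_type="ml.trn1.2xlarge",  # Trainium-backed instance family
    instance_count=1,
    framework_version="1.13.1",       # placeholder; use a Neuron-supported version
    py_version="py39",                # placeholder
)

# Kick off training against a dataset in S3 (placeholder path).
estimator.fit({"training": "s3://my-bucket/my-dataset/"})
```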

These in-house custom chips show how bespoke silicon can drive down costs at scale. The recent Broadcom-OpenAI news is perhaps the clearest signal yet: if you have enough scale (think ChatGPT-level demand), designing a custom accelerator for your workload starts to look viable, even inevitable. It’s a move to gain more control – over cost, over performance per watt, and over supply chain independence (no more waiting in line for Nvidia’s next shipment).

None of this means GPUs are suddenly obsolete. NVIDIA isn’t sitting still either. NVLink Fusion will let hyperscalers connect custom ASICs or other accelerators directly with NVIDIA GPUs in a shared memory pool – effectively enabling hybrid GPU/ASIC systems without bottlenecks. And the rumor mill says NVIDIA is staffing up a team to design its own AI ASICs (Application-Specific Integrated Circuits) in case there’s a lucrative corner of the market better served by something more fixed-function than a GPU. 

The direction across the industry is clear: more customization and more diversity in hardware, aimed at squeezing out better performance per dollar for specific use cases. Again, we’ve left the era where one chip architecture (GPU) ruled everything, and we’re now in an era of choice.

Where Each Option Fits

Not every AI workload is the same, so each type of chip has its sweet spot. Here’s a simple decision lens for CPUs vs GPUs vs custom ASICs in AI. As usual, these are my opinions, based on my own research and on my current hands-on experience in the AI/GPU world.

CPUs – use them for simplicity and versatility. 

The trusty CPU never left the stage. Modern server CPUs (like the latest Intel Xeons and AMD EPYCs) have added AI-friendly instructions (e.g. Intel’s AMX and AMD’s advanced AVX-512) that give them a surprising boost on certain AI tasks. 
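
If you’re curious whether your own server exposes these instruction sets, here’s a small sketch (Linux only) that checks the CPU flags and then runs a bfloat16 matmul, which PyTorch’s CPU backend can route through AMX/AVX-512 when the hardware supports it:

```python
import torch

def cpu_flags():
    """Read the instruction-set flags reported by the Linux kernel."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

flags = cpu_flags()
print("AVX-512:", "avx512f" in flags)
print("AMX tiles:", "amx_tile" in flags)

# bfloat16 matmul on the CPU; PyTorch dispatches to AMX/AVX-512 kernels if present.
a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    c = a @ b
print("Result dtype:", c.dtype)
```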

Note: I had the chance to work and play around with the latest Intel Xeon 6, AMX, and AVX-512. These are cool technologies, but I wasn’t really impressed. It’s important to mention that I couldn’t run some of the tests the way I wanted to, and Intel’s support was limited.

CPUs are great for smaller-scale models (think moderate-sized language models, classical ML algorithms, or vision models that don’t require massive parallelism) and for all the glue code around your AI pipelines. They handle things like data preprocessing, feature extraction, and the orchestration between heavy-duty GPU/ASIC tasks. 
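
Here’s a tiny illustration of that glue role, assuming a toy dataset: CPU worker processes handle loading and preprocessing while the accelerator only runs the model.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Toy dataset standing in for real data on disk.
    dataset = TensorDataset(torch.randn(10_000, 512), torch.randint(0, 10, (10_000,)))
    loader = DataLoader(
        dataset,
        batch_size=256,
        num_workers=4,    # CPU-side loading/preprocessing parallelism
        pin_memory=True,  # speeds up host-to-GPU copies
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(512, 10).to(device)

    for x, _ in loader:
        x = x.to(device, non_blocking=True)
        logits = model(x)   # the accelerator only does the math
        break               # one batch is enough for the sketch
    print("Batch processed on", device, "with shape", tuple(logits.shape))

if __name__ == "__main__":
    main()  # the guard matters when DataLoader spawns worker processes
```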

In deployments at the edge or in highly regulated environments, CPUs can be the safest (or only) choice, because introducing new specialized hardware might not be feasible. Plus, if you want one homogeneous fleet of machines to keep operations simple, CPUs give decent AI performance today without any special accelerators.

GPUs – use them for speed of development and flexibility. 

GPUs remain the default choice when your workload is evolving rapidly or when you simply need proven, scalable performance now. If you’re experimenting with cutting-edge research (new attention mechanisms, mixture-of-experts models, reinforcement learning agents, you name it), the GPU’s software ecosystem will save you time.

NVIDIA’s stack (CUDA, TensorRT, PyTorch integrations, etc.) and similar support for AMD GPUs mean you can go from idea to working code quickly, leveraging tons of existing optimizations. For training the absolute largest models or doing complex multi-GPU distributed training, GPUs have the robust infrastructure (networking, memory coherence via NVLink, etc.) to make it feasible. And for serving AI models, GPUs coupled with software like TensorRT and Triton Inference Server let you squeeze out latency and throughput improvements without reinventing the wheel. 
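
On the serving side, here’s a hedged sketch of querying a model behind Triton Inference Server with its HTTP client; the URL, model name, and tensor names/shapes are assumptions that depend entirely on your deployment.

```python
import numpy as np
import tritonclient.http as httpclient

# Assumes a Triton server is already running locally with a model called "my_model".
client = httpclient.InferenceServerClient(url="localhost:8000")

inp = httpclient.InferInput("INPUT__0", [1, 512], "FP32")  # hypothetical tensor name/shape
inp.set_data_from_numpy(np.random.rand(1, 512).astype(np.float32))

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT__0").shape)  # hypothetical output tensor name
```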

In short, if your AI roadmap changes every week and you need maximum agility, GPUs are your friend. They’ll carry you from prototype to production reliably, albeit at a higher cost. (Your finance team might wince at the electricity and purchase price, but you’ll at least meet your launch deadlines.)

Custom Silicon (ASICs) – use them for massive, steady workloads. 

When you have a predictable, heavy workload at scale, that’s when designing or adopting a custom chip pays off. The classic example is a mature AI service with huge volume – say, a popular model serving millions of requests, where you know exactly the model architecture and can forecast the queries per second. In these cases, a specialized accelerator can deliver way better efficiency than a general-purpose GPU. 

By tailoring the chip’s design to your model (right-sizing the compute units, on-chip memory, interconnects, etc.), you can get more inferences per watt and cut your cost per query. Again, we see this with Google’s TPUs handling translation and search queries, or with Amazon’s Inferentia chips serving Alexa models. If you can lock down a latency target and you’re pushing enough load, an ASIC can be tuned to meet that exactly, without paying for extra flexibility you don’t need. 
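
A back-of-envelope way to think about that efficiency argument is energy per query. Every number below is a made-up assumption for illustration, not a measurement of any real chip:

```python
def energy_cost_per_billion_queries(queries_per_sec, watts, power_cost_kwh=0.10):
    """Rough energy cost of serving 1B queries at a steady load."""
    joules_per_query = watts / queries_per_sec
    kwh = joules_per_query * 1_000_000_000 / 3_600_000  # joules -> kWh
    return kwh * power_cost_kwh

# Assumed figures, purely illustrative.
gpu = energy_cost_per_billion_queries(queries_per_sec=2_000, watts=700)
asic = energy_cost_per_billion_queries(queries_per_sec=3_000, watts=300)

print(f"GPU-like device:  ~${gpu:.2f} in energy per 1B queries")
print(f"ASIC-like device: ~${asic:.2f} in energy per 1B queries")
```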

The trade-off, of course, is software support and development time. Building your own chip (or even heavily customizing one) is a multi-year effort and requires software compilers, drivers, and monitoring tools that are as robust as what the GPU world offers.

Most organizations (and certainly consumers and AI enthusiasts) won’t go that far – instead, they’ll consume custom silicon via cloud providers. Only the true hyperscalers like Google, Amazon, Meta, and now OpenAI will fund their own in-house silicon design. But those that do can gain an edge in density and cost at scale, and reduce their dependency on a single vendor’s roadmap. As OpenAI’s move shows, owning your compute can equate to owning your destiny (or at least cutting down those jaw-dropping GPU bills).


Gaudi = the middle ground

If GPUs are the default for flexibility and custom ASICs shine at hyperscale, Intel’s Gaudi accelerators land in the middle as a strong alternative for cost-efficient training and inference. Gaudi was built from the ground up for deep learning, with an emphasis on Ethernet-based scaling and AI-optimized networking. Unlike GPUs, which usually rely on proprietary interconnects like NVLink or InfiniBand, Gaudi leans on standard Ethernet to connect nodes. That makes it attractive for enterprises that want to build large AI clusters without getting locked into one vendor’s fabric.
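
For reference, here’s roughly what targeting Gaudi looks like from PyTorch, assuming the Intel Gaudi (Habana) software stack and drivers are installed; without them this sketch won’t run.

```python
import torch
import habana_frameworks.torch.core as htcore  # Gaudi's PyTorch bridge

device = torch.device("hpu")                   # Gaudi devices show up as "hpu"
model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(8, 512, device=device)

y = model(x)
htcore.mark_step()  # flush the lazily accumulated ops to the device
print(y.shape)
```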

More about Intel Gaudi.

One catch: 

All these shiny options still depend on the same supply chain bottlenecks at the cutting edge of chip manufacturing. Whether it’s a GPU or a custom ASIC, if it’s built on a 3nm or 5nm process, it’s coming out of TSMC’s fabs – and TSMC can only make so many each quarter. 

Packaging technologies like CoWoS (chip-on-wafer-on-substrate, crucial for stitching high-bandwidth memory onto AI chips) are in limited supply, which is one reason GPUs have been hard to get. High Bandwidth Memory itself (those stacked memory chips) has its own lead times and yield challenges. And the entire industry, from NVIDIA to Broadcom, relies on the machines made by ASML – the single company that builds the EUV lithography equipment needed for today’s most advanced chips.
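
To give a feel for why capacity is the gating factor, here’s some purely illustrative arithmetic; every input is an assumption I picked for the example, not a real foundry or packaging figure.

```python
# All inputs are illustrative assumptions, not real foundry data.
wafer_starts_per_month = 10_000   # wafers assumed allocated to one AI chip design
dies_per_wafer = 60               # assumed for a large reticle-sized die
yield_rate = 0.70                 # assumed fraction of good dies
packaging_share = 0.80            # assumed fraction that clears advanced packaging

chips_per_quarter = int(
    wafer_starts_per_month * 3 * dies_per_wafer * yield_rate * packaging_share
)
print(f"~{chips_per_quarter:,} finished accelerators per quarter")
```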

In short, diversification in design is happening, but everyone is still drinking from the same well when it comes to actually fabbing and assembling these chips. Any plan for AI hardware in the next few years has to account for those constraints. As Hock Tan projected, the AI chip market could be $60–90 billion by 2027, but reaching that number will depend as much on capacity and supply chain as on demand.

The Coming Phase of AI Compute

What does all this mean for the near future? We’ll likely see a barbell strategy in AI infrastructure. On one end of the spectrum, there will be GPU-heavy clusters pushing the frontiers of AI – think massive model training runs, complex AI agents that evolve week to week, and enterprises deploying new AI features that they’re still fine-tuning. These will continue to buy the latest GPUs (NVIDIA’s Blackwell generation is here, and AMD is in the game too) because they need the versatility and proven performance. 

On the other end of the spectrum, we’ll see ASIC-heavy fleets quietly chewing through massive stable workloads – for example, a social network doing billions of content recommendations per day with a fixed model, or a cloud provider offering an AI inference service backed by custom chips. Those deployments will favor custom silicon once it’s available, because efficiency and cost at scale trump flexibility.

In the middle of it all, the humble CPU will keep things glued together. Servers will use CPUs to handle all the auxiliary tasks (data loading, preprocessing, business logic around the AI results) and even run smaller models independently. And for many companies that aren’t operating at mega-scale, a well-tuned CPU-only solution might be the simplest approach for smaller AI models or edge AI deployments.

We might also see hybrid racks become common, where one server contains a mix of GPUs and specialized accelerators operating side by side. NVIDIA’s NVLink Fusion that I mentioned is explicitly about enabling this – allowing, say, a Broadcom-designed chip to live alongside NVIDIA GPUs, sharing memory and workloads, rather than having completely separate islands for each. Even NVIDIA’s leadership has hinted that they envision more semi-custom solutions in big data centers, which is a significant shift from the “just use more GPUs” stance of the past.

The net effect is that the AI compute landscape will be more diverse than ever. The market is growing so fast and the demand is so insatiable that there’s plenty of room for GPUs, ASICs, and CPUs to each claim a part of the load. NVIDIA’s not going anywhere – in fact, I am almost sure they will continue to dominate the bleeding-edge and the general-purpose segments, and their full-stack approach (chips, boards, systems, networking, software) is a strong moat. But the era of one-size-fits-all is over. 

Wrapping Up: Practicality Wins Over Hype

The next time you or your organization plan an AI deployment, you might actually have a choice: do we stick with GPUs for speed, try a cloud TPU/Trainium instance for lower cost, or just run it on a beefy CPU server because it’s small enough? That’s a healthy development for the industry.

Ultimately, we don’t have to pick a side in “GPUs vs Custom ASICs.” It’s not a religious war, it’s about matching the tool to the job (and to the stage of our product’s life cycle). 

Early-stage, fast-changing project? GPUs will let you iterate without reinventing the wheel. Mature, large-scale service with steady traffic? A custom chip (often accessed through a cloud service) might save you a fortune and give you more control. And CPUs continue to quietly handle a ton of workload where simplicity and ubiquity matter more than raw speed. 
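
If it helps, here’s a toy encoding of that decision lens; the categories and thresholds are my own assumptions, not any industry standard.

```python
def pick_accelerator(stage: str, daily_queries: int, model_changes_often: bool) -> str:
    """Toy heuristic mirroring the decision lens above."""
    if stage == "prototype" or model_changes_often:
        return "GPU: fastest iteration, richest software ecosystem"
    if daily_queries > 100_000_000:   # assumed "hyperscale" threshold
        return "Custom ASIC (or cloud TPU/Trainium): best cost and perf-per-watt"
    if daily_queries < 1_000_000:
        return "CPU: simple, ubiquitous, good enough for small or steady workloads"
    return "GPU (or cloud accelerator): proven middle ground"

print(pick_accelerator("production", daily_queries=500_000, model_changes_often=False))
```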

We’re moving toward a world where an AI stack might use all three in different parts: CPUs to prepare and serve data, GPUs to train models and handle complex logic, and specialized silicon to handle the very high-volume, specific inference tasks. 

The winners in this next phase of AI computing will be those who can deliver the most useful work per dollar and per watt at a given reliability, and do so with a software stack their developers can actually live with. In other words: the hype is cool, but practicality wins. And having more choices on the table is going to make the practical path that much easier to find.

Resources:

NVIDIA, wccftech.com, reuters.com, tipranks.com, readthejoe.com

