A Deep Dive Into The Qualcomm Snapdragon X2 Elite SoC Details

This is Part 3 of a series. Part 1 can be found here, Part 2 can be found here, and Part 4 will be posted tomorrow.

Hexagon:

Next up is the Qualcomm Hexagon NPU, which supplants the Qualcomm Hexagon Processor, which itself supplanted the Qualcomm Hexagon DSP. This isn’t to mock the unit; it looks to be a really solid piece of work, just one that has evolved over time with significant new functionality. OK, we are actually mocking the marketing people, but wouldn’t you? Yeah, we thought so. Let’s take a brief look at the evolution of the Qualcomm Hexagon in the SDX2.

Snapdragon X2 Hexagon Evolution

Hexagons of the distant past

There are three parts of the Hexagon: the scalar unit, the vector unit, and the matrix unit, all of which are optimized for scalar, vector, and matrix operations respectively, hence the names. Bet you thought we were going somewhere else with that one, right? But seriously, the name change from DSP to Processor to NPU came with the additions of the vector and matrix units. The mathematically literate among you will be sure to notice that there is a difference of one in the year numbers, so why is the 2025 Hexagon not on this chart?

Snapdragon X2 Hexagon NPU History

The new Hexagon is more

Yes, the 2025 Hexagon is still called an NPU, but internally it is NPU6, a new architecture. There isn’t a major change from the high level view, it is just more: more threads, more capabilities, and wider just about everything. Before you start counting blocks in a blurry picture, save your time, we will go through the numbers in a bit. The Hexagon also has its own command processor and runs its own realtime OS, just like the GPU.

Now for the numbers. It all starts with the scalar unit, which as you can see above has traditionally supported six threads. It now supports 12, and yes, the image above shows all of them clearly-ish. Qualcomm says Hexagon can do SMT across six threads, which means there is a second ‘core’ and each supports half the threads, but that is mostly semantics from a user point of view. Each thread is a 4-wide VLIW architecture and supports multi-level branch prediction, user-mode DMA to avoid the latency and overhead of switching modes, and hardware synchronization.

There are two master ports, basically one for each ‘core’, and the block can do 64b DMA because of the rather large models we are dealing with now. Interestingly, while the Hexagon X2 can do 64b DMAs, the cores are still 32b internally, just like the X1, which could only do 32b DMAs. That is going to take a bit of interesting math but it works so, well, it works. But why do you need 2x more, and wider, scalar threads with a claimed 143% more throughput?

Snapdragon X2 Hexagon Vector Unit

Hexagon doubles the vector thread count

Because the vector unit now has eight threads, basically one scalar thread to control each vector thread. Control may be a bit of a strong word; it feeds it and gently shepherds it to a mutually acceptable and beneficial numerical conclusion. Basically one scalar thread per vector thread, plus four. Why four? Obviously the six threads of the previous generation were enough for its four vector threads, plus two. There is a reason for this and we will come to it later.

Back to the vector unit: each of the engines can now process four 128B, big B, SIMD vectors and does FP8 and BF16 on top of what it did before. ‘Gasp’, I hear you gasp, that is a lot of data to grind through in a cycle, and you would be right. Qualcomm claims a 143% vector throughput increase this generation, which matches the scalar unit increase exactly. This one is not a coincidence, and we won’t make the obvious joke here.
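For those who like to check the math, here is a back-of-envelope sketch in Python of what those vector numbers imply. It assumes, and this is our reading rather than anything Qualcomm spelled out, that an ‘engine’ is one vector thread and that the four 128-byte vectors are a per-cycle peak. The function and constant names are ours, purely for illustration.

```python
# Back-of-envelope check of the vector unit figures quoted above.
# Assumptions (ours, not Qualcomm's): one "engine" == one vector thread,
# and four 128-byte vectors per engine is a per-cycle peak.

VECTOR_BYTES = 128        # "128B, big B" per SIMD vector, from the text
VECTORS_PER_THREAD = 4    # four vectors per engine, from the text
X1_VECTOR_THREADS = 4     # previous generation, from the text
X2_VECTOR_THREADS = 8     # this generation, from the text


def peak_bytes_per_cycle(vector_threads: int) -> int:
    """Peak bytes of vector data touched per cycle under the assumptions above."""
    return vector_threads * VECTORS_PER_THREAD * VECTOR_BYTES


def speedup_from_percent(pct_increase: float) -> float:
    """Convert a '+N%' marketing claim into a plain multiplier."""
    return 1.0 + pct_increase / 100.0


total_speedup = speedup_from_percent(143)                # claimed +143%
thread_scaling = X2_VECTOR_THREADS / X1_VECTOR_THREADS   # 8 vs 4 threads
per_thread_gain = total_speedup / thread_scaling         # what is left over

print(peak_bytes_per_cycle(X2_VECTOR_THREADS))
print(round(total_speedup, 3), round(per_thread_gain, 3))
```

On those assumptions the X2 vector unit touches 8 x 4 x 128 bytes, or 4KB, per cycle at peak. A claimed +143% is a 2.43x multiplier, so doubling the thread count only covers 2x of it and the remaining 1.2x or so has to come from somewhere else, with clocks and wider per-thread resources being the obvious guesses.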

That brings us to the last big bit, the matrix unit. The X1 had a matrix unit too, but this one is bigger and more capable. This new version supports 2-bit weights but does not support the FP2 data type the way Intel GPUs do. No, that isn’t a joke; I cracked it as one to an Intel architect and he pointed out that it is actually supported in hardware. Egg on face there, eh, but I have a new and better joke about 2-bit weights that I will only tell in person. Anyway, the matrix unit has its own weight and activation caches because not doing so would blow out its efficiency. It is on a separate power rail too, and can access the vector tightly coupled memory directly.

Back to scalar threads: the matrix unit needs one as well, probably two to be honest, and that leaves two or three to do all of the other work the Hexagon needs to do. In the last architecture it was 6:4:1 scalar:vector:matrix threads, now it is 12:8:1, but everything is bigger, more, and faster. In short, the scalar threads are mostly used to control the wide math units downstream from them. Plus, at six threads per scalar core, if you wanted to do 10 you would need to do a lot of work on the scalar units for no real gain.
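The thread budget above is simple enough to write down. The Python sketch below is our own illustration of that arithmetic, not anything from Qualcomm; the function names are made up, and the idea that scalar threads only come in blocks of six per ‘core’ is our reading of the SMT description.

```python
import math

THREADS_PER_SCALAR_CORE = 6   # six SMT threads per scalar 'core', from the text


def spare_scalar_threads(scalar: int, vector: int, matrix_control: int) -> int:
    """Scalar threads left over after one per vector thread and
    matrix_control threads for the matrix unit."""
    return scalar - vector - matrix_control


def scalar_threads_built(threads_wanted: int) -> int:
    """Smallest thread count that is a whole number of six-thread cores
    (assumption: threads are only added a full core at a time)."""
    cores = math.ceil(threads_wanted / THREADS_PER_SCALAR_CORE)
    return cores * THREADS_PER_SCALAR_CORE


# X2: 12 scalar threads, 8 vector threads, 1 or 2 for the matrix unit
for matrix_control in (1, 2):
    print(matrix_control, spare_scalar_threads(12, 8, matrix_control))

# Wanting 10 scalar threads still means building two full cores
print(scalar_threads_built(10))
```

With one scalar thread on matrix duty there are three left for housekeeping, with two there are two, which is the ‘two or three’ above. And if ten threads would have done the job, two six-thread cores still get you to 12 without redesigning the scalar unit.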

Snapdragon X2 Hexagon Resource Utilization

Hexagon resource utilization by AI model

If you look at the graph above, you will get eye strain. Should you want to avoid this, we can explain it quite simply. The X axis is a distribution of 300 oft-used model types, the Y axis is the percentage of time the Hexagon is waiting on a particular unit. As you can see, the matrix unit is used most but the others are well represented too. Every architect wants to minimize waits in their device, which is why Qualcomm built flexibility into the scalar and vector units: if one isn’t being utilized, those resources can be pointed in another direction.

To feed all this there is 127% more bus bandwidth to the Hexagon, bigger L2 caches, and a more powerful DMA unit. It also has its own memory processor so it can kind of grind through long complex jobs with minimal CPU supervision. If you think about how complex some of those matrix and tensor operations can be, you don’t want to have a CPU micro-managing the entire process from afar, do you? You want as much of the work the Hexagon does to be a closed loop, fire and forget type of affair. And it is, so say ‘thank you, scalar threads’. And it is faster, a claimed 80TOPS vs the 45TOPS in the X1. Mission accomplished, now if only there was useful software for it, but that isn’t Qualcomm’s fault.
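To put the TOPS and bandwidth claims on the same footing, here is a quick Python sketch that turns both into multipliers. The inputs are the claimed figures from the text; the ratio between them is our own arithmetic, not a Qualcomm number, and the variable names are ours.

```python
X1_TOPS = 45                       # claimed, from the text
X2_TOPS = 80                       # claimed, from the text
BUS_BANDWIDTH_INCREASE_PCT = 127   # claimed "+127%", from the text


def percent_increase(new: float, old: float) -> float:
    """Percentage gain going from old to new."""
    return (new / old - 1.0) * 100.0


tops_ratio = X2_TOPS / X1_TOPS
bandwidth_ratio = 1.0 + BUS_BANDWIDTH_INCREASE_PCT / 100.0

# How much more bus bandwidth each claimed TOP gets, generation on generation
bandwidth_per_top_ratio = bandwidth_ratio / tops_ratio

print(round(tops_ratio, 2), round(percent_increase(X2_TOPS, X1_TOPS), 1))
print(round(bandwidth_ratio, 2), round(bandwidth_per_top_ratio, 2))
```

Taken at face value, TOPS went up by a bit under 1.8x while bus bandwidth went up 2.27x, so each claimed TOP has roughly 1.28x the bandwidth feeding it. That is consistent with the ‘bigger models’ story, though peak TOPS and bus bandwidth are both marketing numbers, so treat the ratio as a rough guide only.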

Guardian:

Next up is the Qualcomm Guardian technology. It is the hardware management that the X1 systems lacked, Microsoft and OEMs lied about, and enterprises needed. This is a good thing, and a very bad thing, depending on what parts of it you look at. Whatever the case it can locate and track your PC, lock and wipe it, and manage it. Basically Intel vPro with the addition of a cellular modem, which is a good thing. And very bad too. Plus it is locked to Windows so insecurity is mandatory.

Guardian looks good on paper

The good is obvious: these are desperately needed features for any enterprise, and anyone who buys a device without the management side for a fleet is, well, incompetent. Qualcomm knew this and has addressed the problem with the X2, and looks to have addressed it well. The only thing we will point out is that the services Qualcomm offers have no compatibility with anything any sane enterprise has currently deployed. If you buy an X2 device, you will have to gift your IT ops with a brand new secondary console to manage a subset of their fleet. They love that kind of thing, just ask them. That said, it is far less of an issue than not having any way to do basic device management, so let’s call it a net good thing here.

The bad is twofold. First, if you have Guardian, you need either a 5G modem, the X75 for the SDXE2, or a 4G IoT modem for basic messaging and GPS. WiFi is supported but it lacks location. Fair enough. Guardian communicates with a Qualcomm server and back end for obvious reasons, you need a common point of contact to relay messages. Guardian also works over WiFi, RFC 1149 (and we presume RFC 2549 but it wasn’t explicitly stated), and nearly any other carrier that can get the packets to the back end, but as mentioned earlier, some features may be lacking without a cell modem. This is a good thing, right?

Not in SemiAccurate’s opinion. Why? Two big issues: cost, and having a third party involved in your security. Let’s look at cost first, starting with the modem. An added modem adds cost, but you can pick the really ‘cheap’ 4G IoT modem if you don’t want a 5G modem in your PC. Given the uselessness of 5G in a Windows PC, the 4G modem, or better yet no cellular modem, is probably a better bet. (Note: Qualcomm people, you don’t pay for your service directly; most of humanity does, and it is just too overpriced for mere mortals. That nicely explains how sales of 4G/5G PCs have NOT taken off in the market.) Basically, why have an always on attack surface on your PC, a Windows box no less? It is just dumb, and because of what it does, you can’t turn it off. No, that isn’t a joke; if you could turn it off, it would kinda defeat the purpose of lock and wipe for a stolen laptop.

But on top of that, the back end costs money to run, and Qualcomm expects to be paid for that. Fair enough, they do the work, they deserve the money. Or the market failure, but let’s be glass half-full people here. The problem is that the business model is that a fee of about $20 a year will be charged to the OEM, not the user. The OEM will roll it into the price of the system, plus the cost of the modem.

The user pays for it whether they use it or not. And they are vulnerable because you can’t turn it off. So you can’t say no, you are upcharged a fairly large chunk of the BoM, and you have no say in it. Assuming the X2, unlike the X1, actually works this time around, this is a deal breaker, a bad idea at a high cost. Let’s assume three years up front, so $60 to the OEMs, multiplied by whatever margins they take, and you have a double digit percentage of your MSRP for forced security holes.
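If you want to plug in your own numbers, the cost math above looks something like the Python sketch below. Only the $20 a year fee and the three year term come from the text. The modem cost, OEM markup, and MSRP in the example are made-up placeholders, not real pricing, and the function name is ours.

```python
def guardian_share_of_msrp(fee_per_year: float, years: int,
                           modem_cost: float, oem_markup: float,
                           msrp: float) -> float:
    """Fraction of the sticker price attributable to the Guardian fee plus
    the modem, after the OEM applies its markup to the added BoM cost."""
    bom_adder = fee_per_year * years + modem_cost
    return bom_adder * oem_markup / msrp


# From the text: about $20 a year, three years paid up front
service_fee_total = 20 * 3

# HYPOTHETICAL inputs, for illustration only: a $40 modem, a 1.5x OEM
# markup, and a $1,000 laptop. Substitute real figures if you have them.
example_share = guardian_share_of_msrp(
    fee_per_year=20, years=3, modem_cost=40.0, oem_markup=1.5, msrp=1000.0
)

print(service_fee_total)
print(f"{example_share:.1%}")
```

The result swings a lot with the modem price and the markup, which is the point: the fee alone is the known part, everything multiplied on top of it is up to the OEM.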

What is a nice idea on paper should be avoided at all costs at the checkout line. If a system has a Guardian logo on it, just say no. Intel’s vPro doesn’t have these issues, and their PCs actually work right. But back to the half-full thing, at least Qualcomm knows there is a problem and is trying to do something about it. Yay? Want to bet high end X2 Elite Extreme systems will only come with a modem and the attendant security flaws?

Then there is the deal breaker, Microsoft’s Pluton ‘security’ block. All we can say is that a third-party, remotely accessible CPU that can snoop any part of your PC silently is unacceptable. The fact that it can be arbitrarily updated with, well, anything, is also unacceptable. Qualcomm did not comment on the block and Microsoft lies about it through omission, at least according to three sources SemiAccurate talked to who have implemented the block in released products. Intel did it right, AMD and Qualcomm didn’t, and so any device that has Pluton from those vendors should be treated as insecurable and vulnerable. Throw in the aforementioned always on cell modem and you have a party! S|A


Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic.
