Oh Intel…


2020 has been quite a year so far, and that’s not even touching on world events! What started in 2018 as a traditional mass media hype-train event, following a non-trivial security disclosure that genuinely called 30 years of x86 architecture into question, disappeared from the headlines in a flash. Yes, this is a reference to Spectre and Meltdown: the disclosure that rocked the technology world and made Speculative Execution and hardware memory protection, two of the defining characteristics of advanced x86, look like architectural failures. The industry quickly sprang into action (both PR and engineering) and rolled out a set of mitigations ranging from OS patches to BIOS updates to microcode changes to new CPUs, which basically just incorporate the BIOS updates and OS patches into hardware. At first, the fear was that the performance impact would be massive; then everyone decided it was “meh” (notable exception: Phoronix) and that there was really no impact at all. But is that true?

Well, in a word, no. Not at all. Not even remotely true. It’s kind of sad that the tech press has simply moved on and hasn’t even added targeted mitigation-impact testing to its test suites for new CPUs, but hey, benchmarking Tomb Raider for the 400 millionth time is what the people want! Meanwhile, the vulnerabilities keep coming, and Intel is now subject to a whole menu of issues that each require mitigation.

So what is the deal with all of these “mitigations”, and what are they really about? In a nutshell, most of them require either a hardware change or a BIOS update. Absent those, the OS can in some cases provide mitigations of its own. The thing to note is that once the BIOS is updated (or, obviously, new hardware is bought), nothing you do in the OS is going to matter. Whatever performance impact a given mitigation causes, once you update your BIOS, you’re going to feel it. If you buy a new Intel CPU with “built-in mitigations”, you’re going to feel it. Disabling mitigations in Windows will not make a difference.

With all of that said, what is the actual impact? To put it simply, any operation that makes the CPU do lots of context switching (moving between user mode and kernel mode over and over) is going to take a hit. Virtualization? Forget it. Big hit. “But hey, I don’t run a desktop hypervisor,” you say! Well, guess what else takes a big hit? Random IO. And random IO is already the usual source of pain. So how much of a hit? Let’s look at a worst case: the Optane 900P, a drive that can really push small random IO. First, in all its backdated-BIOS, Windows-mitigations-off, “barely any hardware mitigations onboard” 9900K glory:

Dem 4k random Q1T1s doe!

Still pretty impressive, even in the PCIe 4.0 era! OK, with perf like that, how bad could the mitigations really be? I mean, we’re only talking about adding extra checks to Speculative Execution, and context switching, and Hyper-Threading and… uh… let’s just take a look:

Where’d mah 4Ks go?!

For those keeping score at home, that’s a 30% hit to 4k random reads at Q1T1 (queue depth 1, one thread). To achieve this awesome performance hit, you need to either update the BIOS on the Asus Maximus X Code to any revision past 1901, or run an updated version of Windows 10 (or both).
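If you want to run a rough before/after comparison of your own, the sketch below does what Q1T1 means in the simplest possible way: one thread, one 4 KiB read in flight at a time, at random aligned offsets. It’s a minimal Python sketch with helper names of my own (make_test_file, qd1_random_read_iops), not the tool behind the screenshots. Because the scratch file was just written, reads are mostly served from the OS cache, so what it really measures is the per-request system call path, which is exactly where kernel entry/exit mitigations add their cost. Python’s own overhead dilutes the effect, so only compare numbers from the same machine with mitigations toggled. For device-level numbers, a dedicated tool such as fio with caching bypassed (--rw=randread --bs=4k --iodepth=1 --direct=1) is the better choice.

```python
import os
import random
import tempfile
import time

BLOCK = 4096  # 4 KiB per read, like the "4k" results above


def make_test_file(size_bytes):
    """Create a temporary file of random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def qd1_random_read_iops(path, duration=1.0, seed=0):
    """Read 4 KiB blocks at random block-aligned offsets, one request at a
    time (queue depth 1, single thread), and return completed reads per second."""
    blocks = os.path.getsize(path) // BLOCK
    if blocks == 0:
        raise ValueError("test file must be at least one block long")
    rng = random.Random(seed)
    buf = bytearray(BLOCK)
    reads = 0
    # buffering=0 gives an unbuffered FileIO: every seek() and readinto()
    # goes straight to the OS, so each iteration crosses into the kernel.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        deadline = start + duration
        while time.perf_counter() < deadline:
            f.seek(rng.randrange(blocks) * BLOCK)
            if f.readinto(buf) != BLOCK:
                raise OSError("short read")
            reads += 1
        elapsed = time.perf_counter() - start
    return reads / elapsed


if __name__ == "__main__":
    path = make_test_file(16 * 1024 * 1024)  # 16 MiB scratch file
    try:
        iops = qd1_random_read_iops(path, duration=2.0)
        print(f"{iops:,.0f} random 4 KiB reads/s at QD1")
    finally:
        os.remove(path)
```

Run it once with mitigations active and once with them disabled (reboot in between); the percentage difference is the number that matters, not the absolute figure.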

OK, so that’s Optane. Regular SSDs probably aren’t hit as hard, right? Well, sort of. Next up is the impact on a nice RAID 0 array of Samsung 970 Pro 1TB NVMe drives. On the left is pre-mitigation, on the right is mitigated. Nowhere near as catastrophic as Optane, but still around 10%. This suggests the performance hit scales with the volume of context switching: the higher IOPS of Optane means more kernel transitions per second and a bigger hit, and that’s… not great.
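The “faster drive, bigger hit” pattern falls out of a very simple model, sketched below under assumptions of my own rather than anything measured: at queue depth 1, each IO’s latency is device time plus software time, and the mitigations add a roughly fixed extra cost per IO from the additional kernel entry/exit work. The relative IOPS loss is then m / (t + m), where t is the unmitigated per-IO latency and m the added cost, so the smaller t is, the larger the hit. The latencies in the example are made up purely for illustration and are not meant to reproduce the screenshots.

```python
def qd1_iops(per_io_us):
    """At queue depth 1 only one IO is in flight, so IOPS = 1 / per-IO latency."""
    return 1_000_000 / per_io_us


def mitigation_hit(device_us, software_us, mitigation_us):
    """Fraction of QD1 IOPS lost when every IO pays an extra, fixed
    mitigation cost (an assumption); equals m / (device + software + m)."""
    before = qd1_iops(device_us + software_us)
    after = qd1_iops(device_us + software_us + mitigation_us)
    return 1 - after / before


# Hypothetical latencies in microseconds, NOT measured values:
fast = mitigation_hit(device_us=10, software_us=5, mitigation_us=3)
slow = mitigation_hit(device_us=80, software_us=5, mitigation_us=3)
print(f"fast device: {fast:.1%} slower")  # fast device: 16.7% slower
print(f"slow device: {slow:.1%} slower")  # slow device: 3.4% slower
```

The same fixed per-IO cost barely registers on a slow device but eats a large slice of a fast one, which is why low-latency parts like Optane suffer the most.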

So what do you need to do to avoid this performance hit? Well… first, don’t install any BIOS update that includes the mitigations. Next, run InSpectre and disable the Windows mitigations (or just edit the registry). You’ll probably also want to avoid newer Intel CPUs that build more of these fixes directly into microcode, since so far Intel’s microcode changes don’t seem optimized to reclaim any performance, which makes sense given we’re still technically on Skylake. And of course, if you’re already on a chipset released after all of this, where even the base BIOS has the mitigations built in (Z390 and later), then unfortunately you’re stuck.
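For the “just edit the registry” route, Microsoft’s published guidance for CVE-2017-5715 (Spectre variant 2) and CVE-2017-5754 (Meltdown) uses two DWORD values, FeatureSettingsOverride and FeatureSettingsOverrideMask, under the Memory Management key. The small sketch below only prints the reg add commands for the two documented combinations; the helper name and structure are my own, and it covers just those two original CVEs (later issues are controlled by other values not handled here). Run the printed commands from an elevated Command Prompt and reboot for them to take effect.

```python
# Registry key and value names from Microsoft's guidance for CVE-2017-5715
# (Spectre variant 2) and CVE-2017-5754 (Meltdown). Later mitigations are
# controlled by different values that this sketch does not cover.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"

# (FeatureSettingsOverride, FeatureSettingsOverrideMask) for each action.
SETTINGS = {
    "disable": (3, 3),  # turn the mitigations for both CVEs off
    "enable": (0, 3),   # turn them back on
}


def reg_commands(action):
    """Return the two reg add command lines for the requested action.

    Illustrative helper: only the key, value names and numbers come from
    Microsoft's published guidance.
    """
    override, mask = SETTINGS[action]
    return [
        f'reg add "{KEY}" /v FeatureSettingsOverride /t REG_DWORD /d {override} /f',
        f'reg add "{KEY}" /v FeatureSettingsOverrideMask /t REG_DWORD /d {mask} /f',
    ]


if __name__ == "__main__":
    # Paste the output into an elevated Command Prompt, then reboot.
    for line in reg_commands("disable"):
        print(line)
```

Keep in mind the point made above: with an updated BIOS or a newer CPU, flipping these values will not bring the lost performance back.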

Or… maybe it’s time to reward AMD. After all, AMD has proven not to have any of these issues beyond the initial Speculative Execution troubles of the Spectre family, so perhaps the reason AMD has “lagged” in IPC all these years is that it actually wasn’t cutting corners? Food for thought.