Intel’s Netburst: Failure is a Foundation for Success

In the world of today’s high performance CPUs, major architectural changes don’t happen often. Iterating off a proven base is safer, cheaper, and faster than attempting to massively rework the basics of how a CPU fetches and executes instructions. But more than 20 years ago, things hadn’t settled down yet. Intel made two attempts to…

Sunny Cove: Intel’s Lost Generation

This is going to be the first in a series of articles on CPU architectures. We’re picking up where Real World Tech left off with its microarchitecture deep dives. And we’re going to be doing them with the advantage of 20-20 hindsight, and hardware to test on. Sunny Cove is technically the successor to Intel’s…

Graviton 3: First Impressions

In late May of 2022, AWS released Graviton 3 to the general public. Graviton 3 was the first ARM CPU to introduce the SVE instruction set to a widely accessible server CPU. Before Graviton 3’s general availability, Neoverse N1 dominated the ARM server landscape. AWS’s previous flagship offering, Graviton 2, implements 64 Neoverse N1 cores…

iGPU Cache Setups Compared, Including M1

Like CPUs, modern GPUs have evolved to use complex, multi level cache hierarchies. Integrated GPUs are no exception. In fact, they’re a special case because they share a memory bus with CPU cores. The iGPU has to contend with CPUs for limited memory bandwidth, making caching even more important than with dedicated GPUs. At the…

Examining Centaur CHA’s Die and Implementation Goals

In our last article, we examined Centaur’s CNS architecture. Centaur had a long history as a third x86 CPU designer, and CNS was the last CPU architecture Centaur had in the works before they were bought by Intel. In this article, we’ll take a look at Centaur CHA’s physical implementation. CHA is Centaur’s name for…

Centaur CHA’s Probably Unfinished Dual Socket Implementation

Centaur’s CHA chip targets the server market with a low core count. Its dual socket capability is therefore quite important, because it’d allow up to 16 cores in a single CHA-based server. Unfortunately for Centaur, modern dual socket implementations are quite complicated. CPUs today use memory controllers integrated into the CPU chip itself, meaning that…

Intel Renames Oregon Fab: Gordon Moore Park. Adds +270k sq ft, 18A Node now 2024

If there’s one thing that’s cheaper than building a new fab, it’s expanding an existing one. Intel’s Oregon D1 facility has been the hub of all of its technology advancements over the past 20 years, where Intel trials new manufacturing processes before transferring them to other fabs around the world. As you can imagine,…

GPU Hardware Video Encoders – How Good Are They?

Figuring out the best way to encode a video is very computationally expensive, and it might not be a good idea to throw a ton of CPU cycles at encoding video when you’re running a game. That’s why modern GPUs usually include hardware video encoders. Here, we’ll take a brief look at how some implementations compare.…

VIA Part 4 – A Deep Dive into Centaur’s Last CPU Core: CNS

The x86-64 instruction set powers the vast majority of PCs, consoles, and servers. However, the number of x86 licensees has always been small, so it’s important to keep track of the few that are left. While Intel and AMD have been at each other’s throats chasing the performance crown, VIA, the former owner of Centaur,…

SiFive Completes Series F Funding Round: +$175m, $2.5b Evaluation

The three main architectures currently in the ecosystem that people talk about are x86, Arm, and RISC-V. It’s that last one we’re focusing on: it is the newest on the block, and it is slowly becoming a de facto option for a lot of simple compute designs as well as designers. The RISC-V model is different to…

State of Windows on Arm64: a high-level perspective

For a port of Windows to another architecture to be usable, a high level of backwards compatibility is needed to delight customers. Customer frustration has to be kept to a minimum. Otherwise, such an endeavor can get very expensive quickly, with rising return rates hampering profitability across the whole chain, among other issues. Start early…

Going Armchair Quarterback on Golden Cove’s Caches

Processor speed has rapidly outpaced advances in DRAM technology, so caching strategy is a huge part of CPU performance today. A couple of articles ago, we saw Intel’s decisions with Golden Cove’s high latency, high bandwidth cache setup. Could they have done better? Let’s admit it – speculation is fun. So here, we’ll speculate on…

Alder Lake’s Power Efficiency – A Complicated Picture

Reviews across the internet show Alder Lake getting very competitive performance with very high power consumption. For example, Anandtech measured 272 W of package power during a POV-Ray run. Our own testing showed eight Golden Cove cores alone could pull over 168W. But that’s at stock settings. And stock settings don’t do Alder Lake any…

Deep Diving Zen 3 V-Cache

This is the deeper dive into AMD’s V-Cache that we teased with our short latency article. We will be covering a little more on the latency front, along with the bandwidth behavior of V-Cache and the performance of V-Cache SKUs. A Bit More on Latency If you have read our teaser article on V-Cache’s…

AMD’s V-Cache Tested: The Latency Teaser

If you were like us and were surprised that AMD announced 3D V-Cache back in August and wondered how AMD would be able to pull this off, well, we have a teaser article for you today regarding V-Cache’s latency! The Latency There was much speculation about just how much latency V-Cache would add and the…

Intel’s Tremont: Atom Changes Course

Today we’ll look at Intel’s Tremont architecture to put Gracemont in perspective. It’s Gracemont’s direct ancestor, and represents a shift in Intel’s Atom strategy. It delivers a massive 30% performance-per-clock jump – at least according to Intel – over its predecessor, Goldmont Plus. At the same time, Tremont debuted several techniques that feature prominently in…

Gracemont: Revenge of the Atom Cores

This article can be considered a part 2 to our Golden Cove article because today we are looking at the other core in Alder Lake, Gracemont. Which is in my opinion more interesting than Golden Cove because, spoiler alert, it’s not a little core in the slightest. Gracemont is a 5-wide out-of-order architecture that traces…

Alder Lake – E-Cores, Ring Clock, and Hybrid Teething Troubles

This will be a short post about how Alder Lake’s ring behaves when E-Cores are active. With just P-Cores active, the ring runs at 4.7 GHz. But if anything is running on the E-Cores, the ring frequency drops to 3.6 GHz. This drop happens regardless of whether the E-Cores are accessing L3/memory. That in turn…

Popping the Hood on Golden Cove

Alder Lake (ADL) is the most exciting Intel launch in more than half a decade. For the first time since Skylake, Intel has launched a competitive desktop microarchitecture. But I’m sure you all know that already from other sites that are able to do launch day reviews. Here, we’re going to deep dive ADL’s P-core…

Zhaoxin Part 3: A Sort of Anti-Climax

I’ll be blunt here, this part will seem like an anti-climax compared to Part 2 of this series but I hope to nicely wrap up this series with this as the conclusion piece of what we know about how the changes talked about in Part 2 have affected the performance of the Lujiazui architecture and…

Deep Diving Neoverse N1

Our previous article gave a pretty narrow view of how Neoverse N1 and Zen 2 stacked up, and mostly focused on whether ISA was responsible for performance differences. Here, we’re going to analyze Neoverse N1 in more depth, on both the micro(architecture) and macro level. We’re still going to use Zen 2 as a reference…

Do IBM’s Giant L3 and V-Cache Represent the Future?

IBM showed off a giant 256 MB L3 during its Telum presentation at Hot Chips 2021, and ignited discussion about whether that represents the future of caches. That’s not the first time we’ve seen big caches brought up. Just a few years ago, AMD advertised Zen 2’s 16 MB CCX-level cache as “GameCache” to emphasize…

Analyzing Video Card Efficiency, Part I – Power

While most enthusiasts chase the absolute best performance at any cost, there are a few who use efficiency as their benchmark for a card’s attractiveness. It can be argued that the efficiency of a design – how effectively it utilizes the resources available to it – is a better measure for comparing different graphics cards…

The Weird and Wacky World of VIA, the 3rd player in the “Modern” x86 market

Header Image credit goes to Martijn Boer. In the world of x86 CPUs there are two major players, Intel and AMD. However, there is one (well two but that will be expanded on later) other company that designs and produces CPUs that are fully compatible with modern x86 extensions, yes even AVX, and that company…

Details on the Gigabyte Leak

Recently, a ransomware group leaked data from Gigabyte in an attempt to extort payment. That’s been well covered by other outlets (please everyone, secure your networks), so here we’re focusing on Zen 4 technical details from the leak. Sadly, there’s not a lot on fundamental core changes; however, there is some stuff we can get…

Neoverse N1 vs Zen 2: ARM in Practice

Previously, we looked at ARM and x86 and concluded high performance designs wouldn’t get a significant advantage by using either instruction set. That article focused narrowly on the respective ISAs, and assumed equal ecosystem and implementation goals. Today, we’ll look at ARM and x86 in practice – specifically Neoverse N1 with a quad core Ampere…

Measuring Zen 3’s Bottlenecks

Zen 3 is one of the fastest CPU cores currently on the market; that isn’t up for debate. However, even the fastest CPU cores have bottlenecks, and today we are talking about the bottlenecks that Zen 3 has and what AMD could improve with Zen 4. We gathered far too much data from Cinebench R23, Civilization…

ARM or x86? ISA Doesn’t Matter

For the past decade, ARM CPU makers have made repeated attempts to break into the high performance CPU market so it’s no surprise that we’ve seen plenty of articles, videos and discussions about ARM’s effort, and many of these pieces focus on differences between the two instruction set architectures (ISAs). Here in this article we’ll…

How Zen 2’s Op Cache Affects Performance

Banner image credit goes to Fritzchens Fritz and his amazing die shots. Recent AMD and Intel high performance CPUs implement an op cache that remembers decoder output and functions like an L0 instruction cache. Compared to the traditional L1i fetch and decode path, the op cache provides higher bandwidth while saving power by allowing the…

The End of an Era: AMD Discontinues Pre-2016 GCN GPU Support

A small official blog post and a driver release footnote made for an unceremonious end to one of the most loved and long-lived series of AMD GPUs ever, Hawaii, and its HBM-powered follow-up, Fiji. Today we try to add a small bit of ceremony back, to recognize GPUs that largely contributed to AMD’s ‘fine wine’…

Exploring CPU Core to Core Latency and the Role that Locks Play

This article has been a LONG time coming since our article on Rocket Lake, where we talked about core to core latency for the first time here on Chips and Cheese. This is a follow up article exclusively about core to core latency. The core to core latency tests used by us and Anandtech measure…

GPU Memory Latency’s Impact, and Updated Test

In a previous article, we measured cache and memory latency on different GPUs. Before that, discussions on GPU performance have centered on compute and memory bandwidth. So, we’ll take a look at how cache and memory latency impact GPU performance in a graphics workload. We’ve also improved the latency test to make it more accurate,…

RDNA 1 Redux: Maximizing Performance With RX 5000 Series GPUs

With 2021’s seemingly endless GPU stock issues, many users are being forced to rely on purchasing last generation graphics cards, or—if they are lucky—hold on to what they have, to play the latest games. Often using an older GPU means sacrificing visual quality or resolution—trade-offs that many may find to be unacceptable. Overclocking can help of…

Measuring GPU Memory Latency

We’ve gotten used to measuring CPU cache and memory latencies, so why not do the same to GPUs? Like CPUs, GPUs have evolved to use multi-level cache hierarchies to address the growing gap between compute and memory performance. And like CPUs, we can use pointer chasing benchmarks (in OpenCL) to measure cache and memory latency.…

Rocket Lake: When ‘Reviews’ are Really Previews

As we’ve all surely seen by now, there are early retail samples in the wild for Intel’s 11th-generation Core desktop CPUs, ‘Rocket Lake-S’, and early reviews to go along with them. Pre-NDA reviews have said good things, bad things and ugly things. What’s not being said (or just not being disclosed) is that they’re running…

Lowering the BAR: AMD’s 6700 XT launch and the Importance of Disclosure

On March 3rd, 2021 AMD officially announced the 6700 XT to the world. Along with it came the usual first-party performance graphs, most of which showed it matching or even beating NVIDIA’s RTX 3070—which was a shock to the tech world. It would seem, however, all is not as AMD would have you believe. AMD’s…

Modern Data Compression in 2021 Part 2 : The Battle to Dethrone JPEG with JPEG-XL, AVIF, and WEBP

This is the 2nd article of a multi-part series, with the focus starting on image compression. I would heavily recommend reading the 1st article as it explores a part of the history of image compression, which leads to talking quite a bit about the basis of the JPG image codec: https://chipsandcheese.com/2021/01/30/modern-data-compression-in-2021-part-1-a-simple-overview-on-the-art-of-image-encoding/ Essentially, most modern image…

Analyzing Zen 2’s Cinebench R15 Lead

Cinebench R15 (CBR15) is a popular benchmark based on Cinema4D’s 3D rendering engine. It can utilize all available CPU threads, but here we’ll be analyzing it in single thread mode. In short, Zen 2 pulls ahead thanks to its superior branch predictor, larger mid-level cache, and ability to track more pending floating point micro-ops in…

A Patreon Story

Hello everyone, It’s time to talk about something important, as it’s becoming increasingly clear to us. Having a website with quality content and being able to deliver it effectively isn’t free. We’ve shunned as much advertising as possible to keep our content free of possible influence. The flipside of that is the site and content…

CTR Safety, Revisited

There are times when being a journalist is exciting. Your team writes something important, people engage with it and it generates a large response. Unfortunately, those are often the exact situations where emotions run high. If you write glowingly about how great a product is, the competition cries foul and suddenly you’re biased. If you…

Security and You, an Overview

Intel, AMD, and Nvidia have all had brushes with poor security in the past (widely published or otherwise) but that isn’t the focus of this content piece. Instead, we will be exploring what the three companies have—or in one case haven’t—done to secure their products from the threats present in the world and on the…

AMD’s Past and Future CPUs (Updated with Author’s Note)

Less than half a decade ago, if you had walked up to someone in the industry and said that in 5 years’ time AMD would have the fastest CPUs, you would have been laughed straight out of the room—but here we are. At the time of writing, AMD does have the fastest CPUs on the…

CTR: A Review and a Warning (updated)

Update 5/2/21 1340 GMT: 1usmus himself has replied to our findings; we have included his reply and some points after the conclusion at the end of the article. Article has been edited for clarity. What is CTR? Clock Tuner for Ryzen (CTR) is a tool created by Yuri Bubliy, a.k.a. 1usmus, that is designed to…

Intel Finally Solving the Right Problem

When I first heard that Intel was replacing their current CEO, Bob Swan, my first thought was “rearranging the deck chairs on the Titanic”. After all, you don’t get into a situation like Intel’s without some persistently wrong management choices. The board, despite being filled with some proven intelligent people, doesn’t understand Intel’s technical details,…

Intel’s HEDT Roadmap

Long the king of High-End Desktop (HEDT) computing, Intel’s once undisputed position has become a much more awkward one since the arrival of a resurgent AMD and its Threadripper CPUs. X299 is Dead! The header says it all: X299 is dead, at least in the original form that Intel launched it. Available for pre-order on…

NVIDIA’s Next Generation GPU: MCM?

Recently, we’ve heard rumors that NVIDIA Corporation is targeting a MCM design for their next-generation server GPU. While future product details are scarce and volatile, we can guess at Nvidia’s aims by looking at their research. A 2017 paper co-authored by Nvidia lays out the goals and challenges of MCM GPU design. Sidestepping Moore’s Law…

NVIDIA’s Enterprise

Well, I guess that this will be the first non-welcome post on this website and the topic of this piece is NVIDIA Corporation, specifically the next-gen server GPU and more broadly the NVIDIA Enterprise division. Now please remember that what is written below are leaks, rumours, and speculation, so take this with an appropriate amount…

Welcome to Chips and Cheese

I am pretty sure that I’m not the first person to say that 2020 has been a bad year. A bad year is probably a gross understatement. But even with all that has happened in the world outside of tech, there have been quite a few disappointments and controversies within the tech sphere over the…