It's no exaggeration to say that Zen saved AMD. The completely redesigned CPU architecture, launched in 2017, swept away all the company's past mistakes and, along with chiplets, Infinity Fabric, and 3D V-Cache, has given it profitability and market-leading performance. There have been several big steps in chip design since then, but with the newly announced Zen 5 at Computex 2024, AMD has not made any sweeping changes. Instead, it is a collection of small changes that combine to bring some remarkable performance improvements in the right applications.
Let's start with what hasn't changed. Zen 5 is laid out in basically the same way as Zen 4, with CCDs (Core Complex Dies) consisting of up to eight cores sharing 32MB of L3 cache. The IOD (Input/Output Die) is practically the same too, though so far few details of its feature set have been released. So with Zen 5, you are not getting more cores, threads, or cache than with Zen 4. That may disappoint some people, but given that there was little to complain about in the previous architecture, there was not much need for drastic changes.
What is new lies deep inside the core structure itself, with changes aimed at getting more data to where it needs to be while improving the overall efficiency of the CCD's number-crunching abilities. AMD's Zen 5 announcement was very light on specifics, but here is what we were told.
The branch prediction unit in each core has been tuned to be more accurate and to deliver its results more quickly. This circuit works out which instructions the core is most likely to need to process next, and the rest of the core fetches the required data and instructions from cache based on what the predictor decides. If it gets it wrong, valuable cycles are wasted fetching the right information.
So there is nothing to complain about or criticize in a more accurate, faster-responding branch predictor, because it means better overall efficiency on the processing side of the chip.
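To make the cost of a wrong guess concrete, here is a minimal sketch of a textbook two-bit saturating counter predictor. It is not AMD's design, which has not been disclosed, and the 15-cycle misprediction penalty is a hypothetical figure chosen only for illustration.

```python
def simulate_two_bit_predictor(outcomes, mispredict_penalty=15):
    """Run a 2-bit saturating counter over a sequence of branch outcomes.

    outcomes: sequence of bools (True means the branch was taken).
    mispredict_penalty: hypothetical cycles lost per wrong guess.
    Returns (accuracy, wasted_cycles).
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one branch outcome")

    counter = 2  # states 0-1 predict "not taken", states 2-3 predict "taken"
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Nudge the counter toward the real outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)

    wasted_cycles = (len(outcomes) - correct) * mispredict_penalty
    return correct / len(outcomes), wasted_cycles


# A loop branch that is taken nine times and then falls through, run 100 times.
loop_pattern = ([True] * 9 + [False]) * 100
accuracy, wasted = simulate_two_bit_predictor(loop_pattern)
print(f"accuracy={accuracy:.0%}, wasted cycles={wasted}")
```

On this pattern the simple predictor misses once per loop exit, so every improvement in accuracy translates directly into fewer wasted cycles.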
Zen 5 also features "wider pipelines and vectors", but without knowing exactly what AMD is referring to here, it is hard to make a direct comparison with Zen 4. In this case, though, AMD has probably given Zen 5 more pipelines (or ports, to use the right terminology) and more instruction schedulers.
That ties in with AMD's statement that Zen 5 also has a "deeper window size", which is almost certainly pointing to the reorder buffer (ROB). This sits between the core front end (responsible for decoding instructions, renaming, dispatching micro-operations, and so on) and the execution stage that contains the pipelines.
The task of the reorder buffer is to keep track of the instructions currently being processed in the pipelines until they have finished. AMD has increased the size of the ROB over the generations of Zen, so it's not surprising to see it grow in Zen 5.
Without this improvement, the "wider pipelines" change would not be as effective, as a larger ROB helps prevent pipelines from sitting idle. This is all good news for performance, but it assumes that the rest of the core has been improved as well.
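A toy model shows why the window size matters. The sketch below is a heavy simplification and says nothing about AMD's actual design: it assumes every instruction is independent, that up to `width` instructions enter and leave the buffer each cycle, and that instructions retire strictly in program order. The latencies and buffer sizes are hypothetical.

```python
from collections import deque


def run_cycles(latencies, rob_size, width=4):
    """Count the cycles needed to retire every instruction in program order.

    Assumptions: instructions are independent, up to `width` are dispatched
    and up to `width` retire per cycle, and each one holds its reorder buffer
    slot from dispatch until in-order retirement.
    """
    if rob_size < 1 or width < 1:
        raise ValueError("rob_size and width must be at least 1")

    rob = deque()  # completion cycle of each in-flight instruction, oldest first
    next_instr = 0
    cycle = 0
    while next_instr < len(latencies) or rob:
        cycle += 1
        # Retire finished instructions from the head of the buffer, in order.
        retired = 0
        while rob and retired < width and rob[0] <= cycle:
            rob.popleft()
            retired += 1
        # Dispatch new instructions while free slots remain.
        dispatched = 0
        while (next_instr < len(latencies) and dispatched < width
               and len(rob) < rob_size):
            rob.append(cycle + latencies[next_instr])
            next_instr += 1
            dispatched += 1
    return cycle


# Hypothetical workload: one slow 50-cycle load followed by 31 one-cycle
# operations, repeated 20 times.
workload = ([50] + [1] * 31) * 20
for size in (16, 256):
    print(f"ROB of {size} entries: {run_cycles(workload, size)} cycles")
```

With the small buffer, each slow load blocks retirement until the window fills and dispatch stalls. The larger buffer keeps the pipelines fed while several loads are in flight at once.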
AMD's other statements suggest that this is the case, emphasizing that Zen 5 has up to twice the instruction bandwidth at the front end, along with the same increase between the L1 data cache and the floating-point (aka vector) pipelines, and between the L2 and L1 caches.
The phrase "up to" is a bit odd. Typically, if you double the width of a data bus and keep the clock speed the same, the peak bandwidth doubles. Perhaps it is marketing speak to head off anyone pointing out that the real-world figure is not exactly 2x, but who knows at this stage.
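The arithmetic behind that is simple enough to sketch. The bus widths, clock speed, and utilization figures below are hypothetical placeholders rather than AMD's numbers. They show that doubling the width doubles the peak figure exactly, while the achieved ratio drops below 2x if the wider bus cannot be kept full.

```python
def peak_bandwidth_gb_s(bus_width_bytes, clock_ghz):
    """Peak bandwidth in GB/s: bytes moved per cycle times cycles per nanosecond."""
    return bus_width_bytes * clock_ghz


def effective_bandwidth_gb_s(bus_width_bytes, clock_ghz, utilization):
    """Peak bandwidth scaled by the fraction of cycles that carry data."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_bandwidth_gb_s(bus_width_bytes, clock_ghz) * utilization


# Hypothetical figures: a 32-byte bus versus a 64-byte bus, both at 5 GHz.
narrow_peak = peak_bandwidth_gb_s(32, 5.0)
wide_peak = peak_bandwidth_gb_s(64, 5.0)
print(f"peak ratio: {wide_peak / narrow_peak:.2f}x")

# If the wider bus is only kept 80% busy, the real gain falls short of 2x.
narrow_real = effective_bandwidth_gb_s(32, 5.0, 1.0)
wide_real = effective_bandwidth_gb_s(64, 5.0, 0.8)
print(f"effective ratio: {wide_real / narrow_real:.2f}x")
```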
Anyway, up to 2x "AI and AVX-512 throughput" is particularly welcome for professionals who do serious content creation, AI routines, and other tasks involving mega-heavy vector manipulation. Zen 4 was always expected to be improved upon here, as it was AMD's first architecture to support the AVX-512 extensions, but if the speed of such operations really has doubled, Zen 5 will clean up in the pro market.
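One way a 2x vector gain could arise is sketched below. A 512-bit register holds sixteen 32-bit floats, and if the execution hardware is narrower than the register, each instruction has to be split into several passes. The datapath widths used here are illustrative assumptions, since AMD's announcement did not state how the throughput increase was achieved.

```python
import math


def lanes_per_vector(vector_bits, element_bits):
    """Number of elements processed by a single vector instruction."""
    return vector_bits // element_bits


def passes_needed(vector_bits, datapath_bits):
    """Passes through the execution unit needed to cover the whole vector."""
    return math.ceil(vector_bits / datapath_bits)


lanes = lanes_per_vector(512, 32)  # sixteen 32-bit floats per AVX-512 register
for datapath in (256, 512):  # assumed datapath widths, for illustration only
    passes = passes_needed(512, datapath)
    print(f"{datapath}-bit datapath: {passes} pass(es) for {lanes} floats")
```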
AMD's Zen 5 announcement was short on detail, with only two slides on the core changes and nothing said about the input/output die. One has to assume that it hasn't changed, but something must have been updated, given that the X870 motherboards for the Ryzen 9000 series support higher RAM speeds than the X670 motherboards for the 7000 series.
More importantly, with performance gains of up to 23% in Blender compared to Zen 4 at the same clock speed, it's clear that the improvements are significant.
Games are typically much less sensitive to increases in IPC (instructions per cycle), because they lean more heavily on cache and memory. Even so, a 21% higher frame rate in League of Legends and 10% in Far Cry 6 is not to be dismissed lightly.
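Amdahl's law gives a rough feel for why a core improvement shows up less in memory-bound workloads. The sketch below uses invented fractions and an invented 1.2x core speedup, so treat the outputs as an illustration of the shape of the effect rather than a prediction for any real game or renderer.

```python
def overall_speedup(core_bound_fraction, core_speedup):
    """Amdahl's law: only the core-bound share of the runtime gets faster."""
    if not 0.0 <= core_bound_fraction <= 1.0:
        raise ValueError("core_bound_fraction must be between 0 and 1")
    if core_speedup <= 0:
        raise ValueError("core_speedup must be positive")
    return 1.0 / ((1.0 - core_bound_fraction) + core_bound_fraction / core_speedup)


# Hypothetical workloads: a renderer that is 90% core-bound and a game that
# spends half its time waiting on cache and memory.
for name, fraction in (("renderer", 0.9), ("game", 0.5)):
    gain = overall_speedup(fraction, 1.2)
    print(f"{name}: {gain:.3f}x overall from a 1.2x faster core")
```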
It's unfortunate that the architectural details are being held back for a future event, but at least there won't be long to wait for independent analysts to examine AMD's claims.