AMD Reveals EHP with 32 Zen Cores and Greenland HBM2 Graphics

Last edited by Kundera on 2015-8-2 08:21

32 Zen core APU
SMT (2 threads?)
32GB HBM2

A half sized consumer variant would be interesting enough (if AMD can make it on time?)



AMD Reveals the Monstrous ‘Exascale Heterogeneous Processor’ (EHP) with 32 x86 Zen Cores and Greenland HBM2 Graphics on a 2.5D Interposer

There had been rumors about AMD working on a huge APU with Zen cores and Greenland HBM graphics, something that AMD had hinted at in its official roadmap. Now, however, it has (finally) officially revealed details about the upcoming APU in a paper submitted to the IEEE (Institute of Electrical and Electronics Engineers). The APU, dubbed an “Exascale Heterogeneous Processor” or EHP for short, is the mother of all APUs with 32 Zen cores, an absolutely huge Greenland graphics die and up to 32GB of HBM2 memory, all on a 2.5D interposer.

Exascale Heterogeneous Processor (EHP) is AMD’s promised monster APU for the HPC segment

The research in question can be found here and requires paid access; however, we were able to get the relevant piece courtesy of Bitsandchips.it. As you may notice, the diagram is a very simple block diagram that doesn’t really reveal much, except the number of CPU cores. Fortunately for us, AMD’s roadmap and the relevant knowledge of HBM technology make it almost child’s play to identify the exact parts.
[Diagram: AMD EHP APU with 32 Zen cores and Greenland graphics for HPC]

Provided the diagram is drawn accurately, the first thing you will note is that there are exactly 32 “CPU Cores”. Since the EHP (APU) is scheduled for 2016-2017, we are most definitely looking at Zen cores (not to mention that Excavator cores wouldn’t fit). I can count 4 dies per stack, and since we are dealing with HBM2 at the very least (given the timeframe), these constitute 4-Hi stacks. HBM2 is 8Gb per die, which equates to 4GB per stack (for a 4-Hi stack) in this diagram, or a total of 32GB of HBM2 memory onboard the interposer (which implies eight such stacks). That’s not all either: memory can be expanded further via the DDR4 channel present on the package.
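The capacity arithmetic in the paragraph above can be checked with a short sketch. It only uses the figures quoted in the article (8Gb per die, 4 dies per stack, 32GB total); the function and variable names are made up for illustration, and the stack count is derived from those figures rather than stated by AMD.

```python
# Figures quoted in the article; names below are illustrative only.
GBIT_PER_DIE = 8      # HBM2 die density in gigabits
DIES_PER_STACK = 4    # a "4-Hi" stack, as counted in the diagram
TOTAL_HBM_GB = 32     # total HBM2 capacity claimed for the EHP


def stack_capacity_gb(gbit_per_die, dies_per_stack):
    """Capacity of one HBM stack in gigabytes (8 bits per byte)."""
    return gbit_per_die * dies_per_stack / 8


per_stack_gb = stack_capacity_gb(GBIT_PER_DIE, DIES_PER_STACK)

# Number of stacks needed to reach the claimed total (derived, not stated).
implied_stacks = TOTAL_HBM_GB / per_stack_gb

print(f"{per_stack_gb:.0f} GB per stack, {implied_stacks:.0f} stacks implied")
```

If the diagram were to show a different number of stacks, either the per-die density or the total would have to differ from the article's reading.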

As far as the graphics portion of the Exascale Heterogeneous Processor is concerned, what we know for sure is that this will be the next-generation Greenland graphics; what we don’t know is what the exact core count will be. Since we have no idea how big Zen cores actually are (or if the diagram is even drawn to scale), it would be unwise to try to reverse engineer the die size of the GPU from the picture. We can safely say, however, that this is one of the largest GPU dies we have encountered so far. If I were to make a wild guess (caution: speculation) for the sake of giving a number, I would say the number of stream processors could easily be above 3072, considering we are talking about a smaller process node and a huge die.

This brings us to our third deduction. A 2.5D interposer has been used in the EHP (APU), and the CPU and GPU cores together are too numerous (and huge) to have been manufactured as a single die. Not only would the yields on such a monstrosity be dismal, it would be close to impossible to manufacture such a thing in the first place. The likely conclusion, therefore, is that the compute and graphics portions of the APU are manufactured separately and put together on the interposer later on in assembly (possibly at UMC’s Fab 12 foundry in Singapore, which is already used to assemble Fiji dies). So basically, AMD is fabricating the compute side of the Exascale Heterogeneous Processor (EHP) in dies with 16 Zen cores each, for a total of 2 compute dies and 1 graphics die assembled on the interposer (ignoring the HBM).

There has been word on the rumor mill about AMD’s HPC APU for quite a long time, and we are fairly certain there will be a 16 core variant as well. Previous leaks have indicated that the processor will be constructed using AMD’s Coherent Fabric, a custom interconnect that lets the cores communicate with the Greenland graphics. Each Zen core will have access to 512KB of L2 cache, and every 4 Zen cores will share 8MB of L3 cache in the ‘Exascale Heterogeneous Processor’. That equates to a grand total of 16MB of L2 cache and 64MB of L3 cache. Each Zen core will be capable of running two threads (thanks to the company’s shift back to SMT) for a total of 64 threads in this huge APU. The processor is thought to have 8 DDR4 channels with a capacity of 256GB per channel.
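The totals in the paragraph above follow directly from the per-core figures. Here is a minimal sketch of that arithmetic, using only numbers from the article; the variable names are illustrative, and the DDR4 total is simply channels times per-channel capacity, a figure the article does not spell out.

```python
# Per-core and per-cluster figures quoted in the article.
ZEN_CORES = 32
L2_KB_PER_CORE = 512
CORES_PER_L3_CLUSTER = 4
L3_MB_PER_CLUSTER = 8
THREADS_PER_CORE = 2      # SMT, two threads per core
DDR4_CHANNELS = 8
DDR4_GB_PER_CHANNEL = 256

# Total L2: one private 512KB slice per core, converted to MB.
total_l2_mb = ZEN_CORES * L2_KB_PER_CORE / 1024

# Total L3: one shared 8MB slice per group of four cores.
l3_clusters = ZEN_CORES // CORES_PER_L3_CLUSTER
total_l3_mb = l3_clusters * L3_MB_PER_CLUSTER

total_threads = ZEN_CORES * THREADS_PER_CORE

# Maximum DDR4 capacity if every channel is fully populated (derived figure).
max_ddr4_gb = DDR4_CHANNELS * DDR4_GB_PER_CHANNEL

print(f"L2: {total_l2_mb:.0f} MB, L3: {total_l3_mb} MB")
print(f"Threads: {total_threads}, max DDR4: {max_ddr4_gb} GB")
```

The same constants can be halved to estimate the rumored 16 core variant, assuming it keeps the same per-core cache layout, which the article does not confirm.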

Unfortunately for the enthusiasts, there is no guarantee that the EHP will trickle down into consumer variants. In fact, I will be genuinely shocked if it does. Even the 16 core variant that was spotted quite a bit earlier would be hard pressed to enter the mainstream segment. In any case, heterogeneous processing is an applause-worthy approach to the HPC problem. Equipped with Greenland class stream processors for parallel workloads and a small army of Zen cores for the rest, this not-so-tiny APU would be able to handle just about anything. Not to mention, as the name suggests, the Exascale Heterogeneous Processor is built to be scaled to kingdom come, allowing for a truly powerful rival to Intel’s Xeon Phi coprocessors and even the general GPGPU market.

Source:
http://wccftech.com/amd-exascale ... hbm2/#ixzz3hbxwzt9W

Related article:
http://www.theplatform.net/2015/ ... xascale-revive-amd/

The mighty PPT (PowerPoint slides) strikes again


This story seems to have been around for quite a while.
If it really ships, it's going to be one huge chip


But what's the power consumption.......


Last edited by Kundera on 2015-8-2 17:59
But what's the power consumption.......
VADER posted on 2015-8-2 15:38


look at how intel steroided their iGPU by just adding 128MB eDRAM
putting 1gb hbm2 for consumer apu (let's say 6-8 core) will be an overkill (again, if they ever ship it)


look at how intel steroided their iGPU by just adding 128MB eDRAM
putting 1gb hbm2 for consumer ap ...
Kundera posted on 2015-8-2 17:55


The 128MB eDRAM is way faster than 1GB HBM2


32 Zen core APU
SMT (2 threads?)
32GB HBM2

A half sized consumer variant would be interesting enoug ...
Kundera posted on 2015-8-2 08:18


If single die, gg


The 128MB eDRAM is way faster than 1GB HBM2
qcmadness posted on 2015-8-2 18:09



it's about 50GB/s bi-directional isn't it?
though the latency could be lower than hbm


it's about 50GB/s bi-directional isn't it?
though the latency could be lower than hbm
Kundera posted on 2015-8-2 18:12


60ns, 50GB/s per direction

HBM: 128GB/s per stack
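A rough sketch of how these numbers compare. The eDRAM figure (50GB/s per direction) and the 128GB/s per-stack HBM figure come from this post. The 1024-bit interface and 1 Gbit/s per pin used to rederive the HBM number are first-generation HBM parameters that are not stated in the thread, so treat them as assumptions. Function and variable names are illustrative.

```python
def hbm_stack_bandwidth_gbs(bus_width_bits, gbit_per_pin):
    """Peak bandwidth of one HBM stack in GB/s (8 bits per byte)."""
    return bus_width_bits * gbit_per_pin / 8


# eDRAM figure from the post: 50 GB/s in each direction.
EDRAM_PER_DIRECTION_GBS = 50
# Best case: reads and writes saturate both directions at once.
edram_aggregate_gbs = 2 * EDRAM_PER_DIRECTION_GBS

# Assumption (not from the thread): first-gen HBM uses a 1024-bit
# interface at 1 Gbit/s per pin, which reproduces the 128 GB/s figure.
hbm1_stack_gbs = hbm_stack_bandwidth_gbs(1024, 1.0)

ratio = edram_aggregate_gbs / hbm1_stack_gbs

print(f"eDRAM aggregate: {edram_aggregate_gbs} GB/s")
print(f"HBM per stack:   {hbm1_stack_gbs:.0f} GB/s")
print(f"eDRAM / HBM:     {ratio:.2f}")
```

On peak bandwidth alone, a single stack at the quoted 128GB/s comes out ahead of the eDRAM even in the best bidirectional case; the 60ns latency figure is a separate consideration that this sketch does not model.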


60ns, 50GB/s per direction

HBM: 128GB/s per stack
qcmadness posted on 2015-8-2 18:18



so eDRAM is on par with HBM1, or am I getting it wrong?
