You will not want DDR4 underneath for the majority of HPC workloads, which are either memory-bandwidth bound or compute bound. The only workloads that require big memory are the memory-latency-bound ones, and those are quite a minority in HPC, although more frequent in Web/IDC.
What does this mean? Can somebody elaborate? Does it mean that DDR4 is not well suited for HPC workloads?
From my limited understanding, I think he’s trying to point out that most HPC applications are limited by memory bandwidth rather than memory latency. DDR4 has very low latency compared to HBM, but HBM in turn has much higher bandwidth than DDR4 (typical STREAM benchmark results for HBM on mic-knl are ~400 GB/s versus ~80 GB/s for DDR4). You’ll also find HBM more prominently featured on processors/accelerators with higher core/execution-unit counts, viz. GPUs, Xeon Phi, the upcoming 64-core A64FX, etc., since those have enough parallelism to actually use the extra bandwidth.
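To see why those numbers measure bandwidth and not compute, here’s a rough sketch of a STREAM-style triad kernel (my own minimal version, not the official benchmark; the array size and timing setup are just illustrative). Each iteration does 2 flops but moves 24 bytes of data, so the reported rate is set almost entirely by the memory subsystem:

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* STREAM-style triad: a[i] = b[i] + scalar * c[i].
 * Per iteration: 2 flops vs. 24 bytes moved (two loads, one store
 * of doubles), so runtime is bounded by memory bandwidth, not the
 * FPUs. N must be large enough to overflow all caches. */
#define N (1 << 27)  /* ~134M doubles per array, ~1 GiB each */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    double scalar = 3.0;

    #pragma omp parallel for
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + scalar * c[i];
    double t1 = omp_get_wtime();

    /* Three arrays of 8-byte doubles traverse the memory bus. */
    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("Triad bandwidth: %.1f GB/s\n", gbytes / (t1 - t0));

    free(a); free(b); free(c);
    return 0;
}
```

Compile with something like `gcc -O3 -fopenmp`; on a DDR4 socket you’d expect the printed figure to land somewhere near that ~80 GB/s number, and several times higher on an HBM part, regardless of how fast the cores are.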
For an explanation of why most sparse linear algebra workloads (the backbone of PDE solving) are bound mainly by memory bandwidth, see chapter 14 of https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf. Staying in the context of PDE solving, you can also have a look at this paper: https://dl.acm.org/citation.cfm?id=3322813, which compares the throughput of a multigrid solver on a CPU vs. a many-core CPU vs. a GPU, where the CPU with DDR4 has higher throughput for small problems but the many-core CPU and GPU with HBM have much higher throughput at larger problem sizes.
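To make the bandwidth argument concrete, below is a bare-bones CSR sparse matrix-vector product, the kernel at the heart of those solvers (this is just the textbook CSR layout, not code from the paper or PETSc). Counting bytes per flop shows why it saturates the memory bus long before the arithmetic units:

```c
#include <stddef.h>

/* CSR sparse matrix-vector product y = A*x (illustrative sketch).
 * Per nonzero: 1 multiply-add (2 flops) against ~12 bytes streamed
 * (an 8-byte value plus a 4-byte column index), plus an indirect
 * load of x. At roughly 1/6 flop per byte, memory bandwidth is the
 * bottleneck, which is why HBM helps sparse solvers so much. */
void spmv_csr(size_t nrows,
              const size_t *row_ptr,   /* nrows+1 offsets into cols/vals */
              const int    *cols,      /* column index of each nonzero   */
              const double *vals,      /* value of each nonzero          */
              const double *x,
              double *y)
{
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += vals[k] * x[cols[k]];
        y[i] = sum;
    }
}
```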