Memory technologies make fundamental tradeoffs between storage capacity and read speeds, and these drive programmers to put different data on different memory technologies.
But what actually are those tradeoffs? Nobody ever talks about them. Here is a graph of read speed (in MB/s of sequential reads) versus capacity (in GB), both per dollar.
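If you want to compute the same two metrics yourself, a minimal sketch looks like the following. The parts, prices, and speeds in it are made-up placeholders for illustration, not the data behind the graph.

```python
# Rough sketch of the two axes: capacity per dollar and read bandwidth per dollar.
# The example parts below are illustrative placeholders, not the graph's data.

from dataclasses import dataclass

@dataclass
class Memory:
    name: str
    price_usd: float        # price of the part
    capacity_gb: float      # usable capacity in GB
    read_mb_per_s: float    # sustained sequential read bandwidth in MB/s

def per_dollar_metrics(m: Memory) -> tuple[float, float]:
    """Return (GB per dollar, MB/s of sequential reads per dollar)."""
    return m.capacity_gb / m.price_usd, m.read_mb_per_s / m.price_usd

# Hypothetical example parts; swap in real prices and datasheet numbers.
parts = [
    Memory("NVMe SSD (example)", price_usd=100, capacity_gb=1000, read_mb_per_s=3_500),
    Memory("DDR4 DIMM (example)", price_usd=100, capacity_gb=32, read_mb_per_s=25_000),
]

for p in parts:
    gb_per_usd, mbps_per_usd = per_dollar_metrics(p)
    print(f"{p.name}: {gb_per_usd:.2f} GB/$, {mbps_per_usd:.0f} MB/s per $")
```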
Why did I not include SRAM?
Because bandwidth is only costly, and therefore something we care about, when it crosses chip boundaries. Even without trying to optimize this metric, the A100 has something like 400 TB/s of SRAM bandwidth across its registers and caches.
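As a rough sanity check on why on-chip bandwidth is effectively free, here is a back-of-envelope sketch. The per-SM, per-cycle widths are my own ballpark assumptions rather than vendor-published figures, but they land in the same region as the number above.

```python
# Back-of-envelope for aggregate on-chip SRAM bandwidth on an A100-like GPU.
# The per-cycle widths are rough assumptions, not vendor-published specs.

num_sms = 108                # the A100 has 108 SMs
clock_hz = 1.41e9            # ~1.41 GHz boost clock

# Assumed bytes moved per SM per cycle at each SRAM level (ballpark guesses).
shared_mem_bytes_per_cycle = 128     # shared memory / L1 load width
register_bytes_per_cycle = 2048      # register file feeding the math pipes

def aggregate_tb_per_s(bytes_per_sm_per_cycle: float) -> float:
    return num_sms * clock_hz * bytes_per_sm_per_cycle / 1e12

print(f"shared memory: ~{aggregate_tb_per_s(shared_mem_bytes_per_cycle):.0f} TB/s")
print(f"register file: ~{aggregate_tb_per_s(register_bytes_per_cycle):.0f} TB/s")
# Either way this dwarfs the roughly 1.6-2 TB/s the A100 gets from HBM, which is
# why off-chip bandwidth is the kind worth charting per dollar.
```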
Making this graph involved a lot of tricky modelling assumptions and subtleties that I've simply had to bulldoze through. Keep this in mind before putting too much trust in these numbers.
Why ever use HBM?
Part of the purpose of making this graph was for me to show it to people and ask this question. The basic answer, as far as I can tell, is that HBM takes less shoreline (area on the edge of a chip). This means you can fit more memory bandwidth on a single chip, which reduces how much inter-chip communication you need, and communication is even MORE expensive than HBM. It also uses less power, but I'm pretty unconvinced that this matters much.
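To give a sense of the shoreline argument, here is a toy comparison of bandwidth per memory interface. The per-pin data rates and bus widths are ballpark public figures for one generation of each technology, and the comparison ignores command, address, and power pins as well as pad pitch, which is where much of HBM's real shoreline advantage comes from.

```python
# Toy illustration of the "shoreline" argument: bandwidth per memory interface.
# Per-pin rates are rough ballpark figures; real beachfront cost also depends on
# bump/pad pitch and packaging, not just bus width.

interfaces = {
    # name: (data bus width in bits, per-pin rate in Gbit/s)
    "HBM2e stack": (1024, 3.2),
    "GDDR6 chip":  (32, 16.0),
    "DDR4 DIMM":   (64, 3.2),
}

for name, (width_bits, gbps_per_pin) in interfaces.items():
    gb_per_s = width_bits * gbps_per_pin / 8
    print(f"{name}: {gb_per_s:.0f} GB/s over a {width_bits}-bit bus")

# An HBM stack delivers hundreds of GB/s through one interface because the
# interposer lets you run an extremely wide bus off a small stretch of die edge;
# matching that bandwidth with GDDR or DDR means many more separate interfaces
# spread along the chip's perimeter.
```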