Why is the DDR on graphics cards always clocked so much higher than standalone PC RAM?
GDDR memory uses quad data rate to reach its rated transfer speeds, whereas desktop DDR memory uses double data rate (that's where the "DD" in DDR originally came from).
For both GDDR and DDR, you'll notice the actual clock frequency (in Hz/MHz/GHz etc.) is relatively low. For instance, DDR4 3200 actually operates at 1,600 MHz; that is its SDR (Single Data Rate) and true clock speed. It becomes 3,200 MT/s (megatransfers per second, which is what we often refer to as "MHz" even though it isn't; CPU-Z reports this correctly, showing only the SDR rate) because data is read/written on both the rising and falling edge of the clock's square wave.
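If it helps to see the arithmetic, here's a rough back-of-the-envelope sketch in Python using just the DDR4 example above:

```python
# Back-of-the-envelope: real clock vs. transfer rate for desktop DDR.
# Figures are just the DDR4 3200 example from above.

real_clock_mhz = 1600      # the actual I/O clock (the SDR rate CPU-Z shows)
edges_per_cycle = 2        # DDR: data moves on both the rising and falling edge

transfer_rate = real_clock_mhz * edges_per_cycle
print(f"DDR4 at {real_clock_mhz} MHz -> {transfer_rate} MT/s")  # 3200 MT/s
```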
For GDDR, a second square wave that's out of phase with the first is used, which allows two falling and two rising edges in a single clock cycle.
So instead of two opportunities for data transfer per cycle as with desktop DDR, we have four, hence the quad data rate nature of GDDR. It isn't called GQDR because it's still fundamentally double data rate. At the base level, GDDR and DDR are the same: the 1,600 MHz that becomes DDR4 3200 MT/s in your PC would become GDDR 6,400 MT/s on your GPU.
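Again just as a sketch (the phase offset below is illustrative, not a spec value), here's how two out-of-phase clocks yield four transfer opportunities per base clock cycle:

```python
# Illustrative only: two square waves, the second offset by a quarter cycle,
# give four usable edges per base clock cycle.

clock_a_edges = [0.00, 0.50]   # rising and falling edge of the first clock
clock_b_edges = [0.25, 0.75]   # second clock, out of phase with the first

edges = sorted(clock_a_edges + clock_b_edges)
print(edges)                   # [0.0, 0.25, 0.5, 0.75] -> 4 transfers per cycle

base_clock_mhz = 1600
print(f"{base_clock_mhz} MHz x {len(edges)} edges = {base_clock_mhz * len(edges)} MT/s")
```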
The reason we don't (or didn't) use GDDR as main memory is that this quad data rate requires significantly different data-line control with tighter tolerances, so it's more difficult and costly to manufacture. Moreover, because of the complexity of clocking essentially two waves this way every cycle, GDDR has much higher latency. For example, instead of C16-18-18-36 for DDR4 3200, it would be something like C40-66-66-100 for GDDR at 6,400 MT/s.
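To put rough numbers on that, here's the same comparison in nanoseconds, with the big assumption that both CAS figures above are counted in cycles of the same underlying 1,600 MHz clock (real GDDR timings vary by generation and vendor):

```python
# Convert the example CAS latencies above into nanoseconds, assuming both
# are counted in cycles of the 1600 MHz base clock. Purely illustrative.

base_clock_mhz = 1600

def cas_ns(cl_cycles, clock_mhz=base_clock_mhz):
    # latency (ns) = cycles / frequency (MHz) * 1000
    return cl_cycles / clock_mhz * 1000

print(f"DDR4 3200, CL16: {cas_ns(16):.1f} ns")  # 10.0 ns
print(f"GDDR 6400, CL40: {cas_ns(40):.1f} ns")  # 25.0 ns
```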
The reason these latencies aren't as important with GDDR is partly trace distance (GPU memory sits a lot closer to the GPU than the CPU does to its DIMM slots). More importantly, GPUs are inherently parallel processors, and that hides latency penalties quite well. As in general computing, the most obvious way to hide a latency penalty is to extract as much parallelism as possible.
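A crude way to picture that is Little's law: to keep a memory system busy you need roughly latency x bandwidth worth of requests in flight. The numbers below are hypothetical, purely for illustration:

```python
# Little's-law sketch of latency hiding. All figures are made up for
# illustration, not real GPU/CPU specs.

def requests_in_flight(latency_ns, bandwidth_gb_s, request_bytes=64):
    # bytes that must be "in the air" to keep the bus saturated
    bytes_in_flight = latency_ns * 1e-9 * bandwidth_gb_s * 1e9
    return bytes_in_flight / request_bytes

print(requests_in_flight(latency_ns=300, bandwidth_gb_s=400))  # ~1875 outstanding requests
print(requests_in_flight(latency_ns=80,  bandwidth_gb_s=50))   # ~62 outstanding requests
```

A GPU juggling thousands of threads can realistically keep the first figure's worth of requests outstanding; a few CPU cores running serialised code cannot, so the CPU feels the extra latency directly.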
For CPUs, where there are plenty of serialised operations, that latency would end up hurting performance, and the only real remedy would be larger caches from L1 all the way up. Low-level caches are costly, increase die area, add complexity to the chip design, and are wasteful in terms of computational performance per mm².
We will have a DDR5 platform for desktop within 12 months. It doesn't switch to QDR, but it effectively doubles bandwidth at the same clock frequency in a dual-channel-like manner (each DIMM is split into two sub-channels), and the SDR rate rises from DDR4's 0.8-1.6 GHz limit to 1.6-3.2 GHz. (So the same DDR4 2400 becomes DDR5 4800, with a comparatively small sacrifice in timings.)
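For a sense of what those MT/s figures mean in bandwidth terms, assuming the usual 64-bit (8-byte) desktop memory channel:

```python
# Peak theoretical bandwidth from a transfer rate, assuming 8 bytes (64 bits)
# delivered per transfer on a standard desktop memory channel.

def peak_gb_s(mt_per_s, bytes_per_transfer=8):
    return mt_per_s * 1e6 * bytes_per_transfer / 1e9

print(f"DDR4 2400: {peak_gb_s(2400):.1f} GB/s per channel")  # 19.2 GB/s
print(f"DDR5 4800: {peak_gb_s(4800):.1f} GB/s per channel")  # 38.4 GB/s
```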
Hope that helps a bit.