Re: Large (8M) cache vs. dual-core CPUs

Ron Peacetree <rjpeace@xxxxxxxxxxxxx> · Wed, 26 Apr 2006 08:40:37 -0400 (EDT)

I'm posting this to the entire performance list in the hopes that it will be generally useful.
=r

-----Original Message-----
>From: mark@xxxxxxxxxxxxxx
>Sent: Apr 26, 2006 3:25 AM
>To: Ron Peacetree <rjpeace@xxxxxxxxxxxxx>
>Subject: Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
>
>Hi Ron:
>
>As a result of your post on the matter, I've been redoing some of my
>online research on this subject, to see whether I do have one or more
>things wrong.
>
I'm always in favor of independent investigation to find the truth. :-)

>You say:
>
>> THROUGHPUT is better with DDR2 if and only if there is enough data
>> to be fetched in a serial fashion from memory.
>...
>> So PC3200, 200MHz x2, is going to actually perform better than
>> PC2-5400, 166MHz x4, for almost any memory access pattern except
>> those that are highly sequential.
>...
>> For the mostly random memory access patterns that comprise many DB
>> applications, the base latency of the RAM involved is going to
>> matter more than the peak throughput AKA the bandwidth of that RAM.
>
>I'm trying to understand right now - why does DDR2 require data to be
>fetched in a serial fashion, in order for it to maximize bandwidth?
>
SDR transfers data once per clock cycle, on either the rising or the falling edge of the clock.

DDR transfers data on both the rising and falling edges of the base clock signal, provided there is a contiguous chunk of 2+ datums to be transferred.

DDR2 basically has a second clock that cycles at 2x the rate of the base clock, and thus we get 4 data transfers per base clock cycle, provided there is a contiguous chunk of 4+ datums to be transferred.
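To put numbers on the transfers-per-clock idea, here is a minimal Python sketch of the peak-rate arithmetic. It assumes a standard 64-bit (8-byte) DIMM data path and simply multiplies base clock x transfers per clock x 8 bytes; the function names are made up for illustration. This is where the PC3200 name comes from (200MHz x2 x 8 bytes = 3200MB/s).

```python
# Peak-rate arithmetic: base clock x transfers per clock x bus width.
# Assumption: a standard 64-bit (8-byte) DIMM data path; ECC bits ignored.
BUS_WIDTH_BYTES = 8


def peak_mts(base_clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak transfer rate in millions of transfers per second (MT/s)."""
    return base_clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(base_clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth in MB/s, reached only during long sequential bursts."""
    return peak_mts(base_clock_mhz, transfers_per_clock) * BUS_WIDTH_BYTES


if __name__ == "__main__":
    examples = [
        ("SDR  200MHz x1", 200, 1),
        ("DDR  200MHz x2 (PC3200)", 200, 2),
        ("DDR2 166MHz x4", 166, 4),
    ]
    for name, clock, xfers in examples:
        print(f"{name}: {peak_mts(clock, xfers):.0f} MT/s, "
              f"{peak_bandwidth_mb_s(clock, xfers):.0f} MB/s peak")
```

These are peak figures only; they say nothing about the first-datum latency, which is what hurts random access.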

Note also what happens when transferring the first datum after a lull period.
For purposes of example, let's pretend that we are talking about a base clock rate of 200MHz, i.e. a 5ns clock period.

SDR still transfers data every 5ns no matter what.
DDR transfers the 1st datum in 10ns and then, assuming there are at least 2 sequential datums to be transferred, transfers the 2nd and subsequent sequential pieces of data every 2.5ns.
DDR2 transfers the 1st datum in 20ns and then, assuming there are at least 4 sequential datums to be transferred, transfers the 2nd and subsequent sequential pieces of data every 1.25ns.
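A toy model of the three cases above, as a minimal Python sketch. The latencies are only the illustrative 200MHz numbers from this post, not measured or datasheet values, and MODEL / burst_time_ns are hypothetical names. A burst of n contiguous datums costs the first-datum latency plus (n - 1) times the per-datum interval.

```python
# Toy first-datum-latency model using the post's illustrative numbers
# for a 200MHz (5ns) base clock. These are NOT datasheet timings.
# Each entry: (latency of 1st datum in ns, interval between later datums in ns)
MODEL = {
    "SDR": (5.0, 5.0),
    "DDR": (10.0, 2.5),
    "DDR2": (20.0, 1.25),
}


def burst_time_ns(tech: str, n: int) -> float:
    """Time to fetch n contiguous datums, starting from an idle bus."""
    if n < 1:
        raise ValueError("a burst needs at least one datum")
    first, interval = MODEL[tech]
    return first + (n - 1) * interval


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        row = ", ".join(f"{tech}={burst_time_ns(tech, n):.2f}ns" for tech in MODEL)
        print(f"n={n:2d}: {row}")
```

With these numbers a lone random access (n=1) takes 5ns on SDR, 10ns on DDR and 20ns on DDR2, while a 16-datum sequential burst takes 80ns, 47.5ns and 38.75ns respectively.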

Thus we can see that randomly accessing RAM degrades performance significantly for DDR and DDR2, since every isolated access pays the full first-datum latency.  We can also see that the conditions for optimal RAM performance become more restrictive as we go from SDR to DDR to DDR2.
This is why DDR2 with a low base clock rate excelled at tasks like streaming multimedia and stank at things like small-transaction OLTP DB applications.
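The same toy model can be used to find the shortest sequential run at which each technology starts to win. Again this is only a sketch: the latency numbers are the post's illustrative ones, and crossover() is a hypothetical helper.

```python
from typing import Optional

# Same toy model as the post's 200MHz example (illustrative, not datasheet).
MODEL = {
    "SDR": (5.0, 5.0),    # (first-datum latency ns, later-datum interval ns)
    "DDR": (10.0, 2.5),
    "DDR2": (20.0, 1.25),
}


def burst_time_ns(tech: str, n: int) -> float:
    """Time to fetch n contiguous datums, starting from an idle bus."""
    first, interval = MODEL[tech]
    return first + (n - 1) * interval


def crossover(candidate: str, baseline: str, limit: int = 1000) -> Optional[int]:
    """Smallest run length n at which `candidate` is strictly faster than `baseline`."""
    for n in range(1, limit + 1):
        if burst_time_ns(candidate, n) < burst_time_ns(baseline, n):
            return n
    return None  # candidate never wins within `limit` datums


if __name__ == "__main__":
    for cand, base in [("DDR", "SDR"), ("DDR2", "SDR"), ("DDR2", "DDR")]:
        print(f"{cand} beats {base} from n={crossover(cand, base)} datums")
```

Under these assumptions DDR beats SDR only for runs of 4+ datums, DDR2 beats SDR only for 6+, and DDR2 beats DDR only for 10+, which is the "more restrictive conditions" point in numbers. Prefetching and real part timings would likely shift these crossovers.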

Factors like CPU prefetching and victim buffers can muddy this picture a bit.
Also, if the CPU's off-die I/O is slower than the RAM it is talking to, how fast that RAM is becomes unimportant.

The reasons AMD has held off from supporting DDR2 until now are:
1.  DDR is end-of-life (EOL).  JEDEC is not ratifying any DDR faster than 200x2, while DDR2 standards as fast as 333x4 are likely to be ratified.  (Note that Intel pretty much avoided DDR, leaving it to AMD, while DDR2 is Intel's main RAM technology.  Guess who has more pull with JEDEC?)

2.  DDR and DDR2 RAM with equal base clock rates are finally available, removing the biggest performance difference between DDR and DDR2.

3.  Due to the larger demand for DDR2, more of it is produced.  That in turn has resulted in larger supplies of DDR2 than DDR, which, especially when combined with the factors above, has by now resulted in lower prices for DDR2 than for DDR of the same or faster base clock rate.

Hope this is helpful,
Ron