Does AM2's Performance Make Sense?
Assuming for a moment that the performance we're seeing here today is representative of what AMD will show off in 2 months, does it make sense? AMD has effectively doubled their memory bandwidth but they've seen virtually no increase in performance, other than in some very isolated situations.
If you'll remember back to the introduction of AMD's Revision E core, we did an article about how the new core brought support for new memory dividers, allowing you to run at speeds up to DDR-500 without overclocking your CPU or the rest of your system. In that article we looked at the overall performance benefit of DDR-500 over DDR-400 on a Socket-939 platform in a variety of situations. A recap of our performance results is below:
Benchmark | Socket-939 (DDR-400) | Socket-939 (DDR-480) | % Advantage (DDR-480) |
Multimedia Winstone 2004 | 41.9 | 42.7 | 2% |
3dsmax 6 | 2.78 | 2.80 | 1% |
DivX 6.0 | 50.6 fps | 53.2 fps | 5% |
WME9 | 4.22 fps | 4.28 fps | 1% |
Quake 3 (10x7) | 121.9 fps | 127.2 fps | 4% |
ScienceMark 2.0 (Bandwidth)* | 5378 MB/s | 5851 MB/s | 9% |
As you can see, given almost a 9% increase in memory bandwidth, we saw only small increases in overall performance. It would seem that the Athlon 64, at its current clock speeds, simply isn't starved enough for memory bandwidth to benefit from more of it. You'll also see that the areas where faster DDR memory helped back then are pretty much the areas where DDR2-800 is showing gains today.
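For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the "% Advantage" column from the scores in the table above and expresses each gain as a fraction of the measured bandwidth gain. The dictionary layout and the `pct_advantage` helper are our own illustration, not part of the original test suite, and every score is treated as higher-is-better, which is what the table's advantage column implies.

```python
# Scores copied from the recap table: (DDR-400, DDR-480).
# Assumption: all scores are higher-is-better, as the table's
# "% Advantage" column implies.
results = {
    "Multimedia Winstone 2004": (41.9, 42.7),
    "3dsmax 6": (2.78, 2.80),
    "DivX 6.0 (fps)": (50.6, 53.2),
    "WME9 (fps)": (4.22, 4.28),
    "Quake 3 10x7 (fps)": (121.9, 127.2),
    "ScienceMark 2.0 bandwidth (MB/s)": (5378, 5851),
}


def pct_advantage(base, faster):
    """Percentage gain of `faster` over `base` (hypothetical helper)."""
    return (faster - base) / base * 100.0


gains = {name: pct_advantage(*scores) for name, scores in results.items()}
bandwidth_gain = gains["ScienceMark 2.0 bandwidth (MB/s)"]

for name, gain in gains.items():
    # Second figure: how much of the raw bandwidth gain shows up
    # as a gain in this benchmark.
    share = gain / bandwidth_gain
    print(f"{name:34s} {gain:5.1f}%  ({share:4.2f}x of bandwidth gain)")
```

Rounded to whole percentages, the computed gains match the table. The second figure on each line makes the point of this section explicit: no application benchmark here picks up the full bandwidth gain.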
Based on our results from back then, if a 9% increase in memory bandwidth doesn't increase performance tremendously, then the 35% increase in bandwidth we see with DDR2-800 on AM2 shouldn't yield any more of a performance increase. Or simply put, yes, our AM2 performance numbers make sense.
107 Comments
mino - Tuesday, April 11, 2006 - link
1) 3-cycle L1 on K7/K8 is as fast as required; it follows from the internal structure of the scheduler and the pipeline that a 2-cycle cache would do almost no good. Also they would have to reduce L1 size to 32k+32k, which would hurt. It simply does not make sense to change L1 at all, maybe on K8L, but IMHO 128k+128k would help much more than 2-cycle latency.
2) 17-cycle L2 is PRETTY GOOD for 1M L2 with exclusive structure!!! IMHO it is possible to do 16-cycle, maybe 15, but nowhere near Dothan's 10-cycle. Also remember lower-latency L2 has scaling problems (that's why Intel made Prescott's L2 slower than NW's).
3) Concerning the memory subsystem (caches + memory) on single-socket K8/K8L, the biggest issue is the robustness (number of on-the-fly accesses to memory) and latency of the memory controller. Solving this is not a trivial thing. IMHO adding 2-4M of L3 with random access of ~50 cycles would do.
4) In the >4 sockets front all they need is effective caching of MOESI snoops.
You also forgot K7/K8 is mostly a KISS architecture. It is just very well balanced, so it has good performance in the end. However, make one wrong change and you are screwed.
KISS == Keep It Simple Silly
About "weak" SIMD implementation on AMD, don't fool yourselves guys. Only x86 architecture faster than K8 on SSE/SSE2 is Netburst aka SIMD-by-intel.
About Conroe, it has ALUs and FPUs twice as wide as PIII/K7/K8, which means it has huge resources at its disposal to calculate SIMD.
Same goes for K8L 2 quarters later. That said, the K7/K8 core has far more FP power than the P6 architecture. On FP, Conroe and K8 are about equal, but K8L will wipe the floor with K8 and Conroe on FP. Conroe will wipe K8 on INT and still be faster than K8L by a decent margin.
Overall we are in for another PIII vs. K7 battle with a single very important change: AMD has a platform it did not have back in the K7 vs. PIII days.
fitten - Thursday, April 13, 2006 - link
I find the K8L a somewhat odd strategy. I guess they are targeting the Itanium market because Opterons already have a good part of the HPC market. Given that the HPC people are the ones that really care about FPU performance and that they are still a fairly small market segment, it seems an odd target. Integer performance rules the roost for servers... web, database, and just about everything else you can think of other than number crunching simulations and the like. Desktop uses for FPU are few, like games and some mathematical stuff. Intel is focusing on integer performance at least as much as FPU with Conroe (Conroe gets a good dose of both), which makes sense to me since so much of the work done on computers, both desktops and servers, is dominated by integer operations. K8L speculation says only FPU horsepower will be added... just doesn't seem like a sound decision to me.
Zoomer - Monday, April 10, 2006 - link
Hey anand, could you take out 1 of the two modules and do a quick test on that?
With doubled (in theory) bandwidth with ddr2, wouldn't the dual channel mem controller be even more redundant? Perhaps we'll see a new 754-ish socket? :)
Furen - Monday, April 10, 2006 - link
I don't believe we will. Even S1 will be dual-channel, and this is what would have benefited the most from being single-channel (since the pincount would be much lower the package could be much smaller).
BaronMatrix - Monday, April 10, 2006 - link
Looking at the intensive timing and bus speed tweaks USING the SAME RAM as the latest XE955 article I would have expected the same kind of thing here. Anand doesn't look at lower speed lower latency for whatever chip he used. That RAM will do 3-2-2 at 667. Obviously AMD is more sensitive to latency.
ChristTheGreat - Monday, April 10, 2006 - link
AMD is sensitive to latencies, because of the memory controller. I'm sure that 3-2-2-9 DDR2 from OCZ would give much more performance on AMD.
Again, this is only a CPU that they use to test, so it's not the true CPU. They wouldn't give us the performance it gives before its launch. That's like killing yourself right now if the performance is poor....
I saw an article, that AMD could be working on DDR2 latencies. You think that 4-4-4-12 is good timings? 12 = tRAS
"tRAS is the time required before (or delay needed) between the active and precharge commands. In other words, how long the memory must wait before the next memory access can begin."
In fact, you have better frequencies, but lower timings.... What you need, is higher frequencies, and lower timings.
So we will have to wait till they launch Socket AM2, to know the true performance of AM2.
defter - Monday, April 10, 2006 - link
4-4-4-12 are good timings, even for DDR2-667. It isn't easy to find reasonably priced DDR2-667 that works at those timings with standard voltage.
Some people forget that 99% of consumers won't be using super expensive overvolted 3-3-3-10 DDR2-800 memory just to get a few percent of extra performance. And if you compare an AMD CPU + super fast DDR2-800 against an Intel CPU (which runs fine on DDR2-667 because of the FSB limitation), then you need to take into account the higher price of memory on the AMD system.
Wesley Fink - Monday, April 10, 2006 - link
We are continuing to test the AM2 on different AM2 boards. On another motherboard we could run at 3-3-3 DDR2-800 with the OCZ PC2-8000 memory. Latency was a bit lower and bandwidth a bit higher, but nothing really changed from Anand's conclusions. We have also been running DDR2-667 and DDR2-533 tests with this new super fast OCZ memory and cheaper mainstream DDR2 memory, and we will be sharing those results as soon as testing is complete.
cornfedone - Monday, April 10, 2006 - link
The crap the mobo companies have been shoving out the doors the past couple years is pure garbage as any number of hardware review sites have confirmed. It looks like the AM2 mobos might be more half-baked crap. Until you can test the shipping CPUs on a quality mobo that allows proper memory timing, it's difficult to know what AMD's AM2 CPUs will or won't deliver. If I had a dollar for every bogus claim Intel has made, I'd be a Billionaire so I wouldn't hold my breath that Conroe will perform as Intel claims.