Linux and L2 Cache; Sempron vs. Athlon
by Kristopher Kubicki on August 18, 2004 2:29 AM EST- Posted in
- Linux
As AMD rolls out its newest Sempron processor line, many readers are asking us if the reduced cache Socket 754 Sempron 3100+ really compares with already shipping Athlon 64 single channel solutions. Today we take two single channel, 1.8GHz processors with differing L2 cache and compare them in the same Linux benchmarks we have used in the past. The Athlon 64 2800+ and the Sempron 3100+ are nearly identical processors, except for the 256KB cache difference. There is also a $20 delta between the two retail products, so today we decide if the $20 difference between the two processors is worth the sacrafice of level two cache and 64-bit addressing. We have provided benchmarks of another 1.8GHz 32-bit processor from AMD, as well as the Athlon 64 3000+ for reference only.
Update: This article got pushed live prematurely. If you read it before 12PM EST on the 18th, you read an incomplete, unfinished article.
Performance Test Configuration | |
Processor(s): | AMD Athlon 64 2800+ (130nm,
1.8GHz, 512KB L2 Cache) |
RAM: | 2 x 512MB PC-3200 CL2 (400MHz) |
Memory Timings: | Default |
Motherboard: | Chaintech ZNF-250 (nForce3,
Socket 754) DFI NFII Infinity (nForce2, Socket 462) |
Operating System(s): | SuSE 9.1 Professional (32 bit) Linux 2.6.4-52-default |
Compiler: | linux:~ # gcc -v Reading specs from /usr/lib/gcc-lib/i586-suse-linux/3.3.3/specs Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --enable-languages=c,c++,f77,objc,java,ada --disable-checking --libdir=/usr/lib --enable-libgcj --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib --with-system-zlib --enable-shared --enable-__cxa_atexit i586-suse-linux Thread model: posix gcc version 3.3.3 (SuSE Linux) |
Libraries: | linux:~ # /lib/libc.so.6
GNU C Library stable release version 2.3.3 (20040405), by Roland McGrath et al.
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Configured for i686-suse-linux.
Compiled by GNU CC version 3.3.3 (SuSE Linux).
Compiled on a Linux 2.6.4 system on 2004-04-05.
Available extensions:
GNU libio by Per Bothner
crypt add-on version 2.1 by Michael Glad and others
linuxthreads-0.10 by Xavier Leroy
GNU Libidn by Simon Josefsson
NoVersion patch for broken glibc 2.0 binaries
BIND-8.2.3-T5B
libthread_db work sponsored by Alpha Processor Inc
NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Thread-local storage support included.
Report bugs using the `glibcbug' script to |
Even though we are using 1GB of memory in a dual channel configuration, the Socket 754 platform will only perform in single channel mode. Fortunately for AMD, since the memory controller is directly on the processor we do not see large latencies going from dual channel to single channel mode. Only the Athlon 64 2800+ can run 64-bit binaries, so for the sake of experiment we will only look at 32-bit binaries today. We have looked at 32-bit versus 64-bit performance in the past, and we will revisit it again in a few weeks, so today we will just focus on 32-bit performance.
Also keep in mind the GCC 3.3.3 included with SuSE 9.1 Pro has many back ported options from the official 3.4.1 tree. Our results with GCC 3.3.3 are much more optimized than the standard GCC 3.3.3.
59 Comments
View All Comments
Matthew Daws - Friday, August 20, 2004 - link
Kris: Oh, okay, sorry, yes, that makes sense. Have you tried the Windows executable yet? I've verified that with TSCP 1.7.3 I'm getting reasonable results, so it seems likely that my results with v1.8.1 are not too far off base...--Matt
KristopherKubicki - Friday, August 20, 2004 - link
Rys:Correct, but that doesnt mean it still does not lack 64-bit addressing ;) And in reality, how critical would any desktop CPU today need to address more than 48bits ? Isnt that something like 1TB?
Kristopher
KristopherKubicki - Friday, August 20, 2004 - link
Matthew: I read this:>You shouldn't see any difference with linux:
>indeed, only a linux box I have access to, with GCC
>3.2.2 (I *think* it's a P4 2.8GHz, but I'm not 100%
>sure: I'm doing a remote-login right now, so cannot
>check!) I get 365K with (-O3 -march=pentium4).
So your 2.8C is getting the same marks as my 3.6 nocona -- is what i meant.
Kristopher
Rys - Friday, August 20, 2004 - link
You repeatedly mention the Sempron's 'lack of 64-bit addressing'. None of the CPU's on test, including the Athlon 64, can address a 64-bit memory space. All current AMD64 implementations can only address a 40-bit physical address space and a 48-bit virtual address space.Rys
Matthew Daws - Friday, August 20, 2004 - link
Kris,I am confused: in the link you gave me, the Xeon is getting circa 350K, which is way better than I getting, as expected... Okay, so it's low clock for clock, but you said: "Youre getting higher numbers than i got with my Xeon 3.6GHz chip."
--Matt
Matthew Daws - Friday, August 20, 2004 - link
Kris,I've downloaded TSCP 1.7.3 which Tom Kerrigan has collected a lot of benchmarking data about. It also gives a MIPS rating: I get 2136 MIPS (with GCC -O3 -march=pentium4) and 2174 (with the included windows benchmark) These compare well with the data on his website (http://home.comcast.net/~tckerrigan/bench.html) where this puts my 2GHZ Celeron at about the same level an a P4 1800, which seems reasonable for a heavy CPU benchmark.
I suggested that on your test system you run the precompiled Windows executable which Tom gives: this should give an approximate value, as Visual C++ and GCC produce roughly the same performance of code, and with this benchmark, switching between Windows and Linux really shouldn't make a difference. You might also try the earlier code, as I have just done, and then you'll have a 3rd party (namely Tom's list) to compare against...
--Matt
Wesley Fink - Friday, August 20, 2004 - link
#32 - I would think the CPU scaling charts in Doom 3 at http://www.anandtech.com/cpuchipsets/showdoc.aspx?... would be all the proof you need to see the 3100+ is the better value. The 3100+ is 75.3FPS, the XP 3200+ is 68FPS, and the 2500+ is 55.6FPS.If DX9 game performance is not convincing, then you might refresh your memory in pages and pages of benchmarks comparing the 3100+ 754 and 2500+ Socket A in http://www.anandtech.com/cpuchipsets/showdoc.aspx?...
thornc - Friday, August 20, 2004 - link
My main problem with the article is that the Athlon XP 2500+ with Barton core was not included!I am thinking of getting a new system and I intend to use a Barton XP, but if I might change my mind if I see prove that the Sempron is a better deal.
TauCeti - Friday, August 20, 2004 - link
Hi Matthew, Kris and also hello to dougSF30 from siliconinvestor ;)Nice discussion here!
I think there is a way to _really_ understand e.g. the TSCP benchmark scores on all AMD CPUs.
AMD offers (for free) the "AMD CodeAnalyst™ Performance Analyzer for Linux 2.2" and the "AMD Simulation Utilities 2.1"
afaik it is possible with those tools to profile any target code (in this case the TSCP bench) and then even simulate the execution of that code for different CPUs down to assembler code/cache state and even deeper into CPU execution unit usage.
quote from AMD: "The data presented is at the assembly instruction level and is not intended to assist a programmer working in a high-level language. The detailed data on the execution of each instruction takes into account the previous instructions executed and the state of the processor caches. The data is obtained by running the target block of code, then using the debug capabilities of the processor to single step through each opcode to obtain an execution trace. This execution trace is then fed into a Processor Simulation that analyzes the execution."
I think getting used to this tool and knowing how to interpret the results would be very fruitful for any reviewer (and programmer). Dresdenboy at mersenneforum.org used it to dissect the low SSE2 Performance of the AMD64s in the prime95 code.
The downside is that this does not look like an easy task even for a 'normal' experienced programmer. I even don't know if such large code-blocks like TSCPs bench can be used with the tool or if it is only suitable for inner-loop optimization.
Tau
balzi - Friday, August 20, 2004 - link
So I take it by the ignorance that no one really cares if the graphs are readable -- oh well.. I'll still read Anandtech but maybe I won't enjoy it as much in the future.