Intel's Pentium Extreme Edition 955: 65nm, 4 threads and 376M transistors
by Anand Lal Shimpi on December 30, 2005 11:36 AM EST - Posted in CPUs
Dual Core and Hyper Threading: Detriment or Not?
A question that we've always had is whether or not the inclusion of Hyper Threading support on Intel's dual-core Extreme Edition processors actually improves performance. To answer that question, we have to look at two separate situations: multithreaded application performance and multitasking performance.
For multithreaded application performance, we can now turn to a number of benchmarks. We'll start off with 3dsmax 7 (higher numbers are better for the composite score, lower numbers are better for the rest of the numbers):
3dsmax 7 | Composite Score | 3dsmax 5 rays | CBALLS2 | SinglePipe2 | UnderWater |
HT Enabled | 3.0 | 12.922s | 17.297s | 83.515s | 119.641s |
HT Disabled | 2.51 | 14.937s | 21.141s | 102.734s | 141.641s |
Here, the performance advantage is clear - enabling Hyper Threading cuts render times by roughly 14-19% compared to the base dual core Presler. The same applies to almost all of the media encoding tests (if minutes or seconds are specified, lower numbers mean better performance):
Media Encoding | DVD Shrink | WME9 | H.264 | iTunes |
HT Enabled | 7.1m | 46.5fps | 9.96m | 38s |
HT Disabled | 8.0m | 38.6fps | 8.53m | 40s |
Our Quicktime 7 H.264 encoding test is, generally speaking, an outlier from what we've seen of the impact of HT on multithreaded applications. The rest of the applications show a clear benefit to being able to execute four threads simultaneously, even if the execution resources of the cores are shared with the remaining two threads.
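As a rough check on the figures in the two tables above, the gain from enabling HT can be recomputed per test. This is a minimal Python sketch; the dictionary layout and the `ht_gain_percent` name are illustrative assumptions, and timed tests are expressed as a reduction in completion time.

```python
# Figures copied from the 3dsmax 7 and media encoding tables above.
# Each entry: (HT enabled, HT disabled, higher_is_better)
results = {
    "3dsmax composite": (3.0, 2.51, True),
    "3dsmax 5 rays (s)": (12.922, 14.937, False),
    "CBALLS2 (s)": (17.297, 21.141, False),
    "SinglePipe2 (s)": (83.515, 102.734, False),
    "UnderWater (s)": (119.641, 141.641, False),
    "DVD Shrink (min)": (7.1, 8.0, False),
    "WME9 (fps)": (46.5, 38.6, True),
    "H.264 (min)": (9.96, 8.53, False),
    "iTunes (s)": (38.0, 40.0, False),
}


def ht_gain_percent(ht_on, ht_off, higher_is_better):
    """Percent improvement from enabling HT; negative means HT was slower."""
    if higher_is_better:
        return (ht_on - ht_off) / ht_off * 100.0
    # For timed tests, report the reduction in completion time.
    return (ht_off - ht_on) / ht_off * 100.0


gains = {name: ht_gain_percent(*row) for name, row in results.items()}

for name, gain in gains.items():
    print(f"{name:20s} {gain:+6.1f}%")
```

Every test comes out positive except H.264, where the HT-enabled run takes longer, which is why it stands out as the outlier.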
Armed with the latest SMP patches for Call of Duty 2 and Quake 4 (SMP was enabled in both games), we can also take a look at the impact of HT on Presler:
Gaming | Call of Duty 2 | Quake 4 |
HT Enabled | 68.4 | 142.3 |
HT Disabled | 69.3 | 142.3 |
Call of Duty 2 is another example where HT actually reduces performance, but given that enabling SMP itself reduces performance, we'd venture a guess that you shouldn't really be drawing any conclusions based on its data. Quake 4, on the other hand, shows no difference in performance with HT on or off.
From what we've seen, with most individual multithreaded applications, enabling HT will improve performance even if you have a dual core processor. The degree of performance improvement will vary from application to application, but generally speaking, it's going to be positive (if anything at all).
The more interesting situation is what happens when you're multitasking - does Hyper Threading really help on top of the inherent benefits of a dual core processor? To find out, we put together a couple of multitasking scenarios aided by a tool that Intel provided us to help all of the applications start at the exact same time. We're not necessarily concerned with the actual performance of these applications, but rather with the impact that the number of simultaneous applications has on each other and how that varies with HT being enabled or not.
We took five applications (Grisoft AVG Anti-Virus 7, Lame MP3 Encoder 3.97a, Windows Media Encoder 9, Info-ZIP extraction utility and Splinter Cell: Chaos Theory) and used various combinations of them to try to figure out if there are multitasking benefits to a dual core processor with Hyper Threading enabled. Note that some of these applications are multithreaded themselves, so just because we chose five applications doesn't mean that there are only five threads of execution; in reality, there are many more.
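The idea of starting several applications at the same instant and timing each one can be sketched in a few lines of Python. This is not the Intel-provided tool, whose workings are not described here; `run_simultaneously` and the placeholder commands are hypothetical, and process start-up still adds a small skew between launches.

```python
import subprocess
import sys
import threading
import time


def run_simultaneously(commands):
    """Start every command at (nearly) the same moment and return
    each one's wall-clock completion time in seconds."""
    barrier = threading.Barrier(len(commands))
    timings = {}

    def worker(name, argv):
        barrier.wait()  # all workers are released together
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        timings[name] = time.perf_counter() - start

    threads = [
        threading.Thread(target=worker, args=(name, argv))
        for name, argv in commands.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timings


# Placeholder workloads; a real run would list the actual benchmark commands.
demo = {
    "task_a": [sys.executable, "-c", "sum(range(10**6))"],
    "task_b": [sys.executable, "-c", "sum(range(10**6))"],
}
for name, seconds in run_simultaneously(demo).items():
    print(f"{name}: {seconds:.2f}s")
```

Recording a per-application time rather than one overall time is what makes it possible to see how each added workload slows the others.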
We tested four different scenarios:
- A virus scan + MP3 encode
- The first scenario + a Windows Media encode
- The second scenario + unzipping files, and
- The third scenario + our Splinter Cell: CT benchmark.
AMD Athlon 64 X2 4800+ | AVG | LAME | WME | ZIP | Total |
AVG + LAME | 22.9s | 13.8s | - | - | 36.7s |
AVG + LAME + WME | 35.5s | 24.9s | 29.5s | - | 90.0s |
AVG + LAME + WME + ZIP | 41.6s | 38.2s | 40.9s | 56.6s | 177.3s |
AVG + LAME + WME + ZIP + SCCT | 42.8s | 42.2s | 46.6s | 65.9s | 197.5s |
Intel Pentium EE 955 (no HT) | AVG | LAME | WME | ZIP | Total |
AVG + LAME | 24.8s | 13.7s | - | - | 38.5s |
AVG + LAME + WME | 39.2s | 22.5s | 32.0s | - | 93.7s |
AVG + LAME + WME + ZIP | 47.1s | 37.3s | 45.0s | 62.0s | 191.4s |
AVG + LAME + WME + ZIP + SCCT | 40.3s | 47.7s | 58.6s | 83.3s | 229.9s |
Intel Pentium EE 955 (HT Enabled) | AVG | LAME | WME | ZIP | Total |
AVG + LAME | 25.0s | 13.3s | - | - | 38.3s |
AVG + LAME + WME | 34.4s | 21.6s | 30.2s | - | 86.2s |
AVG + LAME + WME + ZIP | 41.5s | 28.1s | 37.7s | 54.2s | 161.5s |
AVG + LAME + WME + ZIP + SCCT | 51.4s | 33.0s | 45.3s | 71.1s | 200.8s |
As you can see, once you get beyond two simultaneous applications, the Presler setup with HT enabled takes less time to complete the tasks than the Presler system without HT. However, including the Athlon 64 X2 4800+ in the picture, we see that despite only being able to execute two threads at the same time, it does just as good a job as the Presler HT system, which can execute twice as many threads. But to get the full picture, we have to measure one last data point: Splinter Cell performance.
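The totals in the tables above are simply the sums of the per-application times, so the comparison can be reproduced with a short Python sketch. The variable and function names are illustrative assumptions; note that summing the rounded per-application figures gives 89.9s for the X2's three-application run, where the table prints 90.0s.

```python
# Per-application completion times in seconds, from the tables above.
# Order within each run: AVG, LAME, WME, ZIP (shorter runs omit later apps).
runs = {
    "Athlon 64 X2 4800+": [
        [22.9, 13.8],
        [35.5, 24.9, 29.5],
        [41.6, 38.2, 40.9, 56.6],
        [42.8, 42.2, 46.6, 65.9],
    ],
    "Pentium EE 955 (no HT)": [
        [24.8, 13.7],
        [39.2, 22.5, 32.0],
        [47.1, 37.3, 45.0, 62.0],
        [40.3, 47.7, 58.6, 83.3],
    ],
    "Pentium EE 955 (HT)": [
        [25.0, 13.3],
        [34.4, 21.6, 30.2],
        [41.5, 28.1, 37.7, 54.2],
        [51.4, 33.0, 45.3, 71.1],
    ],
}

# Total time per scenario for each CPU.
totals = {cpu: [round(sum(run), 1) for run in cpu_runs]
          for cpu, cpu_runs in runs.items()}


def time_saved_percent(baseline, candidate):
    """Positive when the candidate finishes sooner than the baseline."""
    return (baseline - candidate) / baseline * 100.0


ht_vs_no_ht = [time_saved_percent(b, c)
               for b, c in zip(totals["Pentium EE 955 (no HT)"],
                               totals["Pentium EE 955 (HT)"])]
ht_vs_x2 = [time_saved_percent(b, c)
            for b, c in zip(totals["Athlon 64 X2 4800+"],
                            totals["Pentium EE 955 (HT)"])]

for i, (a, b) in enumerate(zip(ht_vs_no_ht, ht_vs_x2), start=1):
    print(f"Scenario {i}: HT vs no HT {a:+.1f}%, HT vs X2 {b:+.1f}%")
```

The HT advantage over the non-HT Presler is negligible with two applications and grows once a third and fourth are added, while the HT system and the X2 trade places depending on the scenario.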
In the fourth scenario, we ran a total of five applications: AVG, Lame, WME, InfoZip and Splinter Cell. The first four applications took a total of 197.5 seconds to complete on the Athlon 64 X2 4800+ system, ever so slightly quicker than the 200.8 seconds of the Presler HT system. However, that does not take into account Splinter Cell performance - now let's see how our fifth application fared:
Splinter Cell: CT | Average | Min | Max |
Intel Pentium EE 955 (no HT) | 71.0 fps | 27.8 fps | 128.1 fps |
Intel Pentium EE 955 (HT enabled) | 77.2 fps | 32.5 fps | 139.6 fps |
AMD Athlon 64 X2 4800+ | 66.9 fps | 10.5 fps | 185.0 fps |
The Athlon 64 X2 4800+ is actually faster in the Splinter Cell: CT benchmark without anything else running, but here we see a very different story. Although its 66.9 fps average frame rate is reasonably competitive with the Presler HT system, its minimum frame rate is barely over 10 fps - approximately 1/3 that of the Presler HT.
While the regular Presler setup without HT managed to pull in higher frame rates than the AMD system, it did so while performing significantly worse in the remaining four applications. The Presler HT vs. Athlon 64 X2 comparison is important because the two are virtually tied in the performance of the first four applications - but juggling all five of the applications is better done on the Presler HT system.
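To put the minimum frame rates in perspective, they can be converted to the time spent on the slowest frame. A minimal Python sketch using the Splinter Cell figures above (the names are illustrative assumptions):

```python
# Splinter Cell: CT results with the other four applications running,
# in frames per second (values from the table above).
fps = {
    "Pentium EE 955 (no HT)": {"avg": 71.0, "min": 27.8, "max": 128.1},
    "Pentium EE 955 (HT)": {"avg": 77.2, "min": 32.5, "max": 139.6},
    "Athlon 64 X2 4800+": {"avg": 66.9, "min": 10.5, "max": 185.0},
}


def frame_time_ms(frames_per_second):
    """Time per frame in milliseconds at a given frame rate."""
    return 1000.0 / frames_per_second


# Slowest frame on each system, in milliseconds.
worst_frame_ms = {cpu: frame_time_ms(r["min"]) for cpu, r in fps.items()}

# Minimum frame rate of the X2 relative to the Presler HT system.
min_ratio = fps["Athlon 64 X2 4800+"]["min"] / fps["Pentium EE 955 (HT)"]["min"]

for cpu, ms in worst_frame_ms.items():
    print(f"{cpu:24s} slowest frame: {ms:.1f} ms")
print(f"X2 min / Presler HT min: {min_ratio:.2f}")
```

At its minimum, the X2 system spends roughly three times as long on a frame as the Presler HT system, which matches the "approximately 1/3" figure quoted above.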
We would say that if implemented properly, the benefits of an SMT system like Hyper Threading are definitely a good companion to a dual core desktop processor. The usable limit, even for today's applications and usage models, is far from just two threads.
84 Comments
yacoub - Tuesday, January 3, 2006 - link
Yet no mention of the Max, where the 4800+ utterly trounces the two Intel chips. Does Max not matter (in which case why bother listing it), or does it matter but you just neglected to mention that (whether on purpose or by accident)?
jjunk - Tuesday, January 3, 2006 - link
It's right there in the chart. As for further discussion, it's not really necessary. Screaming frame rates might look good on the chart but they don't help game play. A 10 fps min will definitely be noticeable.
IntelUser2000 - Sunday, January 1, 2006 - link
I don't like that paragraph. It makes it sound like 65nm will be all that makes Presler in power consumption. It will also make people judge 65nm based on Presler, since that's the first CPU on the 65nm.
In fact it's not that simple. Taking a CPU that's on a certain process like the Smithfield and putting it on a smaller process won't mean an instant 40-50% decrease in power consumption. That's called the dumb shrink. The reason Northwood had significantly lower power than Willamette was because Northwood was optimized to lower power consumption.
A CPU that runs well at 130nm may do badly at 90nm and even worse at 65nm, for example. Presler was said to be not Intel's main focus and Intel moved their design teams to Conroe, so the people who were supposed to be optimizing Presler for 65nm all went away and Presler was just given a dumb shrink.
Sleep transistor was an optional feature on 65nm, not required. So Presler may not have it.
IntelUser2000 - Monday, January 2, 2006 - link
Why use DDR2-667 with 5-5-5-15 timings?? Most DDR2-667 can do 4-4-4-8 (around there). This is gonna skew the results in AMD's favor as the DDR400 used is the lowest latency possible. In reality nobody is gonna use DDR400 at 2-2-2-7 latency or DDR2-667 at 4-4-4-8 latency. Nobody I have ever heard of outside the internet uses RAM at those timings.
Anandtech should either benchmark them all at JEDEC timings or use them all with low latency. I understand they want to be sure the new test system works properly, but using low latency RAM for the comparison system is just not fair.
JEDEC timings for DDR400 are 3-3-3-8. Where is your DDR400 advantage over DDR2 now??
hans007 - Sunday, January 1, 2006 - link
i think that the 9xx series is a big improvement over the 8xx. i have an 8xx myself, the 820, which is the lowest power. the leakage is exponential so the 955 is going to draw a much higher amount than say a 920 will.
i bet the 920 will be a half decent cpu drawing maybe only 70 watts. which isnt TOO terrible in the grand scheme of power. the 920 would only run at 2.8 ghz and have not as high leakage percentage so i think it will be the one to get.
true intel is not better yet, but they are getting there. and their dual cores still cost less.
i also think that intel should be commended for writing the smp code for q4. that is the doom3 engine which will go into a LOT of games. and since it speeds up the amd chips as well, it is a free upgrade for everyone. sure it makes up for a large deficiency in the intel chips, but it is FREE.
and it makes the really cheap 920/820 chips very price competitive. as the 820 chips are very very cheap, about $150 on ebay (which is probably near what oems get them for in bulk, thus the rampant dell 820 deals going on)
jjmcwill - Saturday, December 31, 2005 - link
I do professional software development for a living, using Visual Studio 2003 to build the code for a product I work on. We have over 1000 .cpp files and over 1500 header files. On my work box: An HP xw6200 workstation with a single 3.0GHz Xeon CPU, 2MB L2 cache, 1G RAM, compilation takes 10:45 for a single project in our solution. On my home system: Socket 754 Athlon 64 3000+, 1.5G RAM, compilation takes 7:30. Both systems build the code off of the exact same, external ide hard drive in a Firewire enclosure. I use it to carry all my work back and forth between work and home.
At some point we'll be investigating Make to launch parallel compiles, and I would be VERY interested in seeing dual-core CPU comparisons which include compilation benchmarks, using Visual Studio 2003 under Windows, using Make -j2 or Make -j3 under windows, and using gcc/make under Linux.
Based on what I've seen with the Xeon, I'm leaning toward an AMD X2 or dual core Opteron for my next upgrade.
Thanks.
Calin - Tuesday, January 3, 2006 - link
I think that an Extreme Edition CPU (while much more expensive) would give better results with hyperthreading enabled than a simple Pentium D and maybe even than an Athlon64 X2 while doing several threads of compile.
Brian23 - Saturday, December 31, 2005 - link
The second valuable post in this thread. I own an X2 3800 and I'm pleased with the results anand posted. I won't need to upgrade for a while.
I'm looking forward to AMD implementing something similar to Sun's design: multiple threads running simultaneously. It shouldn't be that hard to do. It's just adding GPRs and a little logic that controls the thread contexts.
Missing Ghost - Saturday, December 31, 2005 - link
Some other web sites report that the cpu becomes too hot with the stock heatsink.
Gary Key - Saturday, December 31, 2005 - link
The initial press release kits that contained the Intel D975XBX motherboard had an issue that created higher than normal idle/load temperatures. We have new boards on the way from Intel. I can promise you that the first results shown in other 955EE reviews do not occur on the 975x boards from Gigabyte and Asus, nor will it occur on the production release Intel D975XBX. I highly recommend a different air cooling system than the stock heatsink but most of the reported results at this time are incorrect.