The Nehalem Preview: Intel Does It Again
by Anand Lal Shimpi on June 5, 2008 12:05 AM EST- Posted in
- CPUs
Not One Nehalem, but Two
Nehalem itself is very stable but it has only been in Taiwanese motherboard manufacturer hands for a relatively short while now, so the only truly mature motherboards are made by Intel. Unfortunately since Intel didn't sanction our little Nehalem excursion, we were left with little more than access to some early X58 based motherboards in Taiwan. Thankfully we had two setups to play with, each for a very limited time.
We had access to a 2.66GHz Nehalem for the longest time, unfortunately the motherboard it was paired with had some serious issues with memory performance. Not only was there no difference between single and triple channel memory configurations, memory latency was high. We know this was a board specific issue since our second Nehalem platform didn't exhibit any issues. Unfortunately we didn't have access to the more mature platform for very long at all, meaning the majority of our tests had to be run on the first setup (never fear, Nehalem is fast enough that it didn't end up mattering).
The second issue we ran into was a PCI Express problem that kept us from running any meaningful GPU benchmarks. We've been told that it'll take the motherboard guys about a month to work out these kinks, but that's why you shouldn't expect to see a full performance evaluation of Nehalem in the near term.
The CPUs are quite mature and are running extremely cool (surprisingly cool actually), their clock speeds are being artificially limited by Intel in order to avoid putting all cards on the table at this time. We saw a similar approach with the very first Penryn samples which were all locked at 2.66GHz. The Intel X58 chipset we used in our testing on the other hand got quite hot.
Nehalem no longer has a conventional FSB, its clock speed is derived from a multiplier of an external clock frequency - in this case 133MHz. Expect all Nehalem chips to come out in frequencies that are multiples of 133MHz.
Thankfully we don't want a thorough look at Nehalem today, we'll save that for the launch - what we do want is to whet our appetite. We want to know if Intel can pull it off a second time.
108 Comments
View All Comments
Anand Lal Shimpi - Thursday, June 5, 2008 - link
I really wish I could've turned off hyperthreading :)DivX doesn't scale well beyond 4 threads so that's the best benchmark I could run to look at how Nehalem performs when you keep clock speeds and number of threads capped. With a 28% improvement that's at the upper end of what we should expect from Nehalem on average.
Take care,
Anand
SiliconDoc - Monday, July 28, 2008 - link
Great answer, expalins it to a tee....However that leaves myself and I'd bet most of the fans here with not much real world use for 4x4HT ...
I don't know should we all steal rented DVD's... by re-encoding - only use I know of that might work for the non-connected enduser.
Not like "folding" is all the rage, they would have to pay me to do their work - especially with all the "power savings" hullaballoo going on in tech.
That's great, 28% increase, ok...
So I want it in a 2 core or a single core HT... since that runs everything I do outside the University.
lol
I guess the all core useage all the time, will hit sometime....
Calin - Thursday, June 5, 2008 - link
First of all, the Hyperthreading in the Pentium 4 line brought at most a 20% or so performance advantage, with a -5% or so at worst. I don't have many reasons just now to think this new Hyperthreading would be vastly different.As for the scaling to 8 cores, maybe the scaling was limited due to other issues (latency, interprocessor communication, cache coherency)? It might be possible that DivX on this new platform to increase performance from 4 to 8 cores?
bcronce - Thursday, June 5, 2008 - link
Intel claims that the new HT is improved and gives 10%-100% increase. The main issue with the P4 is that it had a double pumped alu that could process 2 integers per clock. This was great for HT since you could do 2 instructions per clock. The problem came with competition for the FPU, which there was only 1. This would cause the 2nd thread in the logical cpu to stall and thread swapping has additional overhead.You also run into the issue of L2 cashe thrashing. If you have 2 threads trying to monopolize the FPU and also loading large datasets into cache, you're cache misses go up while each thread is bottlenecked at the fpu.
techkyle - Thursday, June 5, 2008 - link
I'd like to know what AMD fans are thinking. As one myself, I'm starting to wonder if I'm going to give in and become an Intel fan.Intel implementing the IMC:
I can only say two things. One, it's about time. Two, THIEF!
Return of Hyper Threading:
It seems to me that some sort of intelligence must go in to the design of multi-core hyper threading. If two intensive tasks are given to the processor, the fastest solution would be simple, devote one core to one thread, a second core to the other. With Hyper Threading back with a multi-core twist, what's stopping one thread the first core, first virtual and the second thread on the first core, second virtual?
Another nail in the coffin:
AMD can provide no competition to high end Core 2 Quad machines. Even if the K10 line can jump up to par performance with the Core architecture, can they really expect to have a Nehalem competitor ready any time remotely close to Intel's launch? AMD can't afford to keep playing the power efficient and price/performance game.
AMD is going to be in an even worse position than when it was X2 vs Core 2 if they can't pull something out of their sleeves. Barcelona already isn't clock for clock competitive with the Penryn and now we hear that early Nehalems are 20-40% above Core 2?
If AMD's next processor flops, is it possible for them to drop desktop and server processors and still be a functioning (not to forget profitable) company? It's no longer a race for the performance crown. It's becoming a race to simply survive.
bcronce - Thursday, June 5, 2008 - link
"Return of Hyper Threading:It seems to me that some sort of intelligence must go in to the design of multi-core hyper threading. If two intensive tasks are given to the processor, the fastest solution would be simple, devote one core to one thread, a second core to the other. With Hyper Threading back with a multi-core twist, what's stopping one thread the first core, first virtual and the second thread on the first core, second virtual? "
The way Windows lists cpu is first physicals, then logicals. So in task manager the first 4, on a quad core w/ HT, will be your physical cpu and the last 4 will be you logical.
Windows, by default, will put threads on cpu 1-4 first. It will move threads around to different CPUs if it feels that one is under-taxed and another is way over-taxed.
Programmers can also force Windows to use differnt cores for each thread. So, a program can tell Windows to lock all threads to the first 4 cpus, which will keep them off of the logical. You could then allow a thread that manages the worker threads to run on the logical cpus. You would then be keeping all your hard data-crunching threads from competing with themselves and let the UI/etc threads take advantage of HT.
Spoelie - Thursday, June 5, 2008 - link
Supposedly, Shanghai (the 45nm iteration of Phenom) will be around 20% faster clock per clock over Phenom. This is what AMD said itself some time ago and not verified by an independent source. Judging by current benchmarks, this would put Shanghai at the same or slightly higher performance level of Penryn.As such, a very crude estimate is that Shanghai should be as competitive to Nehalem as K8 is to Conroe. Not a very rosy outlook so let's hope this early information is not accurate and AMD can pull something more out of its hat.
BTW, last I heard is that Bulldozer will come at the 32nm node at the earliest, since the design is supposedly too complex for 45nm. So no instant relieve from that corner. AMD will be fighting a harsh battle the coming years.
Calin - Thursday, June 5, 2008 - link
"Two, THIEF"AMD's vector processing is 3DNow!, if I remember correctly. Yet, the Intel's versions of it is are touted on its processors instead (SSE2, SSE3). Now who's the thief?
swaaye - Thursday, June 5, 2008 - link
MMX? Or, MDMX? Who copied who? Nobody, really. SIMD has been around forever.Retratserif - Thursday, June 5, 2008 - link
I really would not try and think of it as a Fan base. A majority of the OC'er and Benchmarkers use what works for what they are doing. I have owned and water cooled AMD CPU's. It was great at the time.Once Conroes came out the door swung the other way. Technology is like the ocean, it comes in waves. All waves die out. Fortunately we as the user are living it up because of Intel's success. At the same time I truly hope that AMD/ATI does something in response to the high power cpu's. If not we get what ever intel wants to give us.
There will always be to sides to each story. Since Intel is on top and unscathed, they have time to perfect chips before they go mainstream. Same way we have seen the delay in Yorkfields. There was something seriously wrong, and they had time to address it before it was in the hands of thousands of users.
Ok, I can say I am a fan.... of what ever works the best for what I do. Price/Performance/Practicallity. You take what you can afford and make it work harder for every penny you put in it.
One thing you have to keep in mind. AMD is selling more budget CPU's and integrated/onboard video PC's to large companies like Dell and HP. They are moving more aggressively into typical home PC and mobile use. Intel just does not do very well there atm. With Ati in the pocket and being pretty green on power consumption, you can get a good mobile AMD that will do everything a typical PC user will ever need for 2 years at a good price.