Performance per Watt rules the datacenter, right? Wrong. Yes, you would easily be lead astray after the endless "Green ICT" conferences, the many power limited datacenters, and the flood of new technologies that all have the "Performance/Watt" stamp. But if performance per Watt is all that counts, we would be all be running atom and ARM based servers. Some people do promote Atom based servers, but outside niche markets we don't think it will be a huge success. Why not? Think about it: what is the ultimate goal of a datacenter? The answer is of course the same as for the enterprise as a whole: serve as many (internal or external) customers as possible with the lowest response time at the lowest cost.

So what really matters? Attaining a certain level of performance. At that point you want the lowest power consumption possible, but first you want to attain the level of performance where your customers are satisfied. So it is power efficiency at a certain performance level that you are after, not the best performance/Watt ratio. Twenty times lower power for 5 times lower performance might seem an excellent choice from the performance/watt point of view, but if your customers get frustrated with the high response times they will quit. Case closed. And customers are easily frustrated. "Would users prefer 10 search results in 0.4 seconds or 25 results in 0.9 seconds?" That is a question Google asked [1]. They found out to their surprise that a significant number of users got bored and moved on if they had to wait 0.9 seconds. Not everyone has an application like Google, but in these virtualized times we don't waste massive amounts of performance as we used to in the beginning of this century. Extra performance and RAM space is turned into more servers per physical server, or business efficiency. So it is very important not to forget how demanding we all are as customers when we are browsing and searching.

Modern CPUs have a vast array of high-tech weapons to offer good performance at the lowest power possible. PowerNow!, SpeedStep, Cache Sizing, CoolCore, Smart Fetch, PCU, Independent Dynamic Core Technology, Deep Sleep, and even Deeper Sleep. Some of those technologies have matured and offer significant power savings with negligible performance impact. A lot of them are user configurable: you can disable/enable them in the BIOS or they get activated if you chose a certain power plan in the operating system. Those that are configurable are so for a good reason: the performance hit is significant in some applications and the power savings are not always worth the performance hit. In addition, even if such technologies are active under the hood of the CPU package, it is no guarantee that the operating system makes good use of it.

How do we strike the right balance between performance and energy consumption? That is the goal of this new series of articles. But let's not get ahead of ourselves; before we can even talk about increasing power efficiency at a certain performance point, we have to understand how it all works. This first article dives deep into power management, to understand what works and what only works on PowerPoint slides. There is more to it than enabling SpeedStep in your server. For example, Intel has been very creative with Turbo Boost and Hyper-Threading lately. Both should increase performance in a very efficient way. But does the performance boost come with an acceptable power consumption increase? What is acceptable or not depends on your own priorities and applications, but we will try to give you a number of data points that can help you decide. Whether you enable some power management technologies, how you configure your OS is not the only decision you have to make as you attempt to provide more efficient servers.

 

Both AMD and Intel have been bringing out low power versions of their CPUs that trade clock speed for lower maximum power. Are they really worth the investment? A prime example of how the new generation forces you to make a lot of decisions is the Xeon L3426: a Xeon "Lynnfield" which runs at 1.86GHz and consumes 45W in the worst case according to Intel. What makes this CPU special is that it can boost its clock to 3.2GHz if you are running only a few active threads. This should lower response times when relatively few users are using your application, but what about power consumption? AMD's latest Opteron offers six cores at pretty low power consumption points, and it can lower its clock from 2.6GHz all the way down to 800MHz. That should result in significant power savings but the performance impact might be significant too. We have lots of questions, so let's start by understanding what happens under the hood, in good old AnandTech "nuts and bolts" tradition.

Warning: This article is not suited for quick consumption. Remember, you come to AnandTech for hard hitting analysis, and that's what this article aims to provide! Please take your time… there will be a quiz at the end. ;-)

How Does Power Management Work?
Comments Locked

35 Comments

View All Comments

  • JohanAnandtech - Monday, January 18, 2010 - link

    In which utility do you set/manage the frequency of a separate core?
  • n0nsense - Monday, January 18, 2010 - link

    Gnome panel applets. CPU frequency monitor I guess it uses cpufreq. Each instance monitors core. So i have 4 of them visible all the time. If you have enabled CPU Frequency scaling (kernel) than you can select the governor (performance, on demand, conservative etc) or a static frequency. I can do it for each core. And it displays what i have set.
    Of course processor should support frequency scaling.(power now and speed step).
    Most mainstream distributions (Ubuntu, Sabayon, Fedora) will use onedemand governor by default when processor with frequency scaling available. No user intervention required.
  • jordanclock - Monday, January 18, 2010 - link

    I really think you're mistaken. Core 2 CPUs don't have any mechanism to allow per-core frequencies. There is one FSB clock and one multiplier. There is no way to set CPU0 to a different frequency than CPU1 (or for quad core, CPU2 and CPU3) because the variables that control the clock speed are chip wide.
  • VJ - Tuesday, January 19, 2010 - link

    These people seem to be convinced of per-core Speedstep:

    https://bugs.launchpad.net/ubuntu/+source/linux-so...">https://bugs.launchpad.net/ubuntu/+source/linux-so...

    Maybe someone can ask David Tomaschik for the Intel documentation he refers to?
  • n0nsense - Monday, January 18, 2010 - link

    I heard it in past, but i still tend to believe my eyes :)
    while writing this reply, i saw any possible combination. My Q9300 has 2 states 2.0GHz and
    2.5GHz. It's not a server CPU. Have no reason to mislead you
  • VJ - Tuesday, January 19, 2010 - link

    If there's only two states, then it's possible that one core is in the C2 state while the other is in its C0 state.

    The core in state C2 may be shown to be operating at 2Ghz (its lowest frequency) while it's really off. The OS may simply be reporting the lowest possible frequency while the core is really not receiving a clock signal.

    So in general, if one core is showing its lowest frequency it may be off which still allows the other core to operate (at a different frequency).

    It would be very strange if both cores are operating greater than their lowest and less than their highest frequencies at different frequencies.

    From a different angle: Has anybody ever seen /proc/cpuinfo report a frequency less than the CPU/Core's lowest active frequency or even zero? Probably not.



  • n0nsense - Tuesday, January 19, 2010 - link

    Nice theory :)
    But in this case, I see that each core doing something. htop shows that each core somewhere in 15% usage. So the only options left, are
    1. Each core frequency can be controlled independently on C2D and C2Q (May be i3 i5 i7 too)
    2. The OS is completely unaware of whats going on :) (which is less possible)
  • mino - Thursday, January 21, 2010 - link

    "The OS is completely unaware of whats going on" is the right answer.
    :)

    BTW, only x86 CPU's able to change freq per core are >=K10 for AMD and >=Nehalem for Intel.
  • VJ - Tuesday, January 19, 2010 - link

    Not to defeat your argument/observations, rather for completeness' sake:

    It's also possible that the differences are due to the reading of the attributes. If the attributes are read in succession, then it's possible that the differences are due to the time of reading the attributes, while at any given instant, notwithstanding the allowable subtle differences in frequency described in this article, all cores are operating at the same frequency.

    There's a lot of time at the bottom.
  • JanR - Tuesday, January 19, 2010 - link

    Hi,

    I completely agree to this:

    "It's also possible that the differences are due to the reading of the attributes."

    The point is that desktop usage together with ondemand governor leads to a lot of fast frequency changes. Therefore, this is not a good scenario to decide on "per core" vs "per CPU". We did a lot of testing the following way:

    Put load on all cores using "taskset" (this avoids C-states). Switch to "userspace" governor and then set frequencies of individual cores differently. You have one control per core but the actual hardware decides what really happens - you can check this in /proc/cpuinfo or using a tool such as "mhz" from lmbench as load generator (this one calculates actual frequency based on CPI and time, it allows also measurement of turbo frequencies).

    Trying around, the results are:

    AMD K8: One clock domain, maximum of the requested frequencies is taken

    Intel Core2 Duo: Same as K8

    AMD K10: Individual clock domains, you can clock each core individually

    Intel Core 2 Quad: TWO clock domains! These CPUs are two dual core dies glued together so each die has its one multiplicator. Therefore, the cores of each die get the maximum of the requested frequencies but you can clock the two dies independendly.

    Intel Nehalem: One clock domain, maximum of requests of all cores that are not in C-state! If you set one core to, e.g., 2.66 GHz and all other to 1.6, all cores clock at 1.6 as long as the core set to 2.66 is not used, they all switch to 2.66 if you put load on that core.

    So far to our findings. "cat /proc/cpuinfo" or some funny tools are useless if you do not control the environment (userspace, manual settings). If you then enable ondemand, the system switches fast between different states and looking at it is just a snapshot, maybe taken in the middle of a transition.

    Greetings,

    Jan

Log in

Don't have an account? Sign up now