The ondemand governor does tend to go all or nothing with respect to CPU frequency. That is not entirely laziness; it has some logic to compute an optimum frequency but doesn't generally use it. There is some evidence that intermediate frequencies are a waste of effort.

Please consider a couple of things:

1) Most Intel CPUs do most of their power savings through C-states, not by reducing clock frequency. That may have something to do with why you see only modest power savings between ondemand and performance. Recent AMD CPUs, on the other hand, rely a lot more on reducing clock frequency to save power. Down the road, we'll need to be doing both effectively. But even going to the very lowest clock frequency on a Nehalem EP will not save very much power -- and increased use of intermediate frequencies will help less. That said, minimizing turbo boost usage will likely save quite a bit of power (at the expense of reduced performance). It would definitely be nice to see results on a variety of modern CPUs for a major patch like this.

2) Please consider the case where performance really does matter when heavy loads are present, but we'd like to save power when the system is lightly loaded. This is different from the laptop case, where saving power under load is probably as important as the performance, and if you are truly idle you are turning things off altogether.

Your claim of matching the performance governor's performance is a great aspiration, but it'll need to be demonstrated on a variety of CPUs and workloads; this is not usually easy to accomplish.
David C Niemi

-----Original Message-----
From: cpufreq-owner@xxxxxxxxxxxxxxx [mailto:cpufreq-owner@xxxxxxxxxxxxxxx] On Behalf Of Youquan Song
Sent: Thursday, December 23, 2010 1:24 AM
To: davej@xxxxxxxxxx; cpufreq@xxxxxxxxxxxxxxx
Cc: venki@xxxxxxxxxx; arjan@xxxxxxxxxxxxxxx; lenb@xxxxxxxxxx; suresh.b.siddha@xxxxxxxxx; kent.liu@xxxxxxxxx; chaohong.guo@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; Youquan Song; Youquan Song
Subject: [PATCH 0/6] cpufreq: Add sampling window to enhance ondemand governor power efficiency

Running a well-known power performance benchmark shows that the current ondemand governor is not power efficient. Even when the workload is at 10%~20% of full capacity, the CPU still runs much of the time at the highest frequency. In fact, in this situation the lowest frequency can often meet the user's requirement.

When running this benchmark on a machine with turbo mode enabled, I compared the results of different governors; the results of the ondemand and performance governors are the closest. There is not much power saving between the ondemand and performance governors. If we ignore the small power saving, the performance governor is even better than the ondemand governor, at least in terms of performance.

One potential reason the ondemand governor is not power efficient is that it decides the next target frequency from the instantaneous requirement during the sampling interval (10ms, or possibly a little longer with the deferrable timer in tickless idle). The instantaneous requirement responds quickly to workload changes, but it does not usually reflect the workload's real CPU usage requirement over a slightly longer time, and it can cause frequent switching between the highest and lowest frequencies.

This patchset adds a sampling window for the per-CPU ondemand thread. Each sampling window holds at most 150 record items, slides every sampling interval, and is used to track the workload requirement during the latest sampling window timeframe.
The average workload during the latest sampling window is used to decide the next target frequency. The sampling window is intended to reflect the workload's CPU usage requirement more accurately. The sampling window size can be set by the user, and the default maximum sampling window is one second. When it is set to the default sampling rate, the sampling window rolls back to the original behaviour.

The sampling window size can also change dynamically according to how busy the system currently is: the more idle, the smaller the sampling window; the busier, the larger the sampling window. Decreasing the sampling window increases the response speed, while increasing it keeps the CPU working at high speed when busy and also avoids the inefficient bouncing between the highest and lowest frequencies seen in the original ondemand.

We set up_threshold to 80 and down_differential to 20, so when the workload reaches 80% of the current frequency's capacity, the governor increases to the highest frequency. When the workload drops below (up_threshold - down_differential) 60% of the current frequency's capacity, it decreases the frequency, which ensures that the CPU works above 60% of its current capacity; otherwise the lowest frequency is used.

Turbo Mode (P0) consumes much more power than the second-highest frequency (P1), and the P1 frequency is often double, or even more than, the lowest frequency (Pn). The current logic jumps sharply to the highest frequency, Turbo Mode, when the workload reaches up_threshold of the current frequency's capacity, even if the current frequency is the lowest. With this patchset, the governor first evaluates whether P1 is enough to support the current workload before entering Turbo Mode directly. If P1 can meet the workload requirement, it saves power compared with being in Turbo Mode.
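
For readers who want to see the shape of the idea, here is a rough user-space sketch in Python of the sampling window and the P1-before-Turbo decision described above. It is not the patch code. The names SamplingWindow, next_frequency and HYPOTHETICAL_FREQS_KHZ are made up for illustration, the frequency table is invented, and two details are assumptions rather than something the cover letter states: that each sample is a percent-busy figure relative to the current frequency, and that scaling down picks the lowest frequency at which the same demand would sit at or below (up_threshold - down_differential).

```python
from collections import deque

# Invented frequency table in kHz, lowest (Pn) first; the last entry stands
# in for Turbo Mode (P0) and the one before it for P1. Not real hardware data.
HYPOTHETICAL_FREQS_KHZ = [1600000, 2400000, 2933000, 3066000]


class SamplingWindow:
    """Sliding window of load samples, one per sampling interval."""

    def __init__(self, max_records=150):
        # 150 is the maximum record count named in the cover letter.
        self.samples = deque(maxlen=max_records)

    def record(self, load_pct):
        """Store one sample: percent busy (0-100) at the current frequency."""
        self.samples.append(load_pct)  # the oldest sample drops off when full

    def average(self):
        """Mean load over the window; 0.0 when nothing is recorded yet."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


def next_frequency(avg_load_pct, cur_khz, freqs_khz,
                   up_threshold=80, down_differential=20):
    """Pick the next target frequency from the windowed average load."""
    freqs = sorted(freqs_khz)
    p1, turbo = freqs[-2], freqs[-1]
    # Demand in "percent * kHz" units, so it can be compared across frequencies.
    load_freq = avg_load_pct * cur_khz

    if avg_load_pct > up_threshold:
        # Evaluate P1 first: if the same demand would stay at or below
        # up_threshold when running at P1, do not enter Turbo Mode.
        if load_freq <= up_threshold * p1:
            return p1
        return turbo

    if avg_load_pct < up_threshold - down_differential:
        # Assumption: choose the lowest frequency that still covers the demand
        # at (up_threshold - down_differential) percent utilisation.
        wanted = load_freq / (up_threshold - down_differential)
        for freq in freqs:
            if freq >= wanted:
                return freq

    # Between the two thresholds: keep the current frequency.
    return cur_khz


window = SamplingWindow(max_records=3)
for load in (10, 20, 30, 100):
    window.record(load)
print(window.average())
print(next_frequency(90, 1600000, HYPOTHETICAL_FREQS_KHZ))
```

With a three-record window, the single 100% sample above is averaged with its neighbours instead of driving the decision alone, which is the smoothing effect the sampling window is after. The second call shows a 90% load at the lowest frequency being sent to P1 rather than Turbo, because that demand fits within up_threshold at P1. The dynamic resizing of the window is left out because the cover letter gives no formula for it.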
On my test platform, a two-socket Westmere-EP server running the well-known power performance benchmark, when the workload is low the patched governor saves power like the powersave governor; when the workload is high the patched governor performs as well as the performance governor but consumes less power. Along with the other patches in this patchset, the patched governor's power efficiency is improved by about 10%, with no apparent decrease in performance.

Running other benchmarks from phoronix: kernel building saves 5% power with no decrease in performance; compress-7zip saves 2% power, also with no apparent decrease in performance. However, the apache benchmark saves power but its performance decreases a lot.

--
To unsubscribe from this list: send the line "unsubscribe cpufreq" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html