Hi,

On Tue, Jul 28, 2009 at 05:00:35PM +0800, Zhang, Yanmin wrote:
> I tried different clocksources. For example, I could get a better (30%)
> result with hpet. With hpet, cpu utilization is about 5~8%. Function
> hpet_read uses too much cpu time. With tsc, cpu utilization is about
> 2~3%. I think more cpu utilization causes fewer C state transitions.
>
> With idle=poll, the result is about 10% better than the one of hpet.
> If using idle=poll, I didn't find any result difference among different
> clocksources.

IOW, this seems to clearly point to ACPI Cx causing it.

Both Corrado and I have been thinking that one should try skipping all
higher-latency ACPI Cx states whenever there's an ongoing I/O request
for which an immediate reply interrupt is expected.

I've been investigating this a bit, and interesting parts would perhaps
include:
. kernel/pm_qos_params.c
. drivers/cpuidle/governors/menu.c (which acts on the ACPI _cx state
  structs as configured by drivers/acpi/processor_idle.c)
. and e.g. the wait_for_completion_timeout() part in
  drivers/ata/libata-core.c (or other sources in the case of other disk
  I/O mechanisms)

One way to do some quick (and dirty!!) testing would be to set a flag
before calling wait_for_completion_timeout(), test for this flag in
drivers/cpuidle/governors/menu.c, and then skip deeper Cx states
conditionally.

As a very quick test, I tried a "while :; do :; done" loop in a shell
and reniced the shell to 19 (to keep my CPU out of ACPI idle), but
bonnie -s 100 results initially looked promising yet turned out to be
inconsistent. The real way to test this would be idle=poll. My test
system was an Athlon XP with /proc/acpi/processor/CPU0/power latencies
of 000 and 100 (the maximum allowed value, BTW) for C1/C2.
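To make the quick-and-dirty flag idea concrete, here is a minimal userspace
sketch of the shape it could take. All names here (io_wait_begin(),
io_wait_end(), menu_select(), MAX_SHALLOW_STATE) are hypothetical
illustrations, not existing kernel symbols; in the real kernel the flag
would be set around wait_for_completion_timeout() and tested inside the
menu governor's state selection:

```c
#include <stdatomic.h>

/* Hypothetical global counter: nonzero while a short-term I/O reply
 * (e.g. a wait_for_completion_timeout() in libata) is pending.
 * A counter rather than a bool, so overlapping waiters nest safely. */
static atomic_int io_reply_pending;

/* Call before/after the wait_for_completion_timeout() in question. */
static void io_wait_begin(void) { atomic_fetch_add(&io_reply_pending, 1); }
static void io_wait_end(void)   { atomic_fetch_sub(&io_reply_pending, 1); }

/* Deepest Cx state we still allow while an immediate I/O reply is
 * expected (illustrative value: C1 only). */
#define MAX_SHALLOW_STATE 1

/* Model of the menu governor's decision: normally return the deepest
 * state it would otherwise have picked; while an immediate I/O reply
 * is expected, cap it at a shallow, low-latency state. */
static int menu_select(int deepest_allowed)
{
    if (atomic_load(&io_reply_pending) && deepest_allowed > MAX_SHALLOW_STATE)
        return MAX_SHALLOW_STATE;
    return deepest_allowed;
}
```

Usage would look like: io_wait_begin(); issue the command and wait for
completion; io_wait_end(). While the counter is raised, menu_select(3)
yields 1 instead of 3, i.e. the CPU avoids the deep C-states whose wakeup
latency would delay handling of the expected interrupt.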
If the wait_for_completion_timeout() flag testing turns out to help,
then one might intend to use the pm_qos infrastructure to indicate
these conditions; however, it might be too bloated for such a purpose,
and a relatively simple (read: fast) boolean flag mechanism could be
better. Plus one could then create a helper function which figures out
a "pretty fast" Cx state (independent of specific latency times!). But
when introducing this mechanism, take care not to ignore the
requirements defined by pm_qos settings!

Oh, and about the places which submit I/O requests where one would
have to set this flag: are they in any way correlated with the
scheduler I/O wait value? Would the I/O wait mechanism be a place to
more easily and centrally indicate that we're waiting for a request to
come back in "very soon"? OTOH, I/O requests may have vastly differing
delay expectations, so specifically only short-term expected I/O
replies should be flagged; otherwise we're wasting lots of ACPI deep
idle opportunities.

Andreas Mohr