On Wed, Jun 24, 2009 at 04:03:05PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-06-24 at 15:47 +0800, Shaohua Li wrote:
> > On Wed, Jun 24, 2009 at 02:39:18PM +0800, Peter Zijlstra wrote:
> > > On Wed, 2009-06-24 at 12:13 +0800, Shaohua Li wrote:
> > > > This patch supports the processor aggregator device. When OS gets one ACPI
> > > > notification, the driver will idle some number of cpus.
> > > >
> > > > To make CPU idle, the patch will create power saving thread. Scheduler
> > > > will migrate the thread to preferred CPU. The thread has max priority and
> > > > has SCHED_RR policy, so it can occupy one CPU. To save power, the thread will
> > > > keep calling C-state instruction. Routine power_saving_thread() is the entry
> > > > of the thread.
> > > >
> > > > To avoid starvation, the thread will sleep 5% time for every second
> > > > (current RT scheduler has threshold to avoid starvation, but if other
> > > > CPUs are idle, the CPU can borrow CPU timer from other, so makes the mechanism
> > > > not work here)
> > > >
> > > > This approach (to force CPU idle) should hasn't impact to scheduler and tasks
> > > > with affinity still can get chance to run even the tasks run on idled cpu. Any
> > > > comments/suggestions are welcome.
> > >
> > > > +static int power_saving_thread(void *data)
> > > > +{
> > > > +	struct sched_param param = {.sched_priority = MAX_RT_PRIO - 1};
> > > > +	int do_sleep;
> > > > +
> > > > +	/*
> > > > +	 * we just create a RT task to do power saving. Scheduler will migrate
> > > > +	 * the task to any CPU.
> > > > +	 */
> > > > +	sched_setscheduler(current, SCHED_RR, &param);
> > > > +
> > >
> > > This is crazy and wrong.
> > >
> > > 1) cpusets can be so configured as to not have the full machine in a
> > > single load-balance domain, eg. the above comment about the scheduler is
> > > false.
> > Assume user will not assign such thread to a cpuset, if yes, it's user's
> > wrong.
>
> No its user policy, and esp on large machines cpusets are very useful.
> The kernel not taking that into account is simply not an option.
>
> Any thermal facility that doesn't take cpusets into account, or worse
> destroys user policy (the hotplug road), is a full stop in my book.
>
> Is similar to the saying the customer is always right, sure the admin
> can indeed configure the machine so that any thermal policy is indeed
> doomed to fail, and in that case I would print some warnings into syslog
> and let the machine die of thermal overload -- not our problem.
>
> The thing is, the admin configures it in a way, and then expects it to
> work like that. If any random event can void the guarantees what good
> are they?
>
> Now, if ACPI-4.0 is so broken that it simply cannot support a sane
> thermal model, then I suggest we simply not support this feature and
> hope they will grow clue for 4.1 and try again next time.
The assumption is that the user does not assign the power saving thread to
a specific cpuset. I thought that assumption was reasonable: users can assign
the threads they care about to a cpuset, but not every thread. The power
saving thread stays in the top cpuset, so it still has a chance to run on any
CPU. If the power saving thread runs on a CPU, the tasks on that CPU still
get a chance to run (at least 0.05s in every second), so it does not
completely break user policy.

> > > 2) you're running at MAX_RT_PRIO-1, this will mightily upset the
> > > migration thread and kstopmachine bits.
> > >
> > > 3) you're going to starve RT processes by being of a higher priority,
> > > even though you might gain enough idle time by simply moving SCHED_OTHER
> > > tasks around.
> > for 2/3, the power saving thread has SCHED_RR, it will run out of its time slice
> > in 100ms. SCHED_OTHER might not work, because the system might be very busy.
> >
> > Or we can lower the priority to not upset kernel RT threads. Usually applications
> > are not RT.
>
> Right, doing this at RR prio 1 would be much better.
OK, will do this.

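For illustration only, a rough userspace sketch of what "RR prio 1" means. It is not the patch code: the driver would call the in-kernel sched_setscheduler() on current, whereas this uses the POSIX call from <sched.h>. The helper names lowest_rr_param() and become_low_prio_rr() are hypothetical.

```c
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper: build a sched_param at the lowest SCHED_RR
 * priority. On Linux that value is 1, but it is queried here rather
 * than hard-coded.
 */
struct sched_param lowest_rr_param(void)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = sched_get_priority_min(SCHED_RR);
	return param;
}

/*
 * Hypothetical helper: switch the calling process to SCHED_RR at the
 * lowest real-time priority. Returns 0 on success and -1 on failure
 * (usually a permission error for unprivileged callers).
 */
int become_low_prio_rr(void)
{
	struct sched_param param = lowest_rr_param();

	return sched_setscheduler(0, SCHED_RR, &param);
}
```

At this priority the thread still preempts SCHED_OTHER tasks, but any other real-time task with a higher priority runs ahead of it.
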
> > > 4) you're introducing 57s latencies to processes that happen to get
> > > scheduled on whatever CPU you end up on, not nice.
> > Sorry for my ignorance on scheduler, I don't understand what you mean.
> > Won't scheduler will migrate normal threads out the cpu?
>
> Not currently, no, that's one of the more interesting things on the todo
> list.
>
> But it appears I can't read very well, its .95s, still a lot but not
> quite as bad as I made it out.
good.

Thanks,
Shaohua
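
The .95s figure above follows from the "sleep 5% time for every second" rule. A minimal sketch of that arithmetic in plain C, assuming a 1000 ms period; struct duty_split and split_period() are hypothetical names and do not come from the patch:

```c
/* Hypothetical type: how one period divides between forced idle and sleep. */
struct duty_split {
	unsigned int idle_ms;	/* time the power saving thread holds the CPU */
	unsigned int sleep_ms;	/* time the thread sleeps so other tasks can run */
};

/*
 * Hypothetical helper: split a period of period_ms milliseconds so that
 * sleep_pct percent of it is spent sleeping and the rest forcing idle.
 */
struct duty_split split_period(unsigned int period_ms, unsigned int sleep_pct)
{
	struct duty_split s;

	s.sleep_ms = period_ms * sleep_pct / 100;
	s.idle_ms = period_ms - s.sleep_ms;
	return s;
}
```

With period_ms = 1000 and sleep_pct = 5, this gives 950 ms of forced idle and 50 ms of sleep, so a task stuck on that CPU can wait up to about 0.95s before it runs.
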