On Thu, May 21, 2009 at 01:36:35AM +0800, Vaidyanathan Srinivasan wrote:
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> [2009-05-20 15:41:55]:
>
> > On Wed, 2009-05-20 at 15:13 +0200, Andi Kleen wrote:
> > > Thanks for the explanation.
> > >
> > > My naive reaction would be to fail if the socket to be taken out
> > > is the only member of some cpuset. Or maybe break affinities in
> > > this case.
> >
> > Right, breaking affinities would go against the policy of the admin;
> > I'm not sure we'd want to go there. We could start generating msgs
> > about how we're in thermal trouble and the given configuration is
> > obstructing counter measures etc.
> >
> > Currently hot-unplug does break affinities, but that's an explicit
> > action by the admin himself, so he gets what he asks for (and we do
> > generate complaints in syslog about it).
> >
> > [ Same scenario for the HPC guys who affinity-fix all their threads
> >   to specific cpus; there's really nothing you can do there. Then
> >   again, such folks generally run their machines at 100%, so they'd
> >   better be able to deal with their thermal peak capacity anyway. ]
> >
> > > > You really want to start shrinking the generic computational
> > > > capacity first.
> > >
> > > One general issue to remember is that if you don't react to the
> > > platform hint, the platform will likely force a lower p-state on
> > > you to not exceed the thermal limits, making everyone slower.
> > >
> > > (this will likely also not make your real time process happy)
> >
> > Quite.
> >
> > > So it's a bit more than a hint; it's more like a command "or else".
> > >
> > > So it's a good idea to react, or at least make a reasonable attempt
> > > to react.
> >
> > Sure, does the thing give more than a 'react now, or else' impulse?
> > That is, can we see it coming, or will we have to deal with it when
> > we're there?
> >
> > The latter also has the problem that you have to react very quickly.
> >
> > > > The thing is, you cannot simply rip cpus out from under a system;
> > > > people might rely on them being there and have policy attached to
> > > > them -- esp. people touching cpusets should know that a machine
> > > > isn't configured homogeneously and any odd cpu will do.
> > >
> > > Ok, so do you think it's possible to figure out, based on the cpuset
> > > graph / real time runqueue, if a socket can be taken out?
> >
> > Right, so all of this depends on a number of things: how frequently
> > and how fast would these situations occur?
> >
> > I would think they'd be rare events, otherwise you really messed up
> > your infrastructure. I also think reaction times should be in the
> > seconds, otherwise you're cutting it way too close.
> >
> > The work IBM has been doing is centered around overloading
> > neighbouring packages in order to keep some idle. The overload is
> > exposed as a percentage.
> >
> > This works within scheduling domains, so if you carve your machine up
> > into tiny (<= 1 package) domains it's impossible to do anything
> > (corner case, we could send cries for help syslog's way).
> >
> > I was hoping we could control the situation with that. But for that
> > to work we need some gradual information in order to make that
> > thermal<->overload feedback work.
>
> The advantage of this method is that it reduces load on one package
> without targeting a particular CPU. This is less restrictive and
> allows the load balancer to work out the details. Keeping a core idle
> on average (over a time interval) is good enough to reduce the power
> and heat.
>
> Here we need not touch the RT jobs or break user space policies. We
> effectively reduce capacity and let the load balancer have the
> flexibility of figuring out which CPU should not be scheduled now.
>
> That said, this is not useful for a 'cpu cache error' case, in which
> case you will have to cpu-hot-unplug anyway. You don't want any
> interrupts/timers to land on an unreliable CPU.
>
> Overloading the powersave load balancer to assume reduced capacity on
> some of the packages while overloading some other packages is the
> core idea. The RFC patches still need a lot of work to meet the
> required functionality.

So the main concern is breaking user policy, but it appears any approach
(cpu hotplug/cpuset) will break user policy (affinity). I wonder how the
scheduler approach can overcome this, given my limited scheduler
knowledge.

Thanks,
Shaohua
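
As context for the affinity concern discussed above, here is a minimal
user-space sketch (not from the thread) of the kind of per-package
pinning policy that hot-unplug or forced evacuation would have to break.
The assumption that cpus 0-3 share one package is purely illustrative;
the pinning itself uses the standard sched_setaffinity(2) interface.

/* Sketch: a task pinning itself to the CPUs of one (assumed) package.
 * If one of these CPUs is later offlined, the kernel must override
 * this mask, i.e. the user's placement policy is broken. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	cpu_set_t set;
	int cpu;

	CPU_ZERO(&set);
	for (cpu = 0; cpu < 4; cpu++)	/* assumption: cpus 0-3 = one package */
		CPU_SET(cpu, &set);

	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return EXIT_FAILURE;
	}

	/* ... RT or HPC work bound to this package ... */
	return 0;
}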
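And a sketch of the two existing admin-side mechanisms the thread refers
to, driven from user space via sysfs: offlining a cpu outright (the
explicit action that breaks affinities), and biasing the powersave load
balancer via the mainline sched_mc_power_savings knob of 2.6.30-era
kernels. The cpu number is hypothetical, root is required, and the RFC's
per-package overload percentage is deliberately not shown, since its
interface is not described in this thread.

/* Sketch, assuming a 2.6.30-era kernel and run as root. */
#include <stdio.h>

static int write_sysfs(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", val);
	return fclose(f);	/* write is flushed to sysfs on close */
}

int main(void)
{
	/* Heavy hammer: take cpu3 (hypothetical number) out entirely;
	 * the kernel migrates tasks away and complains in syslog. */
	write_sysfs("/sys/devices/system/cpu/cpu3/online", "0");

	/* Softer knob: ask the multi-core balancer to consolidate load
	 * and keep whole packages idle (0 = off, 1/2 = more aggressive). */
	write_sysfs("/sys/devices/system/cpu/sched_mc_power_savings", "2");

	return 0;
}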