> Any thermal facility that doesn't take cpusets into account, or worse
> destroys user policy (the hotplug road), is a full stop in my book.
>
> Is similar to the saying the customer is always right, sure the admin
> can indeed configure the machine so that any thermal policy is indeed
> doomed to fail, and in that case I would print some warnings into syslog
> and let the machine die of thermal overload -- not our problem.
>
> The thing is, the admin configures it in a way, and then expects it to
> work like that. If any random event can void the guarantees what good
> are they?
>
> Now, if ACPI-4.0 is so broken that it simply cannot support a sane
> thermal model, then I suggest we simply not support this feature and
> hope they will grow clue for 4.1 and try again next time.

Peter,
ACPI is just the messenger here - user policy is in charge,
and everybody agrees, user policy is always right.

The policy may be a thermal cap to deal with thermal emergencies
as gracefully as possible, or it may be an electrical cap to
prevent a rack from approaching the limits of the provisioned
electrical supply.

This isn't about a brain-dead administrator, doomed thermal policy,
or a broken ACPI spec. This mechanism is about trying to maintain
uptime in the face of thermal emergencies, and spending limited
electrical provisioning dollars to match, rather than grossly exceed,
maximum machine room requirements.

Do you have any fundamental issues with these goals?
Are we in agreement that they are worthy goals?

The forced-idle technique is employed after the processors have
all already been forced to their lowest performance P-state
and the power/thermal problem has not been resolved.

No, this isn't a happy scenario; we are definitely impacting
performance. However, we are trying to impact system performance
as little as possible while saving as much energy as possible.

After P-states are exhausted and the problem is not resolved,
the rack (via ACPI) asks Linux to idle a processor.
Linux has full freedom to choose which processor.
If the condition does not get resolved, the rack will ask us
to offline more processors.

If this technique fails, the rack will throttle the processors
down as low as 1/16th of their lowest performance P-state.
Yes, that is about 100MHz on most multi-GHz systems...

If that fails, the entire system is powered off.

Obviously, the approach is to impact performance as little as possible
while impacting energy consumption as much as possible. Use the most
efficient means first, and resort to increasingly invasive measures
as necessary...

I think we all agree that we must not break the administrator's
cpuset policy if we are asked to force a core to be idle -- for
when the emergency is over, the system should return to normal
and bear no permanent scars.

The simplest thing that comes to mind is to declare a system
with cpusets or binding fundamentally incompatible with
forced idle, and to skip that technique and let the hardware
throttle all the processor clocks with T-states.

However, in aggregate, forced idle is a more efficient way
to save energy, as idle on today's processors is highly optimized.

So if you can suggest how we can force processors to be idle
even when cpusets and binding are present in a system,
that would be great.

thanks,
-Len Brown, Intel Open Source Technology Center