On 4/27/2011 12:38 AM, Borislav Petkov wrote:
Great, whatever you guys come up with, we'd like to give it a run too. We (AMD) hit the same issue in one of our tests, but in our case we end up in an endless loop of the state machine in stop_machine_cpu_stop(), because the core being offlined cannot ack the state transition to STOPMACHINE_EXIT for a similar reason. One possible fix is dropping CPU_DYING from console_cpu_notify(), since that notifier is called from the offlining path in kernel/cpu.c::take_cpu_down().
This seems to be a different problem. Could you elaborate on why removing CPU_DYING from console_cpu_notify() resolves your problem? What other fixes are possible?
In the failure case I witnessed, we're attempting to sleep in atomic mode, which is a clear violation caused by the addition of CPU_DYING. I haven't thoroughly investigated whether the other actions handled in console_cpu_notify() (e.g. ONLINE, DEAD, DOWN_FAILED, UP_CANCELED) violate atomic mode as well.
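To make the failure mode concrete, here is a rough userspace model of it, not kernel code. It assumes that the notifier takes a lock that may sleep for every action it handles, and that the CPU_DYING callback is the only one delivered from the atomic part of the offlining path. The enum, the in_atomic_ctx flag, and every function name ending in _sketch are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the hotplug actions named in this thread. */
enum cpu_action_sketch {
    ACT_ONLINE,
    ACT_DEAD,
    ACT_DYING,
    ACT_DOWN_FAILED,
    ACT_UP_CANCELED
};

/* Models "we are inside the stop_machine part of the offlining path". */
static bool in_atomic_ctx;

/* Counts how often a sleeping lock was taken while atomic. */
static int sleep_violations;

/* Models a lock that may sleep (assumption: the notifier takes one). */
static void sleeping_lock_sketch(void)
{
    if (in_atomic_ctx)
        sleep_violations++;   /* would be "sleeping in atomic" in the kernel */
}

static void sleeping_unlock_sketch(void)
{
}

/* Models the notifier; handle_dying selects whether CPU_DYING is handled. */
static void console_notify_sketch(enum cpu_action_sketch action, bool handle_dying)
{
    if (action == ACT_DYING && !handle_dying)
        return;               /* proposed fix: ignore the DYING action */

    switch (action) {
    case ACT_ONLINE:
    case ACT_DEAD:
    case ACT_DYING:
    case ACT_DOWN_FAILED:
    case ACT_UP_CANCELED:
        sleeping_lock_sketch();
        sleeping_unlock_sketch();
        break;
    }
}

/* Models one CPU offline sequence and returns the number of violations. */
static int offline_cpu_sketch(bool handle_dying)
{
    sleep_violations = 0;

    in_atomic_ctx = true;     /* DYING is delivered from the atomic path */
    console_notify_sketch(ACT_DYING, handle_dying);
    in_atomic_ctx = false;

    /* DEAD is delivered afterwards, where sleeping is allowed. */
    console_notify_sketch(ACT_DEAD, handle_dying);

    return sleep_violations;
}
```

With handle_dying set, the model records one violation per offline; with it cleared, none. Whether the remaining actions are always delivered from a context that may sleep is exactly the part I have not verified.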
Thanks,
Mike

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum