On 05/03/2015 12:05 AM, Greg KH wrote:
> On Tue, Apr 28, 2015 at 02:49:55PM +0530, Preeti U Murthy wrote:
>> commit 345527b1edce8df719e0884500c76832a18211c3 upstream
>>
>> It was found when doing a hotplug stress test on POWER, that the
>> machine either hit softlockups or rcu_sched stall warnings. The
>> issue was traced to commit:
>>
>> 7cba160ad789 ("powernv/cpuidle: Redesign idle states management")
>>
>> which exposed the cpu_down() race with hrtimer based broadcast mode:
>>
>> 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")
>>
>> The race is the following:
>>
>> Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
>> before it is taken down.
>>
>>     CPU0                                CPU1
>>
>>     cpu_down()                          take_cpu_down()
>>                                         disable_interrupts()
>>
>>     cpu_die()
>>
>>     while (CPU1 != CPU_DEAD) {
>>             msleep(100);
>>             switch_to_idle();
>>             stop_cpu_timer();
>>             schedule_broadcast();
>>     }
>>
>>     tick_cleanup_cpu_dead()
>>             take_over_broadcast()
>>
>> So after CPU1 disabled interrupts it cannot handle the broadcast
>> hrtimer anymore, so CPU0 will be stuck forever.
>>
>> Fix this by explicitly taking over broadcast duty before cpu_die().
>>
>> This is a temporary workaround. What we really want is a callback
>> in the clockevent device which allows us to do that from the dying
>> CPU by pushing the hrtimer onto a different cpu. That might involve
>> an IPI and is definitely more complex than this immediate fix.
>>
>> Changelog was picked up from:
>>
>> https://lkml.org/lkml/2015/2/16/213
>>
>> Suggested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Tested-by: Nicolas Pitre <nico@xxxxxxxxxx>
>> Signed-off-by: Preeti U. Murthy <preeti@xxxxxxxxxxxxxxxxxx>
>> Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
>> Cc: mpe@xxxxxxxxxxxxxx
>> Cc: nicolas.pitre@xxxxxxxxxx
>> Cc: peterz@xxxxxxxxxxxxx
>> Cc: rjw@xxxxxxxxxxxxx
>> Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
>> Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@xxxxxxxxxxxxxxxxx
>> [ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ]
>> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>> ---
>>
>> Please apply this to 3.19 stable.
>
> What about 4.0 stable?

It needs to be applied to 4.0 as well. I pulled stable before I posted
out and did not find this branch then.

>
> And this doesn't look like it's the same backport, you didn't modify
> tick.h, why not?

This was a mistake, apologies for that. Not sure how that got missed. I
have resent this patch taking care of the missing hunk with the RESEND
tag, that has to be applied to both 3.19 and 4.0.

Thank you

Regards
Preeti U Murthy

>
> thanks,
>
> greg k-h
>
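
For readers following the race in the quoted changelog, here is a small
user-space toy model of the ordering problem. It is only a sketch and not
kernel code: every type and function name below (toy_cpu, toy_system,
toy_take_cpu_down, toy_take_over_broadcast, toy_cpu_down_completes) is
made up for illustration, and the model assumes just two CPUs with CPU1
holding the broadcast duty before it goes down.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types and helpers, not kernel APIs. */
struct toy_cpu {
    bool irqs_enabled;   /* can this CPU still service the broadcast hrtimer? */
};

struct toy_system {
    struct toy_cpu cpu[2];
    int broadcast_owner; /* index of the CPU holding hrtimer broadcast duty */
};

/* Models take_cpu_down() on the dying CPU: interrupts go off and stay off. */
static void toy_take_cpu_down(struct toy_system *s, int dying)
{
    s->cpu[dying].irqs_enabled = false;
}

/* Models the takeover: the surviving CPU pulls broadcast duty to itself. */
static void toy_take_over_broadcast(struct toy_system *s, int survivor)
{
    s->broadcast_owner = survivor;
}

/*
 * CPU0 sleeps in its wait loop and relies on the broadcast owner to wake it.
 * Simplification: an owner with interrupts enabled always delivers the wakeup.
 */
static bool toy_sleeper_gets_woken(const struct toy_system *s)
{
    assert(s->broadcast_owner == 0 || s->broadcast_owner == 1);
    return s->cpu[s->broadcast_owner].irqs_enabled;
}

/* Returns true if CPU0 would wake from its wait loop in cpu_die(). */
bool toy_cpu_down_completes(bool takeover_before_die)
{
    struct toy_system s = {
        .cpu = { { .irqs_enabled = true }, { .irqs_enabled = true } },
        .broadcast_owner = 1, /* CPU1 holds the duty before going down */
    };

    toy_take_cpu_down(&s, 1);           /* CPU1 disables interrupts */
    if (takeover_before_die)
        toy_take_over_broadcast(&s, 0); /* the ordering the fix introduces */

    /* cpu_die(): CPU0 sleeps and needs a broadcast wakeup to make progress. */
    return toy_sleeper_gets_woken(&s);
}
```

In this model, toy_cpu_down_completes(false) corresponds to the stuck case
from the changelog (the takeover only happens after the wait loop, which is
never left), while toy_cpu_down_completes(true) corresponds to taking over
broadcast duty before cpu_die().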