On Mon, Nov 06, 2017 at 09:15:14PM +0000, James Hogan wrote:
> From: Matt Redfearn <matt.redfearn@xxxxxxxxxx>
>
> commit 9e8c399a88f0b87e41a894911475ed2a8f8dff9e upstream.
>
> Commit 6f542ebeaee0 ("MIPS: Fix race on setting and getting
> cpu_online_mask") effectively reverted commit 8f46cca1e6c06 ("MIPS: SMP:
> Fix possibility of deadlock when bringing CPUs online") and thus has
> reinstated the possibility of deadlock.
>
> The commit was based on testing of kernel v4.4, where the CPU hotplug
> core code issued a BUG() if the starting CPU is not marked online when
> the boot CPU returns from __cpu_up. The commit fixes this race (in
> v4.4), but re-introduces the deadlock situation.
>
> As noted in the commit message, upstream differs in this area. Commit
> 8df3e07e7f21f ("cpu/hotplug: Let upcoming cpu bring itself fully up")
> adds a completion event in the CPU hotplug core code, making this race
> impossible. However, people were unhappy with relying on the core code
> to do the right thing.
>
> To address the issues both commits were trying to fix, add a second
> completion event in the MIPS smp hotplug path. It removes the
> possibility of a race, since the MIPS smp hotplug code now synchronises
> both the boot and secondary CPUs before they return to the hotplug core
> code. It also addresses the deadlock by ensuring that the secondary CPU
> is not marked online before its counters are synchronised.
>
> This fix should also be backported to fix the race condition introduced
> by the backport of commit 8f46cca1e6c06 ("MIPS: SMP: Fix possibility of
> deadlock when bringing CPUs online"), though really that race only
> existed before commit 8df3e07e7f21f ("cpu/hotplug: Let upcoming cpu
> bring itself fully up").
>
> Signed-off-by: Matt Redfearn <matt.redfearn@xxxxxxxxxx>
> Fixes: 6f542ebeaee0 ("MIPS: Fix race on setting and getting cpu_online_mask")
> CC: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # v4.1+: 8f46cca1e6c0: "MIPS: SMP: Fix possibility of deadlock when bringing CPUs online"
> Cc: <stable@xxxxxxxxxxxxxxx> # v4.1+: a00eeede507c: "MIPS: SMP: Use a completion event to signal CPU up"
> Cc: <stable@xxxxxxxxxxxxxxx> # v4.1+: 6f542ebeaee0: "MIPS: Fix race on setting and getting cpu_online_mask"

These did not apply to 3.18, so this patch overall did not apply there
either. I don't know if you care about 3.18, but if so, can you provide
backports of these for that tree, and then resend this patch so I can
queue it up?

thanks,

greg k-h
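
For anyone reading the thread without the diff at hand, the ordering the
quoted commit message describes can be sketched in userspace C with
pthreads. This is not the kernel patch. The struct completion below is a
simplified stand-in for the kernel primitive, and the names cpu_starting,
cpu_running, counters_synced, cpu_online and bring_up_secondary() are
assumptions made for illustration. The third completion only models the
counter synchronisation rendezvous between the two CPUs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified userspace stand-in for a kernel completion (assumption). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical names; the thread plays the role of the secondary CPU. */
static struct completion cpu_starting, cpu_running, counters_synced;
static bool cpu_online;

static void *secondary_cpu(void *unused)
{
	(void)unused;
	complete(&cpu_starting);               /* ready for counter sync */
	wait_for_completion(&counters_synced); /* models the sync step */
	cpu_online = true;                     /* online only after sync */
	complete(&cpu_running);                /* second completion event */
	return NULL;
}

/* Plays the role of the boot CPU bringing the secondary up. */
static int bring_up_secondary(void)
{
	pthread_t thread;

	init_completion(&cpu_starting);
	init_completion(&cpu_running);
	init_completion(&counters_synced);
	cpu_online = false;

	if (pthread_create(&thread, NULL, secondary_cpu, NULL) != 0)
		return -1;

	wait_for_completion(&cpu_starting); /* secondary is not online yet */
	complete(&counters_synced);         /* boot CPU's half of the sync */
	wait_for_completion(&cpu_running);  /* do not return before online */

	/* The boot CPU never returns with the secondary still offline. */
	assert(cpu_online);

	pthread_join(thread, NULL);
	return 0;
}
```

Calling bring_up_secondary() from a small main() and building with
-pthread is enough to try it. The point of the sketch is the ordering:
the secondary is marked online only after the sync step, and the boot
side does not return until it has seen the second completion.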