On Tue, 2015-05-12 at 13:23 +1000, Daniel Axtens wrote:
> Before 69111bac42f5 ("powerpc: Replace __get_cpu_var uses"), in
> save_mce_event, index got the value of mce_nest_count, and
> mce_nest_count was incremented *after* index was set.
>
> However, that patch changed the behaviour so that mce_nest_count was
> incremented *before* setting index.
>
> This causes an off-by-one error, as get_mce_event sets index as
> mce_nest_count - 1 before reading mce_event. Thus get_mce_event reads
> bogus data, causing warnings like
> "Machine Check Exception, Unknown event version 0 !"
> and breaking MCE handling.
>
> Restore the old behaviour and unbreak MCE handling by subtracting one
> from the newly incremented value.
>
> The same broken change occurred in machine_check_queue_event (which set
> a queue read by machine_check_process_queued_event). Fix that too,
> unbreaking printing of MCE information.
>
> Fixes: 69111bac42f5 ("powerpc: Replace __get_cpu_var uses")
> CC: stable@xxxxxxxxxxxxxxx
> CC: Mahesh Salgaonkar <mahesh@xxxxxxxxxxxxxxxxxx>
> CC: Christoph Lameter <cl@xxxxxxxxx>
> Signed-off-by: Daniel Axtens <dja@xxxxxxxxxx>
>
> ---
>
> The code is still super racy, but this at least unbreaks the common,
> non-reentrant case for now until we figure out how to fix it properly.
> The proper fix will likely be quite invasive so it might be worth
> picking this up in stable rather than waiting for that?
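
For anyone following the index arithmetic, here is a minimal userspace
sketch of the three variants the changelog describes. It is not the
kernel code: struct mce_sim, save_old(), save_broken(), save_fixed() and
get_event() are hypothetical names, the per-cpu variables are replaced by
a plain struct, an "event" is reduced to its version number, and
MAX_MC_EVT = 100 is an assumption based on the cmpwi against 99 in the
quoted disassembly.

```c
#define MAX_MC_EVT 100  /* assumed; inferred from "cmpwi cr7,r10,99" */

/* Hypothetical stand-in for the per-cpu mce_nest_count / mce_event pair. */
struct mce_sim {
    int nest_count;
    int events[MAX_MC_EVT];  /* one "version" field per slot, 0 = never written */
};

/* Old behaviour: take the index first, increment afterwards. */
int save_old(struct mce_sim *s, int version)
{
    int index = s->nest_count++;

    if (index >= MAX_MC_EVT)
        return -1;
    s->events[index] = version;
    return index;
}

/* Broken behaviour: increment first, then use the new value as the index. */
int save_broken(struct mce_sim *s, int version)
{
    int index = ++s->nest_count;

    if (index >= MAX_MC_EVT)
        return -1;
    s->events[index] = version;
    return index;
}

/* Fixed behaviour: increment first, then subtract one to get the index. */
int save_fixed(struct mce_sim *s, int version)
{
    int index = ++s->nest_count - 1;

    if (index >= MAX_MC_EVT)
        return -1;
    s->events[index] = version;
    return index;
}

/* Reader: always looks at slot nest_count - 1, as get_mce_event does. */
int get_event(const struct mce_sim *s)
{
    int index = s->nest_count - 1;

    if (index < 0 || index >= MAX_MC_EVT)
        return -1;
    return s->events[index];
}
```

Starting from a zeroed struct, save_broken() writes slot 1 while
get_event() reads slot 0, which was never written. In this sketch that
shows up as version 0, the same symptom as the "Unknown event version 0"
warning. save_old() and save_fixed() both write slot 0, so the reader
sees what was stored.
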
>
> mpe: the generated asm is below
>
> 0000000000000070 <.save_mce_event>:
>   70:   e9 6d 00 30     ld      r11,48(r13)
>   74:   3d 22 00 00     addis   r9,r2,0
>   78:   39 29 00 00     addi    r9,r9,0
>   7c:   7d 2a 4b 78     mr      r10,r9
>   80:   39 29 00 08     addi    r9,r9,8
>   84:   7d 8a 58 2e     lwzx    r12,r10,r11
>   88:   39 8c 00 01     addi    r12,r12,1
>   8c:   7d 8a 59 2e     stwx    r12,r10,r11
>   90:   e9 0d 00 30     ld      r8,48(r13)
>   94:   7d 4a 40 2e     lwzx    r10,r10,r8
>   98:   39 4a ff ff     addi    r10,r10,-1
>   9c:   2f 8a 00 63     cmpwi   cr7,r10,99
>
> AIUI, we get the per-cpu area in 70, the addr of mce_nest_count itself
> in 80, then load, incr, stor in 84-8c, then we get the address and
> load again in 90-94, then subtract 1 to make the count sensible again,
> then 9c is the conditional `if (index >= MAX_MC_EVT)'
>
> I think that was what you expected?

Sort of. I wasn't expecting it to reload it after the increment. But I
guess that's an artifact of the macros.

Anyway it's much better than the current code which is just broken
always.

cheers