On 02/26/2016 10:05 AM, Linus Torvalds wrote:
> On Fri, Feb 26, 2016 at 9:52 AM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> So more analysis would seem to confirm that RSP has been bumped +8
>> while in ttwu_stat() so when the epilog executed, register restore
>> was off by 1 qword. However, there's nothing in ttwu_stat() that
>> results in stack pointer offset by +1 qword from prolog.
>
> I agree.
>
> That's why I'm actually starting to suspect that it's an AMD microcode
> bug that we know very little about. There's apparently register
> corruption (the guess being from NMI handling, but virtualization was
> also involved) under some circumstances.

Yep, that could explain it.

> Of course, if Jiri isn't actually running this on an AMD CPU, that
> theory flies right out the window.

I'll wait for Jiri to confirm before sinking more time here.

> But we do have a reported oops on
> the security list that looks totally different in the big picture, but
> shares the exact same "corrupted stack pointer register state
> resulting in crazy instruction pointer, resulting in NX fault"
> behavior in the end.
>
> In the other case, microcode patchlevel 0x0600081c was fine, and
> 0x06000832 is the one exhibiting the corruption problem.
>
> I've contacted Robert Święcki (who found the microcode problem) in
> case he wants to weigh in in this thread. He was talking to some AMD
> people, but I don't know exactly who.

Ok, thanks for the info.