On 02/25/2016 12:51 PM, Linus Torvalds wrote:
> On Thu, Feb 25, 2016 at 12:32 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>>> But yes, the call trace looks accurate and makes sense, we have:
>>>
>>> tty_flip_buffer_push ->
>>> (queue_work is inline) ->
>>> queue_work_on ->
>>> __queue_work ->
>>> insert_work ->
>>> (wake_up_worker is inlined)
>>> wake_up_process ->
>>
>> try_to_wake_up ->
>>
>>> *insane non-code address*
>
> The thing is, we don't actually have that try_to_wake_up() on the
> stack in the oops report.

I know, but the last execution before things went sideways was
definitely in try_to_wake_up().

> There are other things on the stack, but the
> first stack entry that is dumped that is a text address is that
> "ffffffff810a5585" which is wake_up_process.
>
> That's why I said it might be stack corruption: we might be returning
> from try_to_wake_up(), but with a corrupt stack entry, and returning
> to garbage.
>
> If it was one of the calls _in_ try_to_wake_up() that called to insane
> code, I would have expected to see try_to_wake_up on the stack.

Agreed; how execution got from try_to_wake_up() to a mysterious percpu
address without a call is the question.

> That's particularly true on modern machines, where things like the
> percpu area is hopefully marked NX, so that we shouldn't be executing
> random instructions. Which is the fault that actually triggers
> ("kernel tried to execute NX-protected page"), so the "we corrupted
> the stack by running random code at the original target of the jump"
> scenario sounds much less likely.
>
> So the whole oops looks odd. If it really was one of the calls from
> try_to_wake_up(), why isn't that return address on the stack?

I don't think it's anything from code flow.

> Since this is under qemu, I'm wondering if this is a qemu bug, where
> the NX fault processing of a call instruction happens before the stack
> is pushed, but when the instruction pointer already points to the new
> address.

Or any fault processing really; an iret to the bogus address would then
trigger an NX fault without leaving a trace of the broken exception
handling.

> Another alternative *might* be that gcc has turned an indirect
> tail-call call into a "jmp *", but I certainly don't see that when I
> compile the file myself. I've seen it in the past in some (very
> unusual) cases, so it's possible - gcc definitely knows about
> tail-call jmp conversion (even if it makes debugging sometimes a
> pain).
>
> Jiri, can you check your try_to_wake_up() disassembly for some
> indirect "jmp" instructions?
>
> Linus
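
For anyone following along, here is a minimal sketch of the kind of
code that can end up as an indirect "jmp *". It is not taken from
try_to_wake_up() or any kernel source; handler_fn, double_it and
dispatch are hypothetical names used only for illustration. The point
is that when an indirect call is the last thing a function does, gcc
may (with optimization enabled) replace "call *reg; ret" with
"jmp *reg", so the calling function never pushes a return address of
its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type, for illustration only. */
typedef int (*handler_fn)(int);

/* A trivial target for the indirect call. */
static int double_it(int x)
{
	return 2 * x;
}

/*
 * noinline (a GCC/Clang attribute) keeps this as a real function so
 * its disassembly can be inspected.
 */
__attribute__((noinline))
static int dispatch(handler_fn fn, int arg)
{
	assert(fn != NULL);

	/*
	 * The indirect call is in tail position: nothing happens after
	 * it returns. An optimizing compiler may emit "jmp *reg" here
	 * instead of "call *reg" followed by "ret". If fn pointed at
	 * garbage, dispatch() itself would then not appear as a return
	 * address on the stack.
	 */
	return fn(arg);
}
```

Whether the jmp is actually emitted depends on compiler version and
flags, so the only reliable check is the disassembly (for example
"gcc -O2 -S" on a file containing the above, then looking at
dispatch).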