Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> · Thu, 25 Feb 2016 13:43:10 -0800

On 02/25/2016 12:51 PM, Linus Torvalds wrote:
> On Thu, Feb 25, 2016 at 12:32 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>>> But yes, the call trace looks accurate and makes sense, we have
>>>
>>>   tty_flip_buffer_push ->
>>>     (queue_work is inline) ->
>>>     queue_work_on ->
>>>       __queue_work ->
>>>         insert_work ->
>>>           (wake_up_worker is inlined)
>>>           wake_up_process ->
>>
>>               try_to_wake_up ->
>>
>>>             *insane non-code address*
> 
> The thing is, we don't actually have that try_to_wake_up() on the
> stack in the oops report.

I know, but last execution prior to things going sideways
was definitely in try_to_wake_up().

> There are other things on the stack, but the
> first stack entry that is dumped that is a text address is that
> "ffffffff810a5585" which is wake_up_process.
> 
> That's why I said it might be stack corruption: we might be returning
> from try_to_wake_up(), but with a corrupt stack entry, and returning
> to garbage.
> 
> If it was one of the calls _in_ try_to_wake_up() that called to insane
> code, I would have expected to see try_to_wake_up on the stack.

Agreed, how execution got from try_to_wake_up() to a mysterious
percpu address without a call is the question.
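
To make the "first stack entry that is a text address" point concrete,
here is a rough sketch of scanning raw stack words for the first one
that lands inside kernel text. This is an illustration only, not the
kernel's actual stack dumper: the function name and the text bounds
passed in are hypothetical, and only the address ffffffff810a5585 comes
from the oops itself.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first stack word that falls inside
 * [stext, etext), or n if none does. The bounds are supplied by the
 * caller and are assumptions for this sketch.
 */
size_t first_text_entry(const uint64_t *stack, size_t n,
                        uint64_t stext, uint64_t etext)
{
    for (size_t i = 0; i < n; i++) {
        if (stack[i] >= stext && stack[i] < etext)
            return i;   /* looks like a return address into text */
    }
    return n;           /* nothing on the stack points into text */
}
```

A scan like this only sees return addresses that were actually pushed,
so a function that was left via a jump (or whose frame was never
written) does not show up.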

> That's particularly true on modern machines, where things like the
> percpu area are hopefully marked NX, so that we shouldn't be executing
> random instructions. Which is the fault that actually triggers
> ("kernel tried to execute NX-protected page"), so the "we corrupted
> the stack by running random code at the original target of the jump"
> scenario sounds much less likely.
> 
> So the whole oops looks odd. If it really was one of the calls from
> try_to_wake_up(), why isn't that return address on the stack?

I don't think it's anything from code flow.

> Since this is under qemu, I'm wondering if this is a qemu bug, where
> the NX fault processing of a call instruction happens before the stack
> is pushed, but when the instruction pointer already points to the new
> address.

Or any fault processing really; an iret to the bogus address
would then trigger an NX fault without leaving a trace of the broken
exception handling.

> Another alternative *might* be that gcc has turned an indirect
> tail-call call into a "jmp *", but I certainly don't see that when I
> compile the file myself. I've seen it in the past in some (very
> unusual) cases, so it's possible - gcc definitely knows about
> tail-call jmp conversion (even if it makes debugging sometimes a
> pain).
> 
> Jiri, can you check your try_to_wake_up() disassembly for some
> indirect "jmp" instructions?
> 
>                         Linus
> 
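
For reference, the shape of code that could be turned into an indirect
"jmp *" looks roughly like the sketch below. This is not the scheduler's
code; the struct, field, and function names are made up for
illustration. The point is only that the indirect call is the last
thing the function does, so an optimizing gcc may replace
"call *; ret" with a plain "jmp *", which pushes no return address.

```c
/* Hypothetical stand-ins for illustration, not kernel definitions. */
struct sched_class_sketch {
    int (*select)(int cpu);     /* indirect call target */
};

static int pick_same_cpu(int cpu)
{
    return cpu;
}

/*
 * The indirect call is in tail position. If the compiler emits it as
 * "jmp *...", a bad function pointer here would fault with the
 * caller's return address on top of the stack, and this function
 * would not appear in the trace at all.
 */
int wake_sketch(const struct sched_class_sketch *cls, int cpu)
{
    return cls->select(cpu);
}
```

Whether the jump is actually emitted depends on compiler version and
options, so the disassembly (e.g. from objdump -d) is the only way to
tell for a given build.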
