Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

Jiri Slaby <jslaby@xxxxxxx> · Fri, 26 Feb 2016 09:56:17 +0100

On 02/26/2016, 01:38 AM, Linus Torvalds wrote:
> On Thu, Feb 25, 2016 at 1:32 PM, Jiri Slaby <jslaby@xxxxxxx> wrote:
>>
>> Interestingly, RBP contains an address inside try_to_wake_up,
>> ffffffff810a535a (dunno why), which is:
>> ffffffff810a5355:       e8 66 a0 ff ff          callq  ffffffff8109f3c0 <ttwu_stat>
>> ffffffff810a535a:       e9 9d fe ff ff          jmpq   ffffffff810a51fc <try_to_wake_up+0x3c>
>>
>> At the beginning, ttwu_stat does:
>> mov    $0x16e80,%r14
>>
>> which is what we still have in r14 when it crashes. So the first "if"
>> in ttwu_stat must have taken the true branch (otherwise r14 would
>> have been overwritten).
> 
> Hmm. That does sound very much like it might be ttwu_stat() that has
> gotten the stack frame wrong, and when it exits, it does
> 
>         popq    %rbp
>         ret
> 
> but in fact it popped the return address, and then returned to a crazy address.
> 
> Which sounds like a corrupted stack pointer (not a corrupted stack).
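
To make that hypothesis concrete, here is a toy model of it (an illustrative sketch only, not kernel code; struct toy_cpu and the toy_* helpers are hypothetical). If %rsp ends up one 8-byte slot too high before the epilogue, `popq %rbp` loads the return address pushed by the call (which would match RBP holding ffffffff810a535a), and `ret` then jumps to whatever the caller had stored above it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of a downward-growing x86-64 stack (NOT kernel code).
 * mem[] holds 8-byte slots and rsp is an index into it.
 */
#define TOY_SLOTS 16

struct toy_cpu {
	uint64_t mem[TOY_SLOTS];
	size_t rsp;		/* index of the current top-of-stack slot */
	uint64_t rbp;
	uint64_t rip;
};

static void toy_push(struct toy_cpu *c, uint64_t v)
{
	assert(c->rsp > 0);		/* toy stack overflow */
	c->mem[--c->rsp] = v;
}

static uint64_t toy_pop(struct toy_cpu *c)
{
	assert(c->rsp < TOY_SLOTS);	/* toy stack underflow */
	return c->mem[c->rsp++];
}

/*
 * The caller has 'above_ret' on its stack and executes "call f",
 * pushing 'ret_addr'. f's prologue pushes %rbp. Then %rsp is
 * (hypothetically) moved up by 'skew_slots' slots (0 or 1 in this
 * toy) before f's epilogue "popq %rbp; ret". Returns the %rip that
 * ret jumps to and stores the final %rbp in *rbp_out.
 */
static uint64_t toy_call_and_return(uint64_t caller_rbp, uint64_t ret_addr,
				    uint64_t above_ret, size_t skew_slots,
				    uint64_t *rbp_out)
{
	struct toy_cpu c = { .rsp = TOY_SLOTS, .rbp = caller_rbp };

	toy_push(&c, above_ret);	/* caller's stack contents */
	toy_push(&c, ret_addr);		/* call: push return address */
	toy_push(&c, c.rbp);		/* prologue: pushq %rbp */

	c.rsp += skew_slots;		/* the suspected %rsp corruption */

	c.rbp = toy_pop(&c);		/* epilogue: popq %rbp */
	c.rip = toy_pop(&c);		/* ret */
	*rbp_out = c.rbp;
	return c.rip;
}
```

With skew_slots = 0 the model returns ret_addr and restores the caller's %rbp; with skew_slots = 1, %rbp ends up holding ret_addr and ret jumps to above_ret, which is the "popped the return address, then returned to a crazy address" pattern.
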
> 
> Can you make just the "vmlinux" file available somewhere?

Sure, both vmlinux and its separated .debuginfo sections (vmlinux.debug)
are at:
http://labs.suse.cz/jslaby/bug-968218/

There is also core.s, which was produced by:
objdump -d vmlinux-4.4.2-3-default | grep -A 10000 '<update_rq_clock>:' >core.s
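
For anyone reading core.s by hand: objdump derives call/jmp targets from a signed little-endian rel32 that is relative to the next instruction, and that next-instruction address is exactly what callq pushes as the return address. So the RBP value ffffffff810a535a is the return address of the callq to ttwu_stat at ffffffff810a5355. A small decoder sketch (rel32_target is a made-up helper, not part of any tool) that reproduces the targets quoted above:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Target of a 5-byte x86-64 near CALL (e8 rel32) or JMP (e9 rel32),
 * as objdump prints it. 'addr' is the address of the opcode byte.
 * The signed little-endian rel32 is relative to the NEXT instruction
 * (addr + 5), which is also the return address a CALL pushes.
 */
static uint64_t rel32_target(uint64_t addr, const uint8_t insn[5])
{
	uint32_t raw;
	int32_t rel;

	assert(insn[0] == 0xe8 || insn[0] == 0xe9);	/* call or jmp only */

	raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
	      (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
	memcpy(&rel, &raw, sizeof(rel));	/* reinterpret as signed */

	/* unsigned wrap-around yields the correct 64-bit target */
	return addr + 5 + (uint64_t)(int64_t)rel;
}
```

The same helper gives ffffffff816abec0 for the `e8 fb ca 60 00` call to __fentry__ at the start of ttwu_stat listed below.
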

> In my own private configuration, ttwu_stat() doesn't actually touch
> the stack at all - no stack pointer action anywhere except for the
> 
> ttwu_stat:
> 1:      call    __fentry__
>         pushq   %rbp
>    ..
>         movq    %rsp, %rbp      #,
> 
>  .....
> 
>         popq    %rbp
>         ret
> 
> but yeah, as Peter says, maybe an exception screwed up %rsp somehow..

Lucky you. My ttwu_stat does a bit more saving and restoring on the stack,
but all the pushes and pops seem to be paired:

ffffffff8109f3c0 <ttwu_stat>:
ffffffff8109f3c0:       e8 fb ca 60 00          callq  ffffffff816abec0 <__fentry__>
ffffffff8109f3c5:       55                      push   %rbp
ffffffff8109f3c6:       48 89 e5                mov    %rsp,%rbp
ffffffff8109f3c9:       41 57                   push   %r15
ffffffff8109f3cb:       41 56                   push   %r14
ffffffff8109f3cd:       41 55                   push   %r13
ffffffff8109f3cf:       41 54                   push   %r12
ffffffff8109f3d1:       49 c7 c6 80 6e 01 00    mov    $0x16e80,%r14
ffffffff8109f3d8:       53                      push   %rbx
...
ffffffff8109f48c:       5b                      pop    %rbx
ffffffff8109f48d:       41 5c                   pop    %r12
ffffffff8109f48f:       41 5d                   pop    %r13
ffffffff8109f491:       41 5e                   pop    %r14
ffffffff8109f493:       41 5f                   pop    %r15
ffffffff8109f495:       5d                      pop    %rbp
ffffffff8109f496:       c3                      retq
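
The pairing can be checked mechanically by replaying the prologue pushes and epilogue pops on a LIFO. This is a sketch with a hypothetical helper (pushes_pops_paired), and it only covers the prologue/epilogue shown; the elided body ("...") could still move %rsp on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SAVED 16

/*
 * Replay prologue pushes and epilogue pops (register names as printed
 * by objdump) on a LIFO and report whether every pop restores the
 * register pushed most recently and nothing is left over.
 */
static bool pushes_pops_paired(const char *const pushes[], size_t npush,
			       const char *const pops[], size_t npop)
{
	const char *saved[MAX_SAVED];
	size_t depth = 0;

	for (size_t i = 0; i < npush; i++) {
		assert(depth < MAX_SAVED);	/* toy capacity limit */
		saved[depth++] = pushes[i];
	}
	for (size_t i = 0; i < npop; i++) {
		if (depth == 0 || strcmp(saved[--depth], pops[i]) != 0)
			return false;
	}
	return depth == 0;
}
```

Fed the six pushes (%rbp, %r15, %r14, %r13, %r12, %rbx) and six pops from the listing above, it reports them as paired.
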

> I really don't see how it would happen here - that code doesn't look
> particularly odd.
> 
> And the fentry code used by the function tracer can certainly screw
> things up, but even that would be hard-pressed to screw up %rbp, since
> the saving of rbp comes *after* fentry. Old pre-__fentry__ gcc
> versions had a much higher likelihood (the whole mcount thing is a
> disaster, but I'm assuming you have a compiler that does __fentry__
> and have CC_USING_FENTRY set?)

Yep, -mfentry is in use, as is obvious from the dump above; the kernel is
compiled by gcc 5.3.1 rev231346.

thanks,
-- 
js
suse labs