Re: [PATCH] arm64: fix backtraces of KASAN kernel dumpfile truncated

On 2022/12/08 11:52, dinghui wrote:
> Hi Kazu,
> 
> On 2022/12/5 9:05, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> On 2022/12/02 17:31, dinghui wrote:
>>> On 2022/12/2 15:44, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>>> On 2022/12/01 16:01, Ding Hui wrote:
>>>>> On a KASAN kernel vmcore, the "bt" command displays truncated
>>>>> backtraces like this:
>>>>>
>>>>> crash> bt
>>>>> PID: 4131   TASK: ffff8001521df000  CPU: 3   COMMAND: "bash"
>>>>>     #0 [ffff2000224b0cb0] machine_kexec_prepare at ffff2000200bff4c
>>>>>
>>>>> After digging into the root cause, it turns out that
>>>>> arm64_in_kdump_text() found the wrong bt->bptr in the "machine_kexec"
>>>>> branch.
>>>>>
>>>>> Disassemble machine_kexec() of KASAN vmlinux (gcc 7.3.0):
>>>>>
>>>>> crash> dis -x machine_kexec
>>>>> 0xffff2000200bff50 <machine_kexec>:     stp     x29, x30, [sp,#-208]!
>>>>> 0xffff2000200bff54 <machine_kexec+0x4>: mov     x29, sp
>>>>> 0xffff2000200bff58 <machine_kexec+0x8>: stp     x19, x20, [sp,#16]
>>>>> 0xffff2000200bff5c <machine_kexec+0xc>: str     x24, [sp,#56]
>>>>> 0xffff2000200bff60 <machine_kexec+0x10>:        str     x26, [sp,#72]
>>>>> 0xffff2000200bff64 <machine_kexec+0x14>:        mov     x2, #0x8ab3
>>>>> 0xffff2000200bff68 <machine_kexec+0x18>:        add     x1, x29, #0x70
>>>>> 0xffff2000200bff6c <machine_kexec+0x1c>:        lsr     x1, x1, #3
>>>>> 0xffff2000200bff70 <machine_kexec+0x20>:        movk    x2, #0x41b5, lsl #16
>>>>> 0xffff2000200bff74 <machine_kexec+0x24>:        mov     x19, #0x200000000000
>>>>> 0xffff2000200bff78 <machine_kexec+0x28>:        adrp    x3, 0xffff2000224b0000
>>>>> 0xffff2000200bff7c <machine_kexec+0x2c>:        movk    x19, #0xdfff, lsl #48
>>>>> 0xffff2000200bff80 <machine_kexec+0x30>:        add     x3, x3, #0xcb0
>>>>> 0xffff2000200bff84 <machine_kexec+0x34>:        add     x4, x1, x19
>>>>> 0xffff2000200bff88 <machine_kexec+0x38>:        stp     x2, x3, [x29,#112]
>>>>> 0xffff2000200bff8c <machine_kexec+0x3c>:        adrp    x2, 0xffff2000200bf000 <swsusp_arch_resume+0x1e8>
>>>>> 0xffff2000200bff90 <machine_kexec+0x40>:        add     x2, x2, #0xf50
>>>>> 0xffff2000200bff94 <machine_kexec+0x44>:        str     x2, [x29,#128]
>>>>> 0xffff2000200bff98 <machine_kexec+0x48>:        mov     w2, #0xf1f1f1f1
>>>>> 0xffff2000200bff9c <machine_kexec+0x4c>:        str     w2, [x1,x19]
>>>>> 0xffff2000200bffa0 <machine_kexec+0x50>:        mov     w2, #0xf200
>>>>> 0xffff2000200bffa4 <machine_kexec+0x54>:        mov     w1, #0xf3f3f3f3
>>>>> 0xffff2000200bffa8 <machine_kexec+0x58>:        movk    w2, #0xf2f2, lsl #16
>>>>> 0xffff2000200bffac <machine_kexec+0x5c>:        stp     w2, w1, [x4,#4]
>>>>>
>>>>> We notice that:
>>>>> 1. machine_kexec() start address is 0xffff2000200bff50
>>>>> 2. the instruction at machine_kexec+0x44 stores the same value
>>>>>       0xffff2000200bff50 (which comes from 0xffff2000200bf000 + 0xf50)
>>>>>       into stack position [x29,#128].
>>>>>
>>>>> When arm64_in_kdump_text() searches the stack for LR, it meets
>>>>> 0xffff2000200bff50 first, so it gets the wrong bt->bptr.
>>>>>
>>>>> We know that the real LR is always greater than the start address
>>>>
>>>> Seems true.
>>>>
>>>> One question, do you see which kernel code stores that value?
>>>>
>>>
>>> Actually, no C code stores that value. The source code looks like this:
>>>
>>> void machine_kexec(struct kimage *kimage)
>>> {
>>>       phys_addr_t reboot_code_buffer_phys;
>>>       void *reboot_code_buffer;
>>>       bool in_kexec_crash = (kimage == kexec_crash_image);
>>>       bool stuck_cpus = cpus_are_stuck_in_kernel();
>>>
>>>       BUG_ON(!in_kexec_crash && (stuck_cpus || (num_online_cpus() > 1)));
>>>       WARN(in_kexec_crash && (stuck_cpus || smp_crash_stop_failed()),
>>>           "Some CPUs may be stale, kdump will be unreliable.\n");
>>> ...
>>>
>>> The point is CONFIG_KASAN=y
>>>
>>> I compared the gcc args used to compile machine_kexec.o with KASAN enabled [1] and with KASAN enabled but KASAN_SANITIZE_machine_kexec.o := n set [2]. The difference is:
>>>
>>> [1]: -fsanitize=kernel-address -fasan-shadow-offset=0xdfff200000000000 --param asan-globals=1   --param asan-instrumentation-with-call-threshold=10000   --param asan-stack=1
>>>
>>> [2]: -fno-builtin
>>>
>>> If I remove `--param asan-stack=1` but keep the other asan args when compiling machine_kexec.o, those assembly instructions disappear.
>>>
>>
>> I see, thanks.
>>
>> I can see a similar pattern with CONFIG_KASAN=y on x86_64 as well, which
>> stores the function start address and uses 0xf1f1f1f1 (ASAN_STACK_MAGIC_LEFT
>> in gcc), etc.
>>
>> (gdb) disas machine_kexec
>> Dump of assembler code for function machine_kexec:
>>      0xffffffff8109b1c0 <+0>:     callq  0xffffffff81099e60 <__fentry__>
>> ...
>>      0xffffffff8109b208 <+72>:    movq   $0xffffffff8109b1c0,0x20(%rsp)
>>      0xffffffff8109b211 <+81>:    add    %r12,%rax
>>      0xffffffff8109b214 <+84>:    movl   $0xf1f1f1f1,(%rax)
>>
>> (gdb) disas crash_save_cpu
>> Dump of assembler code for function crash_save_cpu:
>>      0xffffffff8126e7e0 <+0>:     callq  0xffffffff81099e60 <__fentry__>
>> ...
>>      0xffffffff8126e817 <+55>:    movq   $0xffffffff8126e7e0,0x10(%rsp)
>>      0xffffffff8126e820 <+64>:    add    %rbp,%rax
>>      0xffffffff8126e823 <+67>:    movl   $0xf1f1f1f1,(%rax)
>>
>> I wondered whether excluding only their start address was enough to fix
>> the issue, but now it seems ok to me.
>>
> 
> I found a description of asan-stack here:
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/asan.cc;h=dc7b7f4bcf1803dd2ffbbaad782cf1b515d61ed8;hb=HEAD#l156
> 
>   139    The 32 bytes of LEFT red zone at the bottom of the stack can be
>   140    decomposed as such:
> ...
>   156      3/ The following 8 bytes contain the PC of the current function which
>   157      will be used by the run-time library to print an error message.

Thanks for the info. It looks like 24 of those 32 bytes are set here, and
"the PC" means the function's start address, at least on arm64 and x86_64.

    0xffffffff81095d4d <+45>:    movq   $0x41b58ab3,0x10(%rsp)         1/
    0xffffffff81095d56 <+54>:    lea    0x10(%rsp),%r12
    0xffffffff81095d5b <+59>:    movq   $0xffffffff82c72298,0x18(%rsp) 2/
    0xffffffff81095d64 <+68>:    shr    $0x3,%r12
    0xffffffff81095d68 <+72>:    movq   $0xffffffff81095d20,0x20(%rsp) 3/

(gdb) p machine_kexec
$6 = {void (struct kimage *)} 0xffffffff81095d20 <machine_kexec>
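
As a rough sketch of what those three stores lay down on the stack (this is
only an illustration, not code from gcc or crash; the struct and function
names are hypothetical), the first three 8-byte words could be modelled as
below. The magic value is the one stored at 1/ in the dump above.

```c
#include <stdint.h>

/* Value stored at 1/ in the x86_64 dump above. */
#define ASAN_FRAME_MAGIC 0x41b58ab3ULL

/* Hypothetical model of the first 24 bytes written by the prologue. */
struct asan_frame_head {
    uint64_t magic;    /* 1/ magic number */
    uint64_t word2;    /* 2/ second word, an address-like value in the dump */
    uint64_t func_pc;  /* 3/ "the PC", here the function's start address */
};

/*
 * Return 1 if the three words look like the record above for a function
 * starting at func_start, 0 otherwise.
 */
int looks_like_asan_frame_head(const struct asan_frame_head *h,
                               uint64_t func_start)
{
    return h->magic == ASAN_FRAME_MAGIC && h->func_pc == func_start;
}
```

With the values from the dump (0x41b58ab3, 0xffffffff82c72298,
0xffffffff81095d20) and machine_kexec at 0xffffffff81095d20, the check
matches, which is why a plain stack scan can run into the start address.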

Anyway, a return address should be larger than its function's start
address, so I think the patch has no ill effect on other cases, if any.
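
To illustrate the idea only (this is not the actual crash code or the patch
itself; find_lr_slot, func_end and the sample values are hypothetical), a
stack scan that skips the stored start address might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: scan saved stack words for the first value that
 * lies inside the function but is not its start address. Returns the
 * index of that word, or -1 if none is found.
 */
long find_lr_slot(const uint64_t *stack, size_t nwords,
                  uint64_t func_start, uint64_t func_end)
{
    for (size_t i = 0; i < nwords; i++) {
        /*
         * Use '>' rather than '>=': with asan-stack=1 the prologue stores
         * func_start itself on the stack, and a real return address is
         * always past the start of the function.
         */
        if (stack[i] > func_start && stack[i] < func_end)
            return (long)i;
    }
    return -1;
}
```

With '>=' the scan would stop at the word holding 0xffff2000200bff50 in the
arm64 case above; with '>' it moves on to the next candidate.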

Thanks,
Kazu
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility
Contribution Guidelines: https://github.com/crash-utility/crash/wiki



