[Crash-utility] [Question] There are differences when gdb7.6 and gdb10.2 parse the stack

Hello, crash maintainers.

I found a problem: with crash built on gdb 10.2, the backtrace of one of my vmcores contains frames that should not appear in the process stack.
I discussed this with liutgnu, then rebuilt crash from https://github.com/liutgnu/crash-preview and tried again.
Unfortunately, that gdb 13.2-based build still shows the problem.
Here is the output of my test:
-------------------
crash 8.0.4++
Copyright (C) 2002-2022  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
Copyright (C) 2015, 2021  VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

      KERNEL: /root/hungtask/vmlinux  [TAINTED] 
    DUMPFILE: /root/hungtask/2024_09_06_05_02_15.kernel_core  [PARTIAL DUMP]
        CPUS: 64
        DATE: Fri Sep  6 05:01:47 CST 2024
      UPTIME: 12:27:05
LOAD AVERAGE: 56.87, 25.40, 18.24
       TASKS: 4319
    NODENAME: host-047bcb37834d
     RELEASE: 4.19.90-89.11.v2401.osc.sfc.6.11.0.0070.ky10.x86_64+debug
     VERSION: #1 SMP Fri Aug 30 08:21:33 UTC 2024
     MACHINE: x86_64  (2499 Mhz)
      MEMORY: 255.9 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 112450
     COMMAND: "vtpstatd"
        TASK: ffff88816ae80000  [THREAD_INFO: ffff88816ae80000]
         CPU: 41
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 112450   TASK: ffff88816ae80000  CPU: 41   COMMAND: "vtpstatd"
 #0 [ffff889e3fa87af8] machine_kexec at ffffffff92d059ab
 #1 [ffff889e3fa87c18] __crash_kexec at ffffffff92fb9a99
 #2 [ffff889e3fa87d30] panic at ffffffff9483ed43
 #3 [ffff889e3fa87df8] watchdog_timer_fn at ffffffff93052cf6
 #4 [ffff889e3fa87e30] __hrtimer_run_queues at ffffffff92f5e96e
 #5 [ffff889e3fa87f28] hrtimer_interrupt at ffffffff92f5ffe7
 #6 [ffff889e3fa87fc8] smp_apic_timer_interrupt at ffffffff94a03176
 #7 [ffff889e3fa87ff0] apic_timer_interrupt at ffffffff94a0192f
--- <IRQ stack> ---
 #8 [ffff888263157938] apic_timer_interrupt at ffffffff94a0192f
    [exception RIP: copy_page_range+3681]
    RIP: ffffffff9331d461  RSP: ffff8882631579e8  RFLAGS: 00000246
    RAX: 1ffffd4018a95ad1  RBX: 8000003152b5a805  RCX: ffffea00c54ad688
    RDX: ffffea00c54aee88  RSI: 00007f80d117f000  RDI: ffffffff956468e0
    RBP: ffff8881c5bc2bf8   R8: fffff94018a2e22f   R9: fffff94018a2e22f
    R10: 0000000000000001  R11: fffff94018a2e22e  R12: 0000000000000018
    R13: dffffc0000000000  R14: ffffea00c54ad680  R15: 00007f80d117f000
    ORIG_RAX: ffffffffffffff13  CS: 0010  SS: 0018
 #9 [ffff888263157bc0] copy_process at ffffffff92dadcbd
#10 [ffff888263157d20] __mutex_init at ffffffff92ed8dd5
#11 [ffff888263157d38] __alloc_file at ffffffff93458397
#12 [ffff888263157d60] alloc_empty_file at ffffffff934585d2
#13 [ffff888263157da8] __alloc_fd at ffffffff934b5ead
#14 [ffff888263157e38] _do_fork at ffffffff92dae7a1
#15 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4
#16 [ffff888263157f50] entry_SYSCALL_64_after_hwframe at ffffffff94a000a4
    RIP: 00007f80ec93641a  RSP: 00007ffcb38bbd50  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007ffcb38bbd50  RCX: 00007f80ec93641a
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000001200011
    RBP: 00007ffcb38bbde0   R8: 000000000001b742   R9: 00007f80ee1a0f80
    R10: 00007f80ee1a1250  R11: 0000000000000246  R12: 000000000001b742
    R13: 00007ffcb38bbd70  R14: 0000000000000000  R15: 00007ffcb38bbf00
    ORIG_RAX: 0000000000000038  CS: 0033  SS: 002b
-------------------
Frames #10 through #13 seem redundant; they should not be part of this call chain.

For comparison, here is the output of the latest crash code built against gdb 7.6:
-------------------
crash_805_gdb76 8.0.5++
Copyright (C) 2002-2024  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2024  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [284MB]: patching 99408 gdb minimal_symbol values

crash_805_gdb76: gdb cannot find text block for address: dd_init_queue 
      KERNEL: vmlinux  [TAINTED]           
    DUMPFILE: 2024_09_06_05_02_15.kernel_core  [PARTIAL DUMP]
        CPUS: 64
        DATE: Fri Sep  6 05:01:47 CST 2024
      UPTIME: 12:27:05
LOAD AVERAGE: 56.87, 25.40, 18.24
       TASKS: 4319
    NODENAME: host-047bcb37834d
     RELEASE: 4.19.90-89.11.v2401.osc.sfc.6.11.0.0070.ky10.x86_64+debug
     VERSION: #1 SMP Fri Aug 30 08:21:33 UTC 2024
     MACHINE: x86_64  (2499 Mhz)
      MEMORY: 255.9 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 112450
     COMMAND: "vtpstatd"
        TASK: ffff88816ae80000  [THREAD_INFO: ffff88816ae80000]
         CPU: 41
       STATE: TASK_RUNNING (PANIC)

crash_805_gdb76> bt
PID: 112450   TASK: ffff88816ae80000  CPU: 41   COMMAND: "vtpstatd"
 #0 [ffff889e3fa87af8] machine_kexec at ffffffff92d059ab
 #1 [ffff889e3fa87c18] __crash_kexec at ffffffff92fb9a99
 #2 [ffff889e3fa87d30] panic at ffffffff9483ed43
 #3 [ffff889e3fa87df8] watchdog_timer_fn at ffffffff93052cf6
 #4 [ffff889e3fa87e30] __hrtimer_run_queues at ffffffff92f5e96e
 #5 [ffff889e3fa87f28] hrtimer_interrupt at ffffffff92f5ffe7
 #6 [ffff889e3fa87fc8] smp_apic_timer_interrupt at ffffffff94a03176
 #7 [ffff889e3fa87ff0] apic_timer_interrupt at ffffffff94a0192f
--- <IRQ stack> ---
 #8 [ffff888263157938] apic_timer_interrupt at ffffffff94a0192f
    [exception RIP: copy_page_range+3681]
    RIP: ffffffff9331d461  RSP: ffff8882631579e8  RFLAGS: 00000246
    RAX: 1ffffd4018a95ad1  RBX: 8000003152b5a805  RCX: ffffea00c54ad688
    RDX: ffffea00c54aee88  RSI: 00007f80d117f000  RDI: ffffffff956468e0
    RBP: ffff8881c5bc2bf8   R8: fffff94018a2e22f   R9: fffff94018a2e22f
    R10: 0000000000000001  R11: fffff94018a2e22e  R12: 0000000000000018
    R13: dffffc0000000000  R14: ffffea00c54ad680  R15: 00007f80d117f000
    ORIG_RAX: ffffffffffffff13  CS: 0010  SS: 0018
 #9 [ffff888263157bc0] copy_process at ffffffff92dadcbd
#10 [ffff888263157e38] _do_fork at ffffffff92dae7a1
#11 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4
#12 [ffff888263157f50] entry_SYSCALL_64_after_hwframe at ffffffff94a000a4
    RIP: 00007f80ec93641a  RSP: 00007ffcb38bbd50  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007ffcb38bbd50  RCX: 00007f80ec93641a
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000001200011
    RBP: 00007ffcb38bbde0   R8: 000000000001b742   R9: 00007f80ee1a0f80
    R10: 00007f80ee1a1250  R11: 0000000000000246  R12: 000000000001b742
    R13: 00007ffcb38bbd70  R14: 0000000000000000  R15: 00007ffcb38bbf00
    ORIG_RAX: 0000000000000038  CS: 0033  SS: 002b
-------------------
The gdb 7.6 result looks more convincing. That binary was built from the latest crash code with the gdb update commit reverted
(GitHub URL: https://github.com/crash-utility/crash/commit/9fab193).
I also tried the crash 7.3.2 and 8.0.1 releases (I could not get 8.0.0 to compile),
and the results are consistent with the above: 7.3.2 parses the stack correctly, while 8.0.1 shows the problem.


In crash_805_gdb76, x86_64_framesize_cache[3].framesize is 624:
(gdb) p x86_64_framesize_cache[0]
$136 = {textaddr = 18446744071880546969, framesize = 272, exception = 0}
(gdb) p x86_64_framesize_cache[1]
$137 = {textaddr = 18446744071906258243, framesize = 192, exception = 0}
(gdb) p x86_64_framesize_cache[2]
$138 = {textaddr = 18446744071908104495, framesize = 8, exception = 0}
(gdb) p x86_64_framesize_cache[3]
$139 = {textaddr = 18446744071878401213, framesize = 624, exception = 0}

But in crash_805_gdb102, x86_64_framesize_cache[3].framesize is 0:
(gdb) p x86_64_framesize_cache[0]
$86 = {textaddr = 18446744071880546969, framesize = 272, exception = 0}
(gdb) p x86_64_framesize_cache[1]
$87 = {textaddr = 18446744071906258243, framesize = 192, exception = 0}
(gdb) p x86_64_framesize_cache[2]
$88 = {textaddr = 18446744071908104495, framesize = 8, exception = 0}
(gdb) p x86_64_framesize_cache[3]
$89 = {textaddr = 18446744071878401213, framesize = 0, exception = 0}
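
gdb prints textaddr in decimal, which makes it hard to see which function each cache entry belongs to. The following minimal Python sketch converts the four values to hex and looks them up in a small table of return addresses copied from the bt output above. The names cache_textaddrs, bt_symbols and resolved are only illustrative and are not part of crash.

```python
# Decimal textaddr values copied from the x86_64_framesize_cache dumps above
# (they are identical in the gdb 7.6 and gdb 10.2 builds).
cache_textaddrs = [
    18446744071880546969,
    18446744071906258243,
    18446744071908104495,
    18446744071878401213,
]

# Return addresses and symbols copied from the bt output above.
bt_symbols = {
    0xffffffff92fb9a99: "__crash_kexec",
    0xffffffff9483ed43: "panic",
    0xffffffff94a0192f: "apic_timer_interrupt",
    0xffffffff92dadcbd: "copy_process",
}

# Pair each cache entry with its hex address and the matching bt symbol.
resolved = [(hex(a), bt_symbols.get(a, "?")) for a in cache_textaddrs]

for i, (addr, sym) in enumerate(resolved):
    print(f"x86_64_framesize_cache[{i}].textaddr = {addr}  ({sym})")
```

Entry [3], the one whose framesize differs between the two builds, is 0xffffffff92dadcbd, the copy_process address shown at frame #9.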

---------------------------------------------
After the "Walk the process stack." step of x86_64_low_budget_back_trace_cmd, the values of *up passed to x86_64_print_stack_entry are as follows:
x86_64.c:4059    switch (x86_64_print_stack_entry(bt, ofp, level, i,*up))

The addresses returned by crash_805_gdb76:
0xffffffff92d059ab
0xffffffff92fb9a99
0xffffffff9483ed43
0xffffffff93052cf6
0xffffffff92f5e96e
0xffffffff92f5ffe7
0xffffffff94a03176
0xffffffff94a0192f
0xffffffff92dadcbd <- copy_process (exception RIP in copy_page_range)
0xffffffff92dae7a1
0xffffffff92c085f4
0xffffffff94a000a4

The addresses returned by crash_805_gdb102:
0xffffffff92d059ab
0xffffffff92fb9a99
0xffffffff9483ed43
0xffffffff93052cf6
0xffffffff92f5e96e
0xffffffff92f5ffe7
0xffffffff94a03176
0xffffffff94a0192f
0xffffffff92dadcbd <- copy_process (exception RIP in copy_page_range)
0xffffffff92ed8dd5 <- unexpected entries start here
0xffffffff93458397
0xffffffff934585d2
0xffffffff934b5ead <- unexpected entries end here
0xffffffff92dae7a1
0xffffffff92c085f4
0xffffffff94a000a4
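
To make the difference between the two lists explicit, here is a small Python sketch that diffs them. The list contents are copied from above; the variable names (gdb76, gdb102, extra, remaining) are only illustrative.

```python
# Addresses passed to x86_64_print_stack_entry, copied from the lists above.
gdb76 = [
    0xffffffff92d059ab, 0xffffffff92fb9a99, 0xffffffff9483ed43,
    0xffffffff93052cf6, 0xffffffff92f5e96e, 0xffffffff92f5ffe7,
    0xffffffff94a03176, 0xffffffff94a0192f, 0xffffffff92dadcbd,
    0xffffffff92dae7a1, 0xffffffff92c085f4, 0xffffffff94a000a4,
]
gdb102 = [
    0xffffffff92d059ab, 0xffffffff92fb9a99, 0xffffffff9483ed43,
    0xffffffff93052cf6, 0xffffffff92f5e96e, 0xffffffff92f5ffe7,
    0xffffffff94a03176, 0xffffffff94a0192f, 0xffffffff92dadcbd,
    0xffffffff92ed8dd5, 0xffffffff93458397, 0xffffffff934585d2,
    0xffffffff934b5ead,
    0xffffffff92dae7a1, 0xffffffff92c085f4, 0xffffffff94a000a4,
]

known = set(gdb76)

# Entries that only the gdb 10.2 build reports, in stack-walk order.
extra = [hex(a) for a in gdb102 if a not in known]

# What is left of the gdb 10.2 list once those entries are dropped.
remaining = [a for a in gdb102 if a in known]

print(extra)
print(remaining == gdb76)
```

The two walks agree everywhere except for the four consecutive entries that sit between the copy_process and _do_fork addresses.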

The unexpected addresses resolve to these symbols:
0xffffffff92ed8dd5 __mutex_init+181
0xffffffff93458397 __alloc_file+407
0xffffffff934585d2 alloc_empty_file+146
0xffffffff934b5ead __alloc_fd+141
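
As a sanity check on the framesize values, the stack addresses in the two bt outputs can be compared with the cached framesize of 624. The Python sketch below assumes that the walker advances by framesize plus one 8-byte return-address slot; that stride is my reading of the numbers, not something I have confirmed in the crash source. The textaddr of x86_64_framesize_cache[3], 18446744071878401213, is 0xffffffff92dadcbd in hex, the copy_process address of frame #9. All variable names are illustrative.

```python
RETADDR_SLOT = 8  # assumption: one 8-byte return-address slot per frame

# Stack addresses copied from the bt output above.
copy_process_sp = 0xffff888263157bc0  # frame #9 in both builds
do_fork_sp = 0xffff888263157e38       # _do_fork frame in both builds
framesize_gdb76 = 624                 # x86_64_framesize_cache[3].framesize under gdb 7.6

# Frames #10..#13 that only the gdb 10.2/13.2 builds print.
extra_frames = {
    0xffff888263157d20: "__mutex_init",
    0xffff888263157d38: "__alloc_file",
    0xffff888263157d60: "alloc_empty_file",
    0xffff888263157da8: "__alloc_fd",
}

# Where the next frame would start if the cached framesize of 624 is used.
next_sp = copy_process_sp + framesize_gdb76 + RETADDR_SLOT

# Do all of the extra frames lie inside that 624-byte region?
inside = all(copy_process_sp < sp < next_sp for sp in extra_frames)

print(hex(next_sp), next_sp == do_fork_sp, inside)
```

Under that assumption, a framesize of 624 lands exactly on the _do_fork frame, and all four extra frames fall inside the copy_process frame. This is consistent with, but does not prove, the idea that with framesize 0 the walker picks up stale text addresses left on the stack in that region.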

The vmcore was generated with:
makedumpfile -l -d 31 /proc/vmcore  [date].kernel_core

Unfortunately, I am not using a regular distribution; it is a heavily customized one.

The vmcore is available on Google Drive:
https://drive.google.com/file/d/1pDICRP6zQafe00c4LWRV-SklkM75971P/view