On Fri, Feb 02, 2024 at 07:04:14AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Aditya,
>
> Thank you for the work, it looks nicely done to me, except for the "info
> threads" issue and the style of the gdb patch.
>
> Hi Tao Liu,
>
> I saw that you were interested in support for other architectures. I'd
> like to test/support this also on x86_64 at the same time as ppc64 if
> possible. Do you have any trial patch or plan?

Hi Kazu & Aditya,

Sorry for the delay. You can find my trial patch at
https://github.com/liutgnu/crash-dev.

There are some known issues:

1) Stack unwinding fails for some vmcores:

$ ./crash /var/crash/127.0.0.1-2023-11-10-18\:27\:30/vmcore ~/vmlinux

      KERNEL: /root/vmlinux  [TAINTED]
    DUMPFILE: /var/crash/127.0.0.1-2023-11-10-18:27:30/vmcore  [PARTIAL DUMP]
        CPUS: 1
        DATE: Fri Nov 10 18:27:26 CST 2023
      UPTIME: 00:10:49
LOAD AVERAGE: 0.07, 0.11, 0.08
       TASKS: 133
    NODENAME: ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com
     RELEASE: 5.14.0-39.el9.x86_64
     VERSION: #1 SMP PREEMPT Fri Dec 24 00:07:58 EST 2021
     MACHINE: x86_64  (2303 Mhz)
      MEMORY: 4 GB
       PANIC: "Oops: 0002 [#1] PREEMPT SMP NOPTI" (check log for details)
         PID: 22722
     COMMAND: "insmod"
        TASK: ffff973e88408000  [THREAD_INFO: ffff973e88408000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> set 1
>>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3
>>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3
crash> gdb bt
#0  0xffffffffb726a1b3 in context_switch (rf=0xffffa5ee40013d00, next=0xffff973e88408000, prev=0xffff973e801fb280, rq=0xffff973efbc2c800) at kernel/sched/core.c:4972
#1  __schedule (sched_mode=0) at kernel/sched/core.c:6253
#2  0x0004005300000004 in ?? ()
crash> bt
PID: 1        TASK: ffff973e801fb280  CPU: 0    COMMAND: "systemd"
 #0 [ffffa5ee40013d30] __schedule at ffffffffb726a1b3
 #1 [ffffa5ee40013d78] schedule at ffffffffb726a553
 #2 [ffffa5ee40013d90] schedule_hrtimeout_range_clock at ffffffffb726f294
 #3 [ffffa5ee40013e10] ep_poll at ffffffffb6bc00c4
 #4 [ffffa5ee40013eb0] do_epoll_wait at ffffffffb6bc01db
 #5 [ffffa5ee40013ee8] __x64_sys_epoll_wait at ffffffffb6bc09b0
 #6 [ffffa5ee40013f38] do_syscall_64 at ffffffffb725ea98
 #7 [ffffa5ee40013f50] entry_SYSCALL_64_after_hwframe at ffffffffb740007c

Stack unwinding failed for "gdb bt": it only unwound the "__schedule"
function. What the failing cases have in common is that the rsp and rbp
values read from the stack point into different stack frames:

>>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3

crash> rd ffffa5ee40013d00 32
ffffa5ee40013d00:  ffff973ef8eef4a8 0000000000000000   ....>...........   r15, r14 (struct inactive_task_frame)
ffffa5ee40013d10:  ffff973e88408000 ffff973e801fb280   ..@.>.......>...   r13, r12
ffffa5ee40013d20:  ffff973e801fbd98 ffff973efbc2c800   ....>.......>...   bx, bp
ffffa5ee40013d30:  ffffffffb726a1b3 ffff973e00000001   ..&.........>...   ret_addr
ffffa5ee40013d40:  0004005300000004 f915b72f0f356b00   ....S....k5./...
ffffa5ee40013d50:  ffff973e801fb280 ffff973e801fb280   ....>.......>...
ffffa5ee40013d60:  ffff973e801fb280 ffff973ef8eef4e0   ....>.......>...
ffffa5ee40013d70:  0000000000000000 ffffffffb726a553   ........S.&.....
ffffa5ee40013d80:  ffff973ef8eef480 ffffa5ee40013e68   ....>...h>.@....
ffffa5ee40013d90:  ffffffffb726f294 ffffffffb6bbf092   ..&.............
ffffa5ee40013da0:  000055d6d4081de0 00000054b6b4c6fd   .....U......T...
ffffa5ee40013db0:  ffff973ef8eef4d0 ffffa5ee40013db8   ....>....=.@....
ffffa5ee40013dc0:  ffffa5ee40013db8 0000000000000000   .=.@............
ffffa5ee40013dd0:  ffff973e00000019 f915b72f0f356b00   ....>....k5./...
ffffa5ee40013de0:  f915b72f0f356b00 ffff973ef8eef480   .k5./.......>...
ffffa5ee40013df0:  ffffa5ee40013e68 ffff973e801fb280   h>.@........>...
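To make the rd dump above easier to read, here is a minimal, self-contained C sketch (not code from the trial patch) that overlays the first words at sp 0xffffa5ee40013d00 onto the x86_64 field order of the kernel's struct inactive_task_frame (r15, r14, r13, r12, bx, bp, ret_addr) and checks whether the saved bp lies on the same kernel stack as sp. The names sketch_inactive_task_frame, sketch_decode and sketch_on_same_stack are hypothetical, and the 16 KiB stack size with size-aligned stacks is an assumption (it does not hold for e.g. KASAN builds).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 16 KiB x86_64 kernel stacks, aligned to their size. */
#define SKETCH_THREAD_SIZE 0x4000ULL

/* Mirrors the x86_64 field order of the kernel's struct inactive_task_frame. */
struct sketch_inactive_task_frame {
	uint64_t r15, r14, r13, r12;
	uint64_t bx;
	uint64_t bp;       /* saved rbp */
	uint64_t ret_addr; /* where switch_to() returns into __schedule */
};

/* First eight quadwords of "rd ffffa5ee40013d00 32" from the report above. */
static const uint64_t sketch_dump[8] = {
	0xffff973ef8eef4a8ULL, 0x0000000000000000ULL, /* r15, r14 */
	0xffff973e88408000ULL, 0xffff973e801fb280ULL, /* r13, r12 */
	0xffff973e801fbd98ULL, 0xffff973efbc2c800ULL, /* bx, bp */
	0xffffffffb726a1b3ULL, 0xffff973e00000001ULL, /* ret_addr, next word */
};

/* Overlay the raw stack words starting at sp onto the frame layout. */
static struct sketch_inactive_task_frame sketch_decode(const uint64_t *words)
{
	struct sketch_inactive_task_frame f;

	assert(words != NULL);
	memcpy(&f, words, sizeof(f)); /* copies the first 7 quadwords */
	return f;
}

/* Nonzero if addr lies inside the same (assumed) kernel stack as sp. */
static int sketch_on_same_stack(uint64_t sp, uint64_t addr)
{
	uint64_t base = sp & ~(SKETCH_THREAD_SIZE - 1);

	return addr >= base && addr < base + SKETCH_THREAD_SIZE;
}
```

With these words, ret_addr decodes to 0xffffffffb726a1b3 (the ip printed by "set 1"), while bp decodes to 0xffff973efbc2c800, which falls outside the stack holding sp; the bt frame addresses such as ffffa5ee40013f50 fall inside it. bp also equals the rq argument gdb printed for context_switch(), which would fit rbp holding ordinary data rather than a frame pointer at this point, but that is only a guess.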
And bp is modified as follows:

crash> dis schedule
0xffffffffb726a510 <schedule>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffb726a515 <schedule+5>:    push   %rbp
0xffffffffb726a516 <schedule+6>:    mov    %gs:0x16f40,%rbp    <<<
0xffffffffb726a51f <schedule+15>:   push   %rbx
0xffffffffb726a520 <schedule+16>:   mov    0x18(%rbp),%eax
0xffffffffb726a523 <schedule+19>:   test   %eax,%eax

I'm not sure why gdb cannot unwind the stack in this case. Apart from
this case, x86_64 stack unwinding works fine in my testing.

2) It may break the original ppc64 patch.

This x86_64 patch is based on the original v7 ppc stack unwinding patch,
and it modifies some of the original ppc64 patch code. I haven't fully
tested the code on the ppc64 arch.

Thanks,
Tao Liu

>
> Thanks,
> Kazu
>
> On 2024/01/05 16:30, Aditya Gupta wrote:
> > The Problem:
> > ============
> >
> > Currently crash is unable to show function arguments and local variables, as
> > gdb can do. And functionality for moving between frames ('up'/'down') is not
> > working in crash.
> >
> > Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
> > 'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
> > gdb not getting the register values from `crash_target::fetch_registers`,
> > which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
> >
> > Proposed Solution:
> > ==================
> >
> > Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
> > This way, "gdb mode in crash" will support this feature for both ELF and
> > kdump-compressed vmcore formats, while "gdb" would only have supported ELF
> > format
> >
> > This way other features of 'gdb', such as seeing
> > backtraces/registers/variables/arguments/local variables, moving up and
> > down stack frames, can be used with any ppc64 vmcore, irrespective of
> > being ELF format or kdump-compressed format.
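The proposed solution boils down to a lookup: gdb's crash_target::fetch_registers asks machdep->get_cpu_reg for one register of one CPU, and the architecture code copies a saved value out or reports it unavailable. The C sketch below illustrates only that shape, under stated assumptions: the per-CPU snapshot table, the ppc64-flavoured register numbers (r1 as SP, nip as PC) and the name sketch_get_cpu_reg are hypothetical and do not reproduce crash's real defs.h declarations or the ppc64.c implementation. Returning 0 when no snapshot exists mirrors the note that live debugging is not supported.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical register numbering; real gdb numbering is per-arch. */
enum sketch_regno { SKETCH_R1_SP = 0, SKETCH_NIP = 1, SKETCH_NR_REGS = 2 };

/* Hypothetical per-CPU register snapshot, e.g. filled from vmcore notes. */
struct sketch_cpu_regs {
	int valid; /* 0 when no saved registers exist (e.g. live system) */
	uint64_t regs[SKETCH_NR_REGS];
};

static struct sketch_cpu_regs sketch_saved[SKETCH_NR_CPUS];

/*
 * Copy register 'regno' of 'cpu' into 'value' (of 'size' bytes).
 * Returns 1 on success and 0 if the register is unavailable.
 */
static int sketch_get_cpu_reg(int cpu, int regno, int size, void *value)
{
	assert(value != NULL);

	if (cpu < 0 || cpu >= SKETCH_NR_CPUS || !sketch_saved[cpu].valid)
		return 0; /* no snapshot for this CPU */
	if (regno < 0 || regno >= SKETCH_NR_REGS)
		return 0; /* register not provided by this sketch */
	if (size != (int)sizeof(uint64_t))
		return 0; /* only 64-bit registers are handled here */

	memcpy(value, &sketch_saved[cpu].regs[regno], sizeof(uint64_t));
	return 1;
}
```

How the snapshot gets filled (ELF notes, kdump-compressed headers, or the stack of a non-running task) is deliberately left out; the point is only that gdb needs nothing more than this per-register answer to unwind.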
> >
> > Note: This doesn't support live debugging on ppc64, since registers are not
> > available to be read
> >
> > Implications on Architectures:
> > ====================================
> >
> > No architecture other than PPC64 has been affected, other than in case of
> > 'frame' command
> >
> > As mentioned in patch #2, since frame will not be prohibited, so it will print:
> >
> >      crash> frame
> >      #0  <unavailable> in ?? ()
> >
> > Instead of before prohibited message:
> >
> >      crash> frame
> >      crash: prohibited gdb command: frame
> >
> > Major change will be in 'gdb mode' on PPC64, that it will print the frames, and
> > local variables, instead of failing with errors showing no frame, or showing
> > that couldn't get PC, it will be able to give all this information.
> >
> > Testing:
> > ========
> >
> > Git tree with this patch series applied:
> > https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v6
> >
> > To test various gdb passthroughs:
> >
> > (crash) set
> > (crash) set gdb on
> > gdb> thread
> > gdb> bt
> > gdb> info threads
> > gdb> info threads
> > gdb> info locals
> > gdb> info variables irq_rover_lock
> > gdb> info args
> > gdb> thread 2
> > gdb> set gdb off
> > (crash) set
> > (crash) set -c 6
> > (crash) gdb thread
> > (crash) bt
> > (crash) gdb bt
> > (crash) frame
> > (crash) up
> > (crash) down
> > (crash) info locals
> >
> > Known Issues:
> > =============
> >
> > 1. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
> >    from older kernels.
> >    This is a known issue due to register mismatch, and
> >    its fix has been merged upstream:
> >
> >    This can also cause some 'invalid kernel virtual address' errors during gdb
> >    unwinding the stack registers
> >
> >    Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef785819e72db79
> >
> > Fixing GDB passthroughs on other architectures
> > ==============================================
> >
> > Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
> > thread', 'gdb info locals' etc. has been done by the patches introducing
> > 'machdep->get_cpu_reg' and this series fixing some issues in that.
> >
> > Other architectures should be able to fix these gdb functionalities by
> > simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
> >
> > The reasoning behind that has been explained with a diagram in commit
> > description of patch #1
> >
> > I will assist with my findings/observations fixing it on ppc64 whenever needed.
> >
> > Changelog:
> > ==========
> >
> > V6:
> >   + changes in patch #5: fix bug introduced in v5 that caused initial gdb thread
> >     to be thread 1
> >
> > V5:
> >   + changes in patch #1: made ppc64_get_cpu_reg static, and remove unreachable
> >     code
> >   + changes in patch #3: fixed typo 'ppc64_renum' instead of 'ppc64_regnum',
> >     remove unneeded if condition
> >   + changes in patch #5: implement refresh regcache on per thread, instead of all
> >     threads at once
> >
> > V4:
> >   + fix segmentation fault in live debugging (change in patch #1)
> >   + mention live debugging not supported in cover letter and patch #1
> >   + fixed some checkpatch warnings (change in patch #5)
> >
> > V3:
> >   + default gdb thread will be the crashing thread, instead of being
> >     thread '0'
> >   + synchronise crash cpu and gdb thread context
> >   + fix bug in gdb_interface, that replaced gdb's output stream, losing
> >     output in some cases, such as info threads and extra output in info
> >     variables
> >   + fix 'info threads'
> >
> > RFC V2:
> >   - removed patch implementing 'frame', 'up', 'down' in crash
> >   - updated the cover letter by removing the mention of those commands other
> >     than the respective gdb passthrough
> >
> > Aditya Gupta (5):
> >   ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
> >   remove 'frame' from prohibited commands list
> >   synchronise cpu context changes between crash/gdb
> >   fix gdb_interface: restore gdb's output streams at end of
> >     gdb_interface
> >   fix 'info threads' command
> >
> >  crash_target.c  |  44 ++++++++++++++++
> >  defs.h          | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
> >  gdb-10.2.patch  | 110 +++++++++++++++++++++++++++++++++++++++-
> >  gdb_interface.c |   2 +-
> >  kernel.c        |  47 +++++++++++++++--
> >  ppc64.c         |  95 +++++++++++++++++++++++++++++++++--
> >  task.c          |  14 ++++++
> >  tools.c         |   2 +-
> >  8 files changed, 434 insertions(+), 10 deletions(-)
> >

--
Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki