Hi Aditya,

On Wed, Dec 13, 2023 at 10:28 PM Aditya Gupta <adityag@xxxxxxxxxxxxx> wrote:
>
> Hi Tao,
>
> On Wed, Dec 13, 2023 at 09:03:37PM +0800, Tao Liu wrote:
> > Hi Aditya,
> >
> > I encountered a problem analyzing the ppc64 vmcore after applying all
> > the patches in the patchset:
> >
> > crash> gdb bt
> > #0 0xc000000000279d98 in crash_setup_regs (gdb: invalid kernel virtual address: fffffffffffffffb type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff7 type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff3 type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffffb type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff7 type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff3 type: "gdb_readmem callback"
> > oldregs=<optimized out>, newregs=0xc000000012e87968) at ./arch/powerpc/include/asm/kexec.h:69
> > #1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:975
> > #2 0xfffffffffffffffb in ?? ()
> > Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> > crash> gdb info threads
> >   Id   Target Id   Frame
> >   1    CPU 0       plpar_hcall_norets_notrace () at arch/powerpc/platforms/pseries/hvCall.S:112
> > * 2    CPU 1       0xc000000000279d98 in crash_setup_regs (gdb: invalid kernel virtual address: fffffffffffffffb type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff7 type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff3 type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffffb type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff7 type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff3 type: "gdb_readmem callback"
> > oldregs=<optimized out>, newregs=0xc000000012e87968) at ./arch/powerpc/include/asm/kexec.h:69
> >
> > It seems the crash stack unwinding gave a wrong value to gdb. I tried for
> > some time to find the root cause but had no luck. I hope you can
> > help me out. I can give you the vmcore to analyze this issue in
> > another mail. Thanks in advance!
>
> I mostly see these kinds of errors due to a symbol/structure change in the
> kernel; maybe something changed in the kernel, or some invalid value was read
> from some structure.
>
> Thanks for the backtrace, will try this with the upstream kernel.

Thanks for your help!

>
> Just to check, I should cause the crash using 'echo c > /proc/sysrq-trigger',
> right? Or was it done some other way?
>

Yes, I got the vmcore just by triggering a kernel crash with
'echo c > /proc/sysrq-trigger', as you mentioned. In addition, the kernel
I used for debugging is kernel-5.14.0-362.15.1.el9_3. I didn't try the
upstream kernel...

> >
> > Currently I have made the x86_64 stack unwinding work based on your
> > patchset. And I plan to post it upstream once your patchsets get
> > merged. In addition, is there a plan to support the stack unwinding
> > for live debugging in ppc64 arch?
> > I think it is a useful feature too...
>
> Wow, great. I will fix this issue in the patch series, and any issue, then I
> guess our patches will be ready to merge :)

Yeah, looks great, thanks!

Thanks,
Tao Liu

>
> Thanks,
> Aditya Gupta
>
> >
> > Thanks,
> > Tao Liu
> >
> > On Tue, Dec 12, 2023 at 12:51 PM Aditya Gupta <adityag@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Dec 11, 2023 at 08:04:50PM +0800, Lianbo Jiang wrote:
> > > > On 12/9/23 20:45, Aditya Gupta wrote:
> > > >
> > > > > Hi, just a ping. Any comments on the series ?
> > > >
> > > > Hi, Aditya
> > > >
> > > > Thank you for the update. I will have a look and do the tests this week. And
> > > > give some feedback.
> > >
> > > Sure. Thanks Lianbo.
> > >
> > > - Aditya Gupta
> > >
> > > >
> > > > Thanks.
> > > >
> > > > Lianbo
> > > >
> > > > > On Mon, Dec 04, 2023 at 08:29:36PM +0530, Aditya Gupta wrote:
> > > > > > The Problem:
> > > > > > ============
> > > > > >
> > > > > > Currently crash is unable to show function arguments and local variables, as
> > > > > > gdb can do. And functionality for moving between frames ('up'/'down') is not
> > > > > > working in crash.
> > > > > >
> > > > > > Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
> > > > > > 'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
> > > > > > gdb not getting the register values from `crash_target::fetch_registers`,
> > > > > > which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
> > > > > >
> > > > > > Proposed Solution:
> > > > > > ==================
> > > > > >
> > > > > > Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
> > > > > > This way, "gdb mode in crash" will support this feature for both ELF and
> > > > > > kdump-compressed vmcore formats, while "gdb" would only have supported ELF
> > > > > > format
> > > > > >
> > > > > > This way other features of 'gdb', such as seeing
> > > > > > backtraces/registers/variables/arguments/local variables, moving up and
> > > > > > down stack frames, can be used with any ppc64 vmcore, irrespective of
> > > > > > being ELF format or kdump-compressed format.
> > > > > >
> > > > > > Implications on Architectures:
> > > > > > ==============================
> > > > > >
> > > > > > No architecture other than PPC64 is affected, except in the case of the
> > > > > > 'frame' command
> > > > > >
> > > > > > As mentioned in patch #2, since 'frame' will no longer be prohibited, it will print:
> > > > > >
> > > > > >     crash> frame
> > > > > >     #0 <unavailable> in ?? ()
> > > > > >
> > > > > > Instead of the previous 'prohibited' message:
> > > > > >
> > > > > >     crash> frame
> > > > > >     crash: prohibited gdb command: frame
> > > > > >
> > > > > > The major change will be in 'gdb mode' on PPC64: instead of failing with
> > > > > > errors showing no frame, or showing that it couldn't get the PC, it will
> > > > > > print the frames and local variables.
> > > > > >
> > > > > > Testing:
> > > > > > ========
> > > > > >
> > > > > > Git tree with this patch series applied:
> > > > > > https://github.com/adi-g15-ibm/crash/tree/stack-unwind-3
> > > > > >
> > > > > > To test various gdb passthroughs:
> > > > > >
> > > > > >     gdb> set
> > > > > >     gdb> set gdb on
> > > > > >     gdb> thread
> > > > > >     gdb> bt
> > > > > >     gdb> info threads
> > > > > >     gdb> info threads
> > > > > >     gdb> info locals
> > > > > >     gdb> info variables irq_rover_lock
> > > > > >     gdb> info args
> > > > > >     gdb> thread 2
> > > > > >     gdb> set gdb off
> > > > > >     gdb> set
> > > > > >     gdb> set -c 6
> > > > > >     gdb> gdb thread
> > > > > >     gdb> bt
> > > > > >     gdb> gdb bt
> > > > > >     gdb> frame
> > > > > >     gdb> up
> > > > > >     gdb> down
> > > > > >     gdb> info locals
> > > > > >
> > > > > > Known Issues:
> > > > > > =============
> > > > > >
> > > > > > 1. In gdb mode, 'bt' might fail to show the backtrace in a few vmcores collected
> > > > > >    from older kernels. This is a known issue due to register mismatch, and
> > > > > >    its fix has been merged upstream:
> > > > > >
> > > > > >    Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef785819e72db79
> > > > > >
> > > > > > Fixing GDB passthroughs on other architectures
> > > > > > ==============================================
> > > > > >
> > > > > > Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
> > > > > > thread', 'gdb info locals' etc. work has been done by the patches introducing
> > > > > > 'machdep->get_cpu_reg' and by this series fixing some issues in that.
> > > > > >
> > > > > > Other architectures should be able to fix these gdb functionalities by
> > > > > > simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
> > > > > >
> > > > > > The reasoning behind that has been explained with a diagram in the commit
> > > > > > description of patch #1
> > > > > >
> > > > > > I will assist with my findings/observations from fixing it on ppc64 whenever needed.
> > > > > >
> > > > > > Additional Notes:
> > > > > > =================
> > > > > >
> > > > > > Sorry, it took a long time to send this version. Tried fixing 'info
> > > > > > threads' but wasn't able to. Gave it time again, and was able to fix it
> > > > > > this time after multiple days of debugging.
> > > > > >
> > > > > > Some other things from the last version's review:
> > > > > >
> > > > > > * 'info rv' not working:
> > > > > >   It's not supported in gdb, instead we need to use 'info locals rv' or
> > > > > >   'info variables rv'
> > > > > >
> > > > > > * 'info variables' command hangs... and prints nothing after hanging for long
> > > > > >   It likely hangs because there are a lot of symbols, and it's trying to
> > > > > >   get all of gdb's output and page it, so Control+C messes it up, but if we pass
> > > > > >   a regex filter to limit the output, e.g. 'info variables rq', then it doesn't
> > > > > >   hang, and prints the variables/symbols.
> > > > > >   Even plain gdb, i.e. simply running 'gdb vmlinux vmcore', also hangs due
> > > > > >   to the large number of symbols
> > > > > >
> > > > > > * making crashing thread as default in gdb:
> > > > > >   This is implemented now, along with synchronising crash & gdb contexts, in
> > > > > >   patch #3
> > > > > >
> > > > > > * 'info threads' not working:
> > > > > >   This turned out to be due to a bug in gdb_interface. I fixed 'info
> > > > > >   threads' in 2 patches, to simplify it, first for the gdb_interface,
> > > > > >   and another patch for setting the context correctly in crash
> > > > > >
> > > > > > * other info commands:
> > > > > >   I tested all the info commands in crash along with this patch.
> > > > > >   Most of those that fail in crash are due to gdb itself not supporting
> > > > > >   them with vmcores, and other than that is the 'info pretty' command,
> > > > > >   which might not be needed in crash anyways
> > > > > >
> > > > > > * live debugging showing only one thread:
> > > > > >   I tried it with crash; crash shows only the current thread, i.e.
> > > > > >   itself, so it does not have register information for the other
> > > > > >   CPUs. Similarly gdb does not support live kernel debugging (without
> > > > > >   connecting to a gdbstub/QEMU etc.).
> > > > > >   If you need, I can make it show the current thread id correctly for
> > > > > >   the one thread, but I don't think it would help much with live
> > > > > >   debugging
> > > > > >
> > > > > > Hope I set the context. Thanks for the reviews; I replied and worked
> > > > > > on your suggestions, but got stuck there due to 'info threads'
> > > > > >
> > > > > > Changelog:
> > > > > > ==========
> > > > > >
> > > > > > V3:
> > > > > >  + default gdb thread will be the crashing thread, instead of being
> > > > > >    thread '0'
> > > > > >  + synchronise crash cpu and gdb thread context
> > > > > >  + fix bug in gdb_interface, that replaced gdb's output stream, losing
> > > > > >    output in some cases, such as info threads and extra output in info
> > > > > >    variables
> > > > > >  + fix 'info threads'
> > > > > >
> > > > > > RFC V2:
> > > > > >  - removed patch implementing 'frame', 'up', 'down' in crash
> > > > > >  - updated the cover letter by removing the mention of those commands other
> > > > > >    than the respective gdb passthrough
> > > > > >
> > > > > > Aditya Gupta (5):
> > > > > >   ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
> > > > > >   remove 'frame' from prohibited commands list
> > > > > >   synchronise cpu context changes between crash/gdb
> > > > > >   fix gdb_interface: restore gdb's output streams at end of
> > > > > >     gdb_interface
> > > > > >   fix 'info threads' command
> > > > > >
> > > > > >  crash_target.c  |  44 ++++++++++++++++
> > > > > >  defs.h          | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > >  gdb-10.2.patch  | 110 +++++++++++++++++++++++++++++++++++++++-
> > > > > >  gdb_interface.c |   2 +-
> > > > > >  kernel.c        |  47 +++++++++++++++--
> > > > > >  ppc64.c         |  95 +++++++++++++++++++++++++++++++++--
> > > > > >  task.c          |  14 ++++++
> > > > > >  tools.c         |   2 +-
> > > > > >  8 files changed, 434 insertions(+), 10 deletions(-)
> > > > > >
> > > > > > --
> > > > > > 2.41.0
> > > > > >
> > > --
> > > Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> > > To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> > > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
> > > Contribution Guidelines: https://github.com/crash-utility/crash/wiki