[Crash-utility] Re: [PATCH v5 0/5] Improve stack unwind on ppc64


 



On 1/4/24 23:27, Aditya Gupta wrote:

Hi Lianbo,

I found the reason for the warning. It is caused by gdb trying various
functions (unwinders), which are allowed to fail; one of them tried accessing
the instructions of a function marked '__init' in the kernel.

More details follow.

Thank you for investigating the details, Aditya.

It helps me a lot.

crash> gdb bt
#0  <unavailable> in ?? ()
#1  0xc0000000000f570c in plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:111
#2  0xc000000001004dd8 in cede_processor () at
./arch/powerpc/include/asm/plpar_wrappers.h:37
#3  check_and_cede_processor () at drivers/cpuidle/cpuidle-pseries.c:83
#4  0xc000000001005000 in shared_cede_loop (dev=<optimized out>,
drv=<optimized out>, index=<optimized out>) at
drivers/cpuidle/cpuidle-pseries.c:256
#5  0xc000000001004498 in cpuidle_enter_state
(dev=dev@entry=0xc0000001ff5910c0, drv=drv@entry=0xc000000002b8f558
<pseries_idle_driver>, index=index@entry=1) at drivers/cpuidle/cpuidle.c:267
#6  0xc000000000c0eb4c in cpuidle_enter (drv=0xc000000002b8f558
<pseries_idle_driver>, dev=0xc0000001ff5910c0, index=<optimized out>) at
drivers/cpuidle/cpuidle.c:388
#7  0xc0000000001ce2bc in call_cpuidle (drv=<optimized out>,
drv@entry=0xc000000002b8f558 <pseries_idle_driver>, dev=<optimized out>,
dev@entry=0xc0000001ff5910c0, next_state=<optimized out>) at
kernel/sched/idle.c:134
#8  0xc0000000001d5d68 in cpuidle_idle_call () at kernel/sched/idle.c:215
#9  0xc0000000001d5f58 in do_idle () at kernel/sched/idle.c:282
#10 0xc0000000001d6298 in cpu_startup_entry (state=<optimized out>) at
kernel/sched/idle.c:380
#11 0xc000000000011030 in rest_init () at init/main.c:730
gdb: page excluded: kernel virtual address: c000000002004c80 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c7c type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c78 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c80 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c7c type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c78 type:
"gdb_readmem callback"
#12 0xc000000002004c80 in arch_call_rest_init () at init/main.c:827
gdb: page excluded: kernel virtual address: c0000000020051ec type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051e8 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051e4 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051ec type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051e8 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051e4 type:
"gdb_readmem callback"
#13 0xc0000000020051ec in start_kernel () at init/main.c:1072
#14 0xc00000000000e788 in start_here_common () at
arch/powerpc/kernel/head_64.S:1039

These warnings are caused by gdb accessing addresses that were freed by the
kernel, so they are expected to be unavailable ('page excluded').

---------
Q. Which addresses?

I noticed that the addresses in the warnings are instructions of the given
function. For example, before frame #13 (`start_kernel`), whose PC is
`0xc0000000020051ec`, gdb failed accessing `...20051ec`, `...20051e8` and
`...20051e4`, i.e. the instruction at the PC and the ones just before it,
around where the call was made. I verified this against the kernel disassembly
as well.

Q. Why are the addresses unavailable?

These instructions belong to `start_kernel`, which is marked `__init` in the
Linux kernel. Such functions are meant to be called only during
initialisation, so their code is placed in the '.init.text' section of the ELF
file, which "is freed later by free_initmem" to reclaim the memory used by
functions that will never be called again.
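A minimal user-space sketch of the idea, assuming GCC or Clang with GNU ld or lld on Linux. The kernel's real `__init` (in include/linux/init.h) roughly expands to a `.init.text` section attribute plus a few extra attributes, and the kernel marks that range with `_sinittext`/`_einittext`. Here the hypothetical `DEMO_INIT` macro and section name `demo_init_text` stand in for them. Because the name is a valid C identifier, the linker synthesises `__start_demo_init_text`/`__stop_demo_init_text`, which lets code check whether an address falls in the "init" range, i.e. the range that would become unreadable after free_initmem.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __init: only the section
 * placement is modelled here, not the other attributes. */
#define DEMO_INIT __attribute__((section("demo_init_text")))

/* Bounds of the section, synthesised by the linker because the
 * section name is a valid C identifier. */
extern const char __start_demo_init_text[];
extern const char __stop_demo_init_text[];

/* Called once during "boot"; in the kernel such code is freed later. */
DEMO_INIT int demo_start_kernel(void)
{
    return 42;
}

/* Ordinary function that stays in .text, like do_idle(). */
int demo_do_idle(void)
{
    return 0;
}

/* Returns 1 if addr lies in the demo init section, 0 otherwise. */
int demo_in_init_text(uintptr_t addr)
{
    return addr >= (uintptr_t)__start_demo_init_text &&
           addr < (uintptr_t)__stop_demo_init_text;
}
```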

Q. Why did GDB even try to access instructions while unwinding?

While unwinding the frames, GDB tries many functions (aka unwinders) to see if
any one of them can give it the previous frame's registers.

One of these functions is `tramp_frame_sniffer`, which checks the instructions
at and around the PC to see whether the frame is a trampoline frame, before
proceeding further.
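A simplified, hypothetical model of such a sniffer (not gdb's actual code) helps explain the pattern in the log. Assuming a candidate trampoline of 3 instructions of 4 bytes each, the sniffer first treats the PC as the trampoline's first instruction, then its second, then its third. It therefore tries to read at PC, PC-4 and PC-8. Each read fails on a freed page, and that candidate is simply rejected. With two `tramp_frame_sniffer` entries in the list, this would be consistent with the six warnings printed before each affected frame. `demo_read_insn` is a hypothetical reader that fails for every address, like the excluded pages, and records each address it was asked to read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_INSN_SIZE  4    /* ppc64 instructions are 4 bytes wide */
#define DEMO_MAX_PROBES 16

/* Addresses the sniffer tried to read, for inspection. */
uint64_t demo_probed[DEMO_MAX_PROBES];
size_t demo_nprobed;

/* Hypothetical memory reader: every read fails, mimicking the excluded
 * (freed) .init.text pages, and each attempted address is recorded. */
static int demo_read_insn(uint64_t addr, uint32_t *insn)
{
    (void)insn;
    if (demo_nprobed < DEMO_MAX_PROBES)
        demo_probed[demo_nprobed++] = addr;
    return 0;
}

/* Simplified trampoline matcher. For each ti, assume pc is instruction ti
 * of the n-instruction pattern `tramp`, so the trampoline would start at
 * pc - ti * DEMO_INSN_SIZE, and compare instruction by instruction.
 * An unreadable address simply rejects that candidate.
 * Returns the trampoline start address, or 0 if nothing matched. */
uint64_t demo_tramp_start(uint64_t pc, const uint32_t *tramp, size_t n)
{
    for (size_t ti = 0; ti < n; ti++) {
        uint64_t start = pc - ti * DEMO_INSN_SIZE;
        size_t i;

        for (i = 0; i < n; i++) {
            uint32_t insn;

            if (!demo_read_insn(start + i * DEMO_INSN_SIZE, &insn))
                break;          /* read failed: not a trampoline here */
            if (insn != tramp[i])
                break;          /* instruction mismatch */
        }
        if (i == n)
            return start;       /* all n instructions matched */
    }
    return 0;
}
```

Calling `demo_tramp_start(0xc0000000020051ec, pattern, 3)` with any 3-instruction pattern returns 0 after probing `...20051ec`, `...20051e8` and `...20051e4`. These are the same three addresses reported before frame #13.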

This is the sequence of unwinders GDB is trying for every frame:

     dummy_frame_unwind
     dwarf2_tailcall_frame_unwind
     inline_frame_unwind
     0x11567ff0 -> tramp_frame_sniffer
     0x11567f90 -> tramp_frame_sniffer
     dwarf2_frame_unwind   ==================> CORRECT ONE
     dwarf2_signal_frame_unwind
     rs6000_epilogue_frame_unwind
     rs6000_frame_unwind

GDB tries these unwinders in sequence until one of them is able to get the
registers correctly. Here, for most frames, `dwarf2_frame_unwind`, which uses
the DWARF debuginfo to unwind, is the correct unwinder.
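The selection logic can be pictured as a first-match loop over the list above. This is a hypothetical sketch. The sniffer bodies are stand-ins: only the DWARF one "accepts", and only when CFI is present, while the last entry accepts unconditionally as a fallback. `struct demo_frame` replaces gdb's real frame objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, minimal frame description. */
struct demo_frame {
    uint64_t pc;
    int has_dwarf_cfi;       /* DWARF CFI covers this pc */
};

struct demo_unwinder {
    const char *name;
    /* Non-zero if this unwinder claims the frame. */
    int (*sniffer)(const struct demo_frame *frame);
};

static int demo_never(const struct demo_frame *f)    { (void)f; return 0; }
static int demo_dwarf2(const struct demo_frame *f)   { return f->has_dwarf_cfi; }
static int demo_fallback(const struct demo_frame *f) { (void)f; return 1; }

/* Same order as the list above; the sniffers are stand-ins. */
static const struct demo_unwinder demo_unwinders[] = {
    { "dummy_frame_unwind",           demo_never },
    { "dwarf2_tailcall_frame_unwind", demo_never },
    { "inline_frame_unwind",          demo_never },
    { "tramp_frame_sniffer",          demo_never },
    { "tramp_frame_sniffer",          demo_never },
    { "dwarf2_frame_unwind",          demo_dwarf2 },
    { "dwarf2_signal_frame_unwind",   demo_never },
    { "rs6000_epilogue_frame_unwind", demo_never },
    { "rs6000_frame_unwind",          demo_fallback },
};

/* Try each unwinder in order; the first one whose sniffer accepts wins.
 * Earlier sniffers failing (even on unreadable memory) is expected. */
const char *demo_pick_unwinder(const struct demo_frame *f)
{
    size_t n = sizeof(demo_unwinders) / sizeof(demo_unwinders[0]);

    for (size_t i = 0; i < n; i++)
        if (demo_unwinders[i].sniffer(f))
            return demo_unwinders[i].name;
    return NULL;
}
```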

Q. Is it harmless?

Yes, all the unwinders tried before choosing the correct one are expected
to fail.

As for the addresses being unavailable, that is also harmless because:
1. The function's instructions were freed by the kernel, so the data at those
addresses is expected to be missing.
2. GDB itself is okay with a read failing; in this case, tramp_frame_sniffer
simply returns if it is unable to read an address.

------------

Btw, this only happens with CPU 0 when it is running its idle task. Unlike the
idle tasks of the other CPUs, CPU 0's idle task is entered from this
initialisation sequence, so its backtrace contains very early init functions.

It's true.

Since this is expected and harmless (gdb handles such cases), I am thinking of
ignoring this warning when the read is issued by gdb. What do you suggest?

If this does happen and it's rare, it might be good to keep a warning for
users.

At least it reflects the actual situation, although the warning output is not
very friendly. So I would suggest leaving it there for the time being.


     i) Add a flag "IGNORE_PAGE_EXCLUDED" to the 'readmem' call when it is done
        by gdb, and only print the error if this is true:

            if ((error_handle & IGNORE_PAGE_EXCLUDED) && CRASHDEBUG(0))
                    ...

        This way, it will still be visible with -d 1 and above debug levels.

     ii) Append 'quiet' to the 'error_handle': this way we don't have to
         introduce a flag, but it might skip showing other warnings/errors
         which might be of interest.
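A minimal sketch of how option i) could behave. It assumes a hypothetical flag bit `DEMO_IGNORE_PAGE_EXCLUDED` (crash's real error_handle bits are not reproduced here) and a stand-in `demo_debug` for the level set with -d. The intent is that only reads tagged by gdb are silenced, and only at the default debug level. Every other caller keeps printing the error.

```c
#include <assert.h>

/* Hypothetical bit, chosen only for this sketch. */
#define DEMO_IGNORE_PAGE_EXCLUDED (1UL << 30)

/* Stand-in for crash's debug level (raised with -d N). */
int demo_debug;

/* Return 1 if a "page excluded" error should be printed. */
int demo_print_page_excluded(unsigned long error_handle)
{
    if (error_handle & DEMO_IGNORE_PAGE_EXCLUDED)
        return demo_debug > 0;   /* gdb reads: only with -d 1 and above */
    return 1;                    /* all other readers: unchanged */
}
```

Compared with option ii), a dedicated bit leaves the 'quiet' behaviour of other errors untouched.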

Other than this, there was a bug introduced in V5 where the default thread was
still thread 1 (CPU 0) at initialisation. It has been fixed, which will also
fix the 'unavailable' frame being shown.

Sounds good.

Though it's funny that this bug helped uncover these warnings, they would have
been hit in earlier versions also.

Indeed, it can be triggered conditionally.

I have no other issues. Thank you for the great job, Aditya.


Thanks

Lianbo


Thanks,
Aditya Gupta

crash> gdb frame
#0  <unavailable> in ?? ()
crash> set gdb on
gdb: on
gdb> bt
#0  <unavailable> in ?? ()
#1  0xc0000000000f570c in plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:111
#2  0xc000000001004dd8 in cede_processor () at
./arch/powerpc/include/asm/plpar_wrappers.h:37
#3  check_and_cede_processor () at drivers/cpuidle/cpuidle-pseries.c:83
#4  0xc000000001005000 in shared_cede_loop (dev=<optimized out>,
drv=<optimized out>, index=<optimized out>) at
drivers/cpuidle/cpuidle-pseries.c:256
#5  0xc000000001004498 in cpuidle_enter_state
(dev=dev@entry=0xc0000001ff5910c0, drv=drv@entry=0xc000000002b8f558
<pseries_idle_driver>, index=index@entry=1) at drivers/cpuidle/cpuidle.c:267
#6  0xc000000000c0eb4c in cpuidle_enter (drv=0xc000000002b8f558
<pseries_idle_driver>, dev=0xc0000001ff5910c0, index=<optimized out>) at
drivers/cpuidle/cpuidle.c:388
#7  0xc0000000001ce2bc in call_cpuidle (drv=<optimized out>,
drv@entry=0xc000000002b8f558 <pseries_idle_driver>, dev=<optimized out>,
dev@entry=0xc0000001ff5910c0, next_state=<optimized out>) at
kernel/sched/idle.c:134
#8  0xc0000000001d5d68 in cpuidle_idle_call () at kernel/sched/idle.c:215
#9  0xc0000000001d5f58 in do_idle () at kernel/sched/idle.c:282
#10 0xc0000000001d6298 in cpu_startup_entry (state=<optimized out>) at
kernel/sched/idle.c:380
#11 0xc000000000011030 in rest_init () at init/main.c:730
#12 0xc000000002004c80 in arch_call_rest_init () at init/main.c:827
#13 0xc0000000020051ec in start_kernel () at init/main.c:1072
#14 0xc00000000000e788 in start_here_common () at
arch/powerpc/kernel/head_64.S:1039
gdb> info threads
   Id   Target Id         Frame
* 1    CPU 0             plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
   2    CPU 1             plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
   3    CPU 2             plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
   4    CPU 3             plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
   5    CPU 4             plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
   6    CPU 5             0xc00000000028b5e8 in crash_setup_regs
(oldregs=<optimized out>, newregs=0xc00000005d3e7958) at
./arch/powerpc/include/asm/kexec.h:69
   7    CPU 6             plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
   8    CPU 7             plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
gdb: page excluded: kernel virtual address: c000000002004c80 type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c000000002004c7c type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c000000002004c78 type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c000000002004c80 type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c000000002004c7c type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c000000002004c78 type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c0000000020051ec type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c0000000020051e8 type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c0000000020051e4 type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c0000000020051ec type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c0000000020051e8 type:
"gdb_readmem_callback"
gdb: page excluded: kernel virtual address: c0000000020051e4 type:
"gdb_readmem_callback"

gdb> thread 1
[Switching to thread 1 (CPU 0)]
#0  plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
114             li      r4,0
gdb> bt
#0  plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
#1  0xc000000001004dd8 in cede_processor () at
./arch/powerpc/include/asm/plpar_wrappers.h:37
#2  check_and_cede_processor () at drivers/cpuidle/cpuidle-pseries.c:83
#3  0xc000000001005000 in shared_cede_loop (dev=<optimized out>,
drv=<optimized out>, index=<optimized out>) at
drivers/cpuidle/cpuidle-pseries.c:256
#4  0xc000000001004498 in cpuidle_enter_state
(dev=dev@entry=0xc0000001ff5910c0, drv=drv@entry=0xc000000002b8f558
<pseries_idle_driver>, index=index@entry=1) at drivers/cpuidle/cpuidle.c:267
#5  0xc000000000c0eb4c in cpuidle_enter (drv=0xc000000002b8f558
<pseries_idle_driver>, dev=0xc0000001ff5910c0, index=<optimized out>) at
drivers/cpuidle/cpuidle.c:388
#6  0xc0000000001ce2bc in call_cpuidle (drv=<optimized out>,
drv@entry=0xc000000002b8f558 <pseries_idle_driver>, dev=<optimized out>,
dev@entry=0xc0000001ff5910c0, next_state=<optimized out>) at
kernel/sched/idle.c:134
#7  0xc0000000001d5d68 in cpuidle_idle_call () at kernel/sched/idle.c:215
#8  0xc0000000001d5f58 in do_idle () at kernel/sched/idle.c:282
#9  0xc0000000001d6298 in cpu_startup_entry (state=<optimized out>) at
kernel/sched/idle.c:380
#10 0xc000000000011030 in rest_init () at init/main.c:730
gdb: page excluded: kernel virtual address: c000000002004c80 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c7c type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c78 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c80 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c7c type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c000000002004c78 type:
"gdb_readmem callback"
#11 0xc000000002004c80 in arch_call_rest_init () at init/main.c:827
gdb: page excluded: kernel virtual address: c0000000020051ec type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051e8 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051e4 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051ec type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051e8 type:
"gdb_readmem callback"
gdb: page excluded: kernel virtual address: c0000000020051e4 type:
"gdb_readmem callback"
#12 0xc0000000020051ec in start_kernel () at init/main.c:1072
#13 0xc00000000000e788 in start_here_common () at
arch/powerpc/kernel/head_64.S:1039
gdb> thread 3
[Switching to thread 3 (CPU 2)]
#0  plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
114             li      r4,0
gdb> bt
#0  plpar_hcall_norets_notrace () at
arch/powerpc/platforms/pseries/hvCall.S:114
#1  0xc000000001004dd8 in cede_processor () at
./arch/powerpc/include/asm/plpar_wrappers.h:37
#2  check_and_cede_processor () at drivers/cpuidle/cpuidle-pseries.c:83
#3  0xc000000001005000 in shared_cede_loop (dev=<optimized out>,
drv=<optimized out>, index=<optimized out>) at
drivers/cpuidle/cpuidle-pseries.c:256
#4  0xc000000001004498 in cpuidle_enter_state
(dev=dev@entry=0xc0000001ff7910c0, drv=drv@entry=0xc000000002b8f558
<pseries_idle_driver>, index=index@entry=1) at drivers/cpuidle/cpuidle.c:267
#5  0xc000000000c0eb4c in cpuidle_enter (drv=0xc000000002b8f558
<pseries_idle_driver>, dev=0xc0000001ff7910c0, index=<optimized out>) at
drivers/cpuidle/cpuidle.c:388
#6  0xc0000000001ce2bc in call_cpuidle (drv=<optimized out>,
drv@entry=0xc000000002b8f558 <pseries_idle_driver>, dev=<optimized out>,
dev@entry=0xc0000001ff7910c0, next_state=<optimized out>) at
kernel/sched/idle.c:134
#7  0xc0000000001d5d68 in cpuidle_idle_call () at kernel/sched/idle.c:215
#8  0xc0000000001d5f58 in do_idle () at kernel/sched/idle.c:282
#9  0xc0000000001d6298 in cpu_startup_entry (state=<optimized out>) at
kernel/sched/idle.c:380
#10 0xc00000000005f048 in start_secondary (unused=<optimized out>) at
arch/powerpc/kernel/smp.c:1680
#11 0xc00000000000e058 in start_secondary_prolog () at
arch/powerpc/kernel/head_64.S:885
gdb>


Could you please check it again? Or am I missing anything? I did the test
based on the upstream kernel 6.7.0-rc7 (commit 8735c7c84d1b).

BTW: I did not see similar issues after applying the v4 patch set.


Thanks

Lianbo

Known Issues:
=============

1. In gdb mode, 'bt' might fail to show a backtrace for a few vmcores
     collected from older kernels. This is a known issue caused by a register
     mismatch, which can also cause some 'invalid kernel virtual address'
     errors while gdb unwinds the stack registers. Its fix has been merged
     upstream:

Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef785819e72db79

Fixing GDB passthroughs on other architectures
==============================================

Much of the work needed to make gdb passthroughs such as 'gdb bt', 'gdb
thread', 'gdb info locals' etc. work has been done by the patches introducing
'machdep->get_cpu_reg', and by this series, which fixes some issues in them.

Other architectures should be able to fix these gdb functionalities by
simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.

The reasoning behind this is explained, with a diagram, in the commit
description of patch #1.
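Here is a toy sketch of what such a hook might look like on a new architecture. The callback shape `(cpu, regno, name, size, value)` follows the 'machdep->get_cpu_reg (cpu, regno, ...)' form mentioned above. The exact parameter list, the return convention, the register numbering (`DEMO_PC_REGNUM`) and the per-CPU snapshot `demo_regs` are all assumptions for illustration, not crash's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NR_CPUS   8
#define DEMO_NR_GPRS   32
#define DEMO_PC_REGNUM 64    /* hypothetical regno for the PC */

/* Hypothetical per-CPU register snapshot, e.g. taken from crash notes. */
struct demo_cpu_regs {
    uint64_t gpr[DEMO_NR_GPRS];
    uint64_t pc;
};

static struct demo_cpu_regs demo_regs[DEMO_NR_CPUS];

/* Hypothetical callback table; only the hook discussed here is modelled. */
struct demo_machdep {
    int (*get_cpu_reg)(int cpu, int regno, const char *name,
                       int size, void *value);
};

/* Copy one register of `cpu` into `value`. Returns 1 on success, 0 if the
 * CPU or register is unknown or `size` does not match, so the caller can
 * treat the register as unavailable. */
static int demo_get_cpu_reg(int cpu, int regno, const char *name,
                            int size, void *value)
{
    const uint64_t *src;

    (void)name;                      /* only useful for diagnostics */
    assert(value != NULL);
    if (cpu < 0 || cpu >= DEMO_NR_CPUS || size != (int)sizeof(uint64_t))
        return 0;

    if (regno >= 0 && regno < DEMO_NR_GPRS)
        src = &demo_regs[cpu].gpr[regno];
    else if (regno == DEMO_PC_REGNUM)
        src = &demo_regs[cpu].pc;
    else
        return 0;                    /* register not modelled */

    memcpy(value, src, sizeof(*src));
    return 1;
}

struct demo_machdep demo_machdep = { .get_cpu_reg = demo_get_cpu_reg };
```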

I am happy to help with my findings/observations from fixing it on ppc64
whenever needed.

Changelog:
==========

V5:
+ changes in patch #1: made ppc64_get_cpu_reg static, and removed unreachable
code
+ changes in patch #3: fixed typo 'ppc64_renum' instead of 'ppc64_regnum',
removed an unneeded if condition
+ changes in patch #5: refresh the regcache per thread, instead of for all
threads at once

V4:
+ fix segmentation fault in live debugging (change in patch #1)
+ mention live debugging not supported in cover letter and patch #1
+ fixed some checkpatch warnings (change in patch #5)

V3:
+ default gdb thread will be the crashing thread, instead of being
    thread '0'
+ synchronise crash cpu and gdb thread context
+ fix bug in gdb_interface, that replaced gdb's output stream, losing
    output in some cases, such as info threads and extra output in info
    variables
+ fix 'info threads'

RFC V2:
    - removed patch implementing 'frame', 'up', 'down' in crash
    - updated the cover letter by removing the mention of those commands other
	than the respective gdb passthrough

Aditya Gupta (5):
    ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
    remove 'frame' from prohibited commands list
    synchronise cpu context changes between crash/gdb
    fix gdb_interface: restore gdb's output streams at end of
      gdb_interface
    fix 'info threads' command

   crash_target.c  |  44 ++++++++++++++++
   defs.h          | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
   gdb-10.2.patch  | 110 +++++++++++++++++++++++++++++++++++++++-
   gdb_interface.c |   2 +-
   kernel.c        |  47 +++++++++++++++--
   ppc64.c         |  95 +++++++++++++++++++++++++++++++++--
   task.c          |  14 ++++++
   tools.c         |   2 +-
   8 files changed, 434 insertions(+), 10 deletions(-)

--
Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki



