[Crash-utility] Re: [PATCH] gdb: fix the "p" command incorrectly print the value of a global variable

On Wed, Mar 6, 2024 at 6:30 PM Daisuke Hatayama (Fujitsu) <d.hatayama@xxxxxxxxxxx> wrote:
Lianbo,

Thank you for your work.

> Some objects format may potentially support copy relocations, but
> currently the maybe_copied is always initialized to 0 in the symbol().
> And the type is 'mst_file_bss', not always the 'mst_bss' or 'mst_data'
> in the lookup_minimal_symbol_linkage(). For example:
>
> (gdb) p *msymbol
> $42 = {<general_symbol_info> = {m_name = 0x349812f "test_no_static", value = {ivalue = 8, block = 0x8,
>       bytes = 0x8 <error: Cannot access memory at address 0x8>, address = 8, common_block = 0x8, chain = 0x8}, language_specific = {
>       obstack = 0x0, demangled_name = 0x0}, m_language = language_auto, ada_mangled = 0, section = 20}, size = 4,
>   filename = 0x6db3440 "test_sanity.c", type = mst_file_bss, created_by_gdb = 0, target_flag_1 = 0, target_flag_2 = 0, has_size = 1,
>   maybe_copied = 0, name_set = 1, hash_next = 0x0, demangled_hash_next = 0x0}

The current description lacks an explanation of when this issue
occurs. Please state that the issue occurs when the corresponding
kernel is built with CONFIG_CALL_DEPTH_TRACKING=y.


Thank you for the comment, Hatayama.

I should give more background on this issue in the patch log. The issue can be easily reproduced on a kernel that includes the following commit:

commit 80e4c1cd42fff110bfdae8fce7ac4f22465f9664 (HEAD)
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date:   Thu Sep 15 13:11:19 2022 +0200

    x86/retbleed: Add X86_FEATURE_CALL_DEPTH
   
    Intel SKL CPUs fall back to other predictors when the RSB underflows. The
    only microcode mitigation is IBRS which is insanely expensive. It comes
    with performance drops of up to 30% depending on the workload.
   
    A way less expensive, but nevertheless horrible mitigation is to track the
    call depth in software and overeagerly fill the RSB when returns underflow
    the software counter.
   
    Provide a configuration symbol and a CPU misfeature bit.
   
    Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20220915111147.056176424@xxxxxxxxxxxxx

After reverting the above commit, the issue may disappear. I originally tried to work out how this kernel commit affects gdb, but I have not found the cause so far. Later I noticed that gdb does get the correct offset address of the global variable 'test_no_static', which is the expected behavior from gdb's perspective because of copy relocations: some object files may support copy relocations, as in this case.

It would also be good to mention that the issue occurs at least on
the RHEL9 kernel.

This is an upstream issue. I have reproduced it on an upstream kernel that includes the above commit.
 

> This causes a problem that the 'p' command can not work well as
> expected, and always gets an error:
>
>   crash> mod -s test_sanity /home/test_sanity.ko
>        MODULE       NAME                         BASE           SIZE  OBJECT FILE
>   ffffffffc1084040  test_sanity            ffffffffc1082000    16384  /home/test_sanity.ko
>   crash> p test_no_static
>   p: gdb request failed: p test_no_static
>   crash>
>
> With the patch:
>   crash> mod -s test_sanity /home/test_sanity.ko
>        MODULE       NAME                         BASE           SIZE  OBJECT FILE
>   ffffffffc1084040  test_sanity            ffffffffc1082000    16384  /home/test_sanity.ko
>   crash> p test_no_static
>   test_no_static = $1 = 5
>   crash>

It's correct that the p command doesn't work as expected, but it doesn't
always result in an error. The issue is a failure to calculate the
relocated address of static symbols. If the calculated address happens
to be one that can be read successfully, the command doesn't report a
read error but outputs a bogus value instead.

That's true, but the bogus value is still not the expected result, because it is read from an incorrect address.
That is why the maybe_copied flag is initialized to 1: as I mentioned above, some objfiles may support copy relocations.
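
As a rough illustration of the type check mentioned in the patch description, here is a toy Python model. It is not gdb's code: the `MinimalSymbol` class, the `lookup_linkage` helper and the "relaxed" filter are all hypothetical, and only the type names and the `sym -M` address for test_no_static come from this thread. It shows only that a lookup restricted to mst_data/mst_bss cannot find a symbol whose type is mst_file_bss.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Minimal-symbol type names as they appear in the gdb output quoted in this thread.
MST_DATA = "mst_data"
MST_BSS = "mst_bss"
MST_FILE_BSS = "mst_file_bss"


@dataclass
class MinimalSymbol:
    """Toy stand-in for a gdb minimal symbol (hypothetical, not gdb's structure)."""
    name: str
    type: str
    address: int


def lookup_linkage(table: Iterable[MinimalSymbol], name: str,
                   accepted_types: Set[str]) -> Optional[MinimalSymbol]:
    """Return the first symbol named `name` whose type is in `accepted_types`."""
    for msym in table:
        if msym.name == name and msym.type in accepted_types:
            return msym
    return None


# One file-local .bss symbol, using the address that `sym -M` reports in this thread.
table = [MinimalSymbol("test_no_static", MST_FILE_BSS, 0xFFFFFFFFC0DA7584)]

# A filter that accepts only mst_data/mst_bss never matches the file-local symbol.
strict = lookup_linkage(table, "test_no_static", {MST_DATA, MST_BSS})

# Hypothetical relaxed filter that also accepts mst_file_bss.
relaxed = lookup_linkage(table, "test_no_static",
                         {MST_DATA, MST_BSS, MST_FILE_BSS})

print("strict lookup:", strict)
print("relaxed lookup:", hex(relaxed.address) if relaxed else None)
```

Whether the actual patch widens this check, or only changes how maybe_copied is initialized, should be confirmed against the patch itself.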


Thanks.
Lianbo
 

To make this clear, I think it would be better to set the debug level to 4
so that the p command prints the calculated virtual address in its debug messages.

For example:

    crash> sym -M | grep -E " test_no"
    ffffffffc0da7580 (B) test_no
    ffffffffc0da7584 (b) test_no_static
    crash> set debug 4
    debug: 4
    crash> p test_no
    p: per_cpu_symbol_search(test_no): NULL
    test_no = <readmem: ffffffffc0da7580, KVADDR, "gdb_readmem callback", 4, (ROE), 560d2d483400>
    <read_diskdump: addr: ffffffffc0da7580 paddr: 10b263580 cnt: 4>
    $3 = 5
    crash> p test_no_static
    p: per_cpu_symbol_search(test_no_static): NULL
    test_no_static = <readmem: ffffffffc0d9f004, KVADDR, "gdb_readmem callback", 4, (ROE), 560d2dc9b100>
    <read_diskdump: addr: ffffffffc0d9f004 paddr: 108bfc004 cnt: 4>
    $4 = -1869574000
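
The mismatch in the output above can be checked with a little arithmetic. The short Python sketch below uses only the numbers printed in that session; the variable names are made up for illustration. It computes how far the address passed to readmem is from the address that `sym -M` reports for test_no_static, and shows the bogus value as raw 32-bit data.

```python
# Addresses and value copied from the debug session above.
symtab_addr = 0xFFFFFFFFC0DA7584   # test_no_static according to `sym -M`
readmem_addr = 0xFFFFFFFFC0D9F004  # address actually passed to readmem
bogus_value = -1869574000          # what `p test_no_static` printed

# Distance between where the variable lives and where gdb read from.
delta = symtab_addr - readmem_addr

# Reinterpret the signed 32-bit result as unsigned to see the raw bytes.
raw = bogus_value & 0xFFFFFFFF
raw_bytes = raw.to_bytes(4, "little")

print(f"address delta: {delta:#x}")
print(f"raw value:     {raw:#010x}")
print(f"raw bytes:     {raw_bytes.hex(' ')}")
```

The read lands 0x8580 bytes below the real variable, and the four bytes returned are all 0x90. Since 0x90 is the single-byte NOP opcode on x86, the read may have landed in padding rather than in data, but that is only a guess and is not confirmed anywhere in this thread. For test_no, the readmem address matches the `sym -M` address, which is why that value is correct.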

