Re: [PATCH 0/4] To support module percpu symbol

Dave Anderson <anderson@xxxxxxxxxx> · Tue, 9 Nov 2010 15:29:03 -0500 (EST)

----- "Toshikazu Nakayama" <nakayama.ts@xxxxxxxxxxxxxx> wrote:

> Hi Dave,
> 
> I did some wrong changes and would understand your mentions.
> 
> Updating __per_cpu_end itself makes original UI's corruption.
> And for sure, I want __per_cpu_end to be bumped with your guess.
> 
> I fixed these bugs like attachments and tested again,
> of course checked st->__per_cpu_end value advised from you.
> 
> One more update is to add is_per_cpu_range().
> Because there are some conditions about percpu range in symbols.c,
> I think it is better to use common function than inline.
> The function returns 0 (not percpu), 1 (kernels) and 2 (modules)
> although callers do not distinguish between 1 and 2 now.

Hello Toshi,

Unfortunately there are still problems with this patch design, 
both with the "legacy" and "current" kernel types. 

I modified your patch to display the new items added to the 
structures in defs.h.  For example, the new "st->percpu_mod_used"
value should be shown in dump_symbol_table(), along with the 
effective (extended) end of percpu space that is used by your
is_per_cpu_range() function:

  crash> help -s
  ...
   __per_cpu_start: ffffffff80419000
     __per_cpu_end: ffffffff8041f508
   percpu_mod_used: 8508 (extended percpu end: ffffffff80427a10)
  ...

Anyway, as I mentioned in my review of your first patch, support
for the "legacy" module percpu symbols is not acceptable.  In 
the example above, note that the end of per-cpu space gets extended
from ffffffff8041f508 to ffffffff80427a10.  But that value pushes 
into legitimate symbol virtual address space above "__per_cpu_end":

  crash> sym -l
  ...
  ffffffff8041f4c0 (d) per_cpu__rt_cache_stat  
  ffffffff8041f500 (d) per_cpu____icmp_socket  
  ffffffff8041f508 (A) __per_cpu_end           <-- original value  
  ffffffff80420000 (d) .data_nosave  
  ffffffff80420000 (A) __nosave_begin  
  ffffffff80420000 (D) in_suspend  
  ffffffff80421000 (b) .bss  
  ffffffff80421000 (A) __bss_start  
  ffffffff80421000 (A) __nosave_end  
  ffffffff80421000 (B) empty_zero_page  
  ffffffff80422000 (B) boot_cpu_stack  
  ffffffff80426000 (B) boot_exception_stacks   <-- bumped to ffffffff80427a10
  ffffffff8042c000 (B) idt_table  
  ffffffff8042d000 (B) boot_delay  
  ffffffff8042d008 (B) printk_delay_msec 
  ...

So, for example, take the "in_suspend" variable above, which is a global 
integer variable at address ffffffff80420000:

  crash> whatis in_suspend
  int in_suspend;
  crash>

Without your patch, it gets printed correctly like this:

  crash> p in_suspend
  in_suspend = $2 = 0
  crash>

But with your patch, is_per_cpu_range() thinks that ffffffff80420000 is 
a per-cpu symbol value because it is located within the extended range.
So the command incorrectly does this:

  crash> p in_suspend
  PER-CPU DATA TYPE:
    int in_suspend;
  PER-CPU ADDRESSES:
    [0]: ffff81000157a000
    [1]: ffff810001582580
    [2]: ffff81000158ab00
    [3]: ffff810001593080
    [4]: ffff81000159b600
    [5]: ffff8100015a3b80
    [6]: ffff8100015ac100
    [7]: ffff8100015b4680
  crash> 

I also have my doubts about the current kernels which have the single
"pcpu_reserved_chunk_limit" symbol, where its contents are simply added 
to the base kernel's "__per_cpu_end" value.  It seemingly would work OK
with x86_64, because the symbols are very low offset values, and cannot 
be misconstrued for regular symbol address values:  

  crash> sym -l
  0 (D) __per_cpu_start
  0 (V) irq_stack_union
  4000 (V) gdt_page
  5000 (V) exception_stacks
  b000 (V) tlb_vector_offset
  ...
  15740 (V) call_single_queue
  15780 (V) csd_data
  157c0 (V) softnet_data
  15900 (D) __per_cpu_end
  ffffffff81000000 (T) _text
  ffffffff81000000 (T) startup_64
  ffffffff810000b7 (t) ident_complete
  ffffffff81000100 (T) secondary_startup_64
  ...

But, taking a 32-bit x86 2.6.34-rc5 kernel as an example, the per-cpu symbols
are as shown below:

  crash> sym -l
  ...
  c0a90000 (D) __per_cpu_start  <==
  c0a90000 (V) gdt_page
  c0a91000 (V) xen_vcpu
  c0a91020 (V) xen_vcpu_info
  c0a91060 (V) idt_desc
  ...
  c0a995c0 (V) cfd_data
  c0a99600 (V) call_single_queue
  c0a99640 (V) csd_data
  c0a99654 (D) __per_cpu_end    <==
  c0a9a000 (r) .smp_locks
  c0a9a000 (D) __init_end
  c0a9a000 (R) __smp_locks
  c0aa1000 (b) .bss
  c0aa1000 (B) __bss_start
  c0aa1000 (R) __smp_locks_end
  c0aa1000 (b) swapper_pg_pmd
  c0aa2000 (b) swapper_pg_fixmap
  c0aa3000 (B) empty_zero_page
  c0aa4000 (b) level1_ident_pgt
  ... 

So similar to the case of the "legacy" x86_64 example, if the
"pcpu_reserved_chunk_limit" value is simply added to the value 
of the base kernel's "__per_cpu_end" value of c0a99654, it will 
extend into (and overlap) legitimate base kernel virtual address 
space.

And I haven't even bothered to look into how it would affect
the operation of all of the other architectures...

Perhaps the patch can be re-done so that the code can handle it on a
per-module basis without inadvertently affecting everything else?

In any case, there's no way I can accept the patch as it is 
currently designed.

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility