Re: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching

On Sun, 2024-11-10 at 03:38 -0800, Eduard Zingerman wrote:

[...]

> Also, it appears there is some bug either in pahole or in libdw's
> implementation of dwarf_getlocation(). When I try both your patch-set
> and my variant there is a segfault once in a while:
> 
>   $ for i in $(seq 1 100); \
>     do echo "---> $i"; \
>        pahole -j --skip_encoding_btf_inconsistent_proto -J --btf_encode_detached=/dev/null vmlinux ; \
>     done
>   ---> 1
>   ...
>   ---> 71
>   Segmentation fault (core dumped)
>   ...
> 
> The segfault happens only when -j (multiple threads) is passed.
> If pahole is built with sanitizers
> (passing -DCMAKE_C_FLAGS="-fsanitize=undefined,address")
> the stack trace looks as follows:

I did some additional research into these segfaults.
Looks like we are all in trouble.

# TLDR

libdw is not supposed to be used in a concurrent context.
libdw is part of the elfutils package, and the configure flag
that makes its API thread-safe is documented as experimental:
  --enable-thread-safety  enable thread safety of libraries EXPERIMENTAL
At least Fedora 40 does not ship elfutils built with this flag set.
This makes all current parallel DWARF decoding questionable.

# Why segfault happens

All references to elfutils source code are to commit [1].
dwarf_getlocation() is one of a few libdw APIs that allocate memory
internally. The function dwarf_getlocation.c:__libdw_intern_expression
iterates over expression encodings in DWARF and allocates
a set of objects of type `struct loclist` and `Dwarf_Op`.
Pointers to the allocated objects are inserted into a binary tree for caching,
see dwarf_getlocation.c:660, the call to the eu_tsearch() function.
eu_tsearch() is a wrapper around the libc tsearch() function.
This wrapper provides locking for the tree,
but only if --enable-thread-safety was set during elfutils configuration.
Without that locking, threads modify the tree concurrently, and the SEGFAULT
happens inside the tsearch() call because the binary tree is malformed, e.g.:

  Thread 8 "pahole" received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x7fffd9c006c0 (LWP 2630074)]
  0x00007ffff7c5d200 in maybe_split_for_insert (...) at tsearch.c:228
  228	      if (parentp != NULL && RED(DEREFNODEPTR(parentp)))
  (gdb) bt
  #0  0x00007ffff7c5d200 in maybe_split_for_insert (...) at tsearch.c:228
  #1  0x00007ffff7c5d466 in __GI___tsearch (...) at tsearch.c:358
  #2  __GI___tsearch (...) at tsearch.c:290
  #3  0x000000000048f096 in __interceptor_tsearch ()
  #4  0x00007ffff7f5c482 in __libdw_intern_expression (...) at dwarf_getlocation.c:660
  #5  0x00007ffff7f5cf51 in getlocation (...) at dwarf_getlocation.c:678
  #6  getlocation (...) at dwarf_getlocation.c:667
  #7  dwarf_getlocation (...) at dwarf_getlocation.c:708
  #8  0x00000000005a2ee5 in parameter.new ()
  #9  0x00000000005a0122 in die.process_function ()
  #10 0x0000000000597efd in __die__process_tag ()
  #11 0x0000000000595ad9 in die.process_unit ()
  #12 0x0000000000595436 in die.process ()
  #13 0x00000000005b0187 in dwarf_cus.process_cu ()
  #14 0x00000000005afa38 in dwarf_cus.process_cu_thread ()
  #15 0x00000000004c7b8d in asan_thread_start(void*) ()
  #16 0x00007ffff7bda6d7 in start_thread (arg=<optimized out>) at pthread_create.c:447
  #17 0x00007ffff7c5e60c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
  (gdb) p parentp
  $1 = (node *) 0x50300079d2a0
  (gdb) p *parentp
  $2 = (node) 0x0
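
The locking pattern described above can be illustrated with a minimal,
self-contained sketch. This is not the elfutils code: struct locked_tree,
locked_tree_insert() and run_demo() are names made up for this example,
and it only assumes POSIX tsearch()/twalk()/tdelete() and pthreads.
Several threads insert into one shared tree, and a mutex serializes every
tsearch() call on the shared root:

```c
#include <assert.h>
#include <pthread.h>
#include <search.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not the elfutils code. */
struct locked_tree {
	void *root;            /* tsearch() root, NULL while the tree is empty */
	pthread_mutex_t lock;  /* serializes every access to root */
};

struct worker_arg {
	struct locked_tree *tree;
	long *keys;
	size_t nr_keys;
};

static int cmp_long(const void *a, const void *b)
{
	long x = *(const long *)a, y = *(const long *)b;

	return (x > y) - (x < y);
}

/* Insert key under the lock. Returns the stored key, or NULL if out of memory. */
static const long *locked_tree_insert(struct locked_tree *t, const long *key)
{
	void *node;

	pthread_mutex_lock(&t->lock);
	node = tsearch(key, &t->root, cmp_long);
	pthread_mutex_unlock(&t->lock);
	return node ? *(const long **)node : NULL;
}

static void *worker(void *p)
{
	struct worker_arg *w = p;

	for (size_t i = 0; i < w->nr_keys; i++)
		locked_tree_insert(w->tree, &w->keys[i]);
	return NULL;
}

static size_t nr_nodes; /* only touched after all workers are joined */

static void count_node(const void *node, VISIT which, int depth)
{
	(void)node;
	(void)depth;
	/* internal nodes are seen once as postorder, leaves once as leaf */
	if (which == postorder || which == leaf)
		nr_nodes++;
}

/* Each thread inserts nr_keys distinct keys; returns the final node count. */
static size_t run_demo(int nr_threads, size_t nr_keys)
{
	struct locked_tree tree = { .root = NULL };
	size_t total = (size_t)nr_threads * nr_keys;
	long *keys;
	pthread_t *tids;
	struct worker_arg *args;

	assert(nr_threads > 0 && nr_keys > 0);
	keys = malloc(total * sizeof(*keys));
	tids = malloc(nr_threads * sizeof(*tids));
	args = malloc(nr_threads * sizeof(*args));
	if (!keys || !tids || !args)
		abort();
	pthread_mutex_init(&tree.lock, NULL);
	for (size_t i = 0; i < total; i++)
		keys[i] = (long)i;
	for (int t = 0; t < nr_threads; t++) {
		args[t] = (struct worker_arg){ &tree, keys + t * nr_keys, nr_keys };
		if (pthread_create(&tids[t], NULL, worker, &args[t]))
			abort();
	}
	for (int t = 0; t < nr_threads; t++)
		pthread_join(tids[t], NULL);

	nr_nodes = 0;
	twalk(tree.root, count_node);
	for (size_t i = 0; i < total; i++)
		tdelete(&keys[i], &tree.root, cmp_long);
	pthread_mutex_destroy(&tree.lock);
	free(args);
	free(tids);
	free(keys);
	return nr_nodes;
}
```

Build with "cc -pthread". Dropping the lock/unlock pair in
locked_tree_insert() leaves unsynchronized tsearch() calls on a shared
root, which is the situation the backtrace above shows for a libdw
built without --enable-thread-safety.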

glibc provides a way to validate the binary tree structure.
To enable it, misc/tsearch.c has to be changed to define the DEBUGGING macro.
(I used glibc 2.39, as provided by the source rpm for Fedora 40, for experiments.)
If this is done and the custom glibc is used to run pahole,
the following error is reported when the '-j' flag is present:

  $ pahole -j --skip_encoding_btf_inconsistent_proto -J --btf_encode_detached=/home/eddy/work/tmp/my-new.btf vmlinux 
  Fatal glibc error: tsearch.c:164 (check_tree_recurse): assertion failed: d_sofar == d_total
  Fatal glibc error: tsearch.c:164 (check_tree_recurse): assertion failed: d_sofar == d_total
  Aborted (core dumped)

Running pahole against a custom libdw
built with --enable-thread-safety resolves the issue.

[1] b2f225d6bff8 ("Consolidate and add files to clean target variables")
    git://sourceware.org/git/elfutils.git





