Re: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 11, 2024 at 12:01:13AM -0800, Eduard Zingerman wrote:
> On Sun, 2024-11-10 at 03:38 -0800, Eduard Zingerman wrote:
> 
> [...]
> 
> > Also, it appears there is some bug either in pahole or in libdw's
> > implementation of dwarf_getlocation(). When I try both your patch-set
> > and my variant there is a segfault once in a while:
> > 
> >   $ for i in $(seq 1 100); \
> >     do echo "---> $i"; \
> >        pahole -j --skip_encoding_btf_inconsistent_proto -J --btf_encode_detached=/dev/null vmlinux ; \
> >     done
> >   ---> 1
> >   ...
> >   ---> 71
> >   Segmentation fault (core dumped)
> >   ...
> > 
> > The segfault happens only when -j (multiple threads) is passed.
> > If pahole is built with sanitizers
> > (passing -DCMAKE_C_FLAGS="-fsanitize=undefined,address")
> > the stack trace looks as follows:
> 
> Did some additional research for these SEGFAULTs.
> Looks like all we are in trouble.
> 
> # TLDR
> 
> libdw is not supposed to be used in a concurrent context.
> libdw is a part of elfutils package, the configuration flag
> making API thread-safe is documented as experimental:
>   --enable-thread-safety  enable thread safety of libraries EXPERIMENTAL
> At-least Fedora 40 does not ship elfutils built with this flag set.
> This colours all current parallel DWARF decoding questionable.
> 
> # Why segfault happens
> 
> Any references to elfutils source code are for commit [1].
> The dwarf_getlocation() is one of a few libdw APIs that uses memory
> allocation internally. The function dwarf_getlocation.c:__libdw_intern_expression
> iterates over expression encodings in DWARF and allocates
> a set of objects of type `struct loclist` and `Dwarf_Op`.
> Pointers to allocated objects are put to a binary tree for caching,
> see dwarf_getlocation.c:660, the call to eu_tsearch() function.
> The eu_tsearch() is a wrapper around libc tsearch() function.
> This wrapper provides locking for the tree,
> but only if --enable-thread-safety was set during elfutils configuration.
> The SEGFAULT happens inside tsearch() call because binary tree is malformed, e.g.:
> 
>   Thread 8 "pahole" received signal SIGSEGV, Segmentation fault.
>   [Switching to Thread 0x7fffd9c006c0 (LWP 2630074)]
>   0x00007ffff7c5d200 in maybe_split_for_insert (...) at tsearch.c:228
>   228	      if (parentp != NULL && RED(DEREFNODEPTR(parentp)))
>   (gdb) bt
>   #0  0x00007ffff7c5d200 in maybe_split_for_insert (...) at tsearch.c:228
>   #1  0x00007ffff7c5d466 in __GI___tsearch (...) at tsearch.c:358
>   #2  __GI___tsearch (...) at tsearch.c:290
>   #3  0x000000000048f096 in __interceptor_tsearch ()
>   #4  0x00007ffff7f5c482 in __libdw_intern_expression (...) at dwarf_getlocation.c:660
>   #5  0x00007ffff7f5cf51 in getlocation (...) at dwarf_getlocation.c:678
>   #6  getlocation (...) at dwarf_getlocation.c:667
>   #7  dwarf_getlocation (..._ at dwarf_getlocation.c:708
>   #8  0x00000000005a2ee5 in parameter.new ()
>   #9  0x00000000005a0122 in die.process_function ()
>   #10 0x0000000000597efd in __die__process_tag ()
>   #11 0x0000000000595ad9 in die.process_unit ()
>   #12 0x0000000000595436 in die.process ()
>   #13 0x00000000005b0187 in dwarf_cus.process_cu ()
>   #14 0x00000000005afa38 in dwarf_cus.process_cu_thread ()
>   #15 0x00000000004c7b8d in asan_thread_start(void*) ()
>   #16 0x00007ffff7bda6d7 in start_thread (arg=<optimized out>) at pthread_create.c:447
>   #17 0x00007ffff7c5e60c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
>   (gdb) p parentp
>   $1 = (node *) 0x50300079d2a0
>   (gdb) p *parentp
>   $2 = (node) 0x0
> 
> glibc provides a way to validate binary tree structure.
> For this misc/tsearch.c has to be changed to define DEBUGGING variable.
> (I used glibc 2.39 as provided by source rpm for Fedora 40 for experiments).
> If this is done and custom glibc is used for pahole execution,
> the following error is reported if '-j' flag is present:
> 
>   $ pahole -j --skip_encoding_btf_inconsistent_proto -J --btf_encode_detached=/home/eddy/work/tmp/my-new.btf vmlinux 
>   Fatal glibc error: tsearch.c:164 (check_tree_recurse): assertion failed: d_sofar == d_total
>   Fatal glibc error: tsearch.c:164 (check_tree_recurse): assertion failed: d_sofar == d_total
>   Aborted (core dumped)
> 
> Executing pahole using a custom-built libdw,
> built with --enable-thread-safety resolves the issue.

could we use libdw__lock around that? but I guess we use it on other
places as well..

jirka

> 
> [1] b2f225d6bff8 ("Consolidate and add files to clean target variables")
>     git://sourceware.org/git/elfutils.git
> 
> 




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux