Re: ar (binutils) segfaulting in Rawhide - known bug?

On Fri, 2020-07-24 at 22:29 +0100, Richard W.M. Jones wrote:
> Just upgraded a development machine to:
> 
> binutils-2.34.0-10.fc33.x86_64
> gcc-10.1.1-2.fc33.x86_64
> glibc-2.31.9000-21.fc33.x86_64
> 
> and a very simple C compile (non-LTO) is now segfaulting:
> 
> make[3]: Entering directory '/home/rjones/d/nbdkit/common/protocol'
> /bin/sh ../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../..    -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE  -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c -o libprotocol_la-protostrings.lo `test -f 'protostrings.c' || echo './'`protostrings.c
> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c  -fPIC -DPIC -o .libs/libprotocol_la-protostrings.o
> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -o libprotocol_la-protostrings.o >/dev/null 2>&1
> mv -f .deps/libprotocol_la-protostrings.Tpo .deps/libprotocol_la-protostrings.Plo
> /bin/sh ../../libtool  --tag=CC   --mode=link gcc -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE   -O0 -g -Wp,-U_FORTIFY_SOURCE  -o libprotocol.la  libprotocol_la-protostrings.lo   
> libtool: link: ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o 
> ../../libtool: line 1734: 2572327 Segmentation fault      (core dumped) ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
> 
> Core was generated by `ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0000000000000000 in ?? ()
> Missing separate debuginfos, use: dnf debuginfo-install binutils-2.34.0-10.fc33.x86_64
> (gdb) bt
> #0  0x0000000000000000 in ?? ()
> #1  0x00007f15bd3e03d0 in make_relative_prefix_1.part ()
>    from /lib64/libbfd-2.34.0.20200522.so
> #2  0x00007f15bd3d22db in bfd_plugin_object_p.lto_priv ()
>    from /lib64/libbfd-2.34.0.20200522.so
> #3  0x00007f15bd3401ce in bfd_check_format_matches ()
>    from /lib64/libbfd-2.34.0.20200522.so
> #4  0x00007f15bd340e7a in _bfd_write_archive_contents ()
>    from /lib64/libbfd-2.34.0.20200522.so
> #5  0x00007f15bd348b2a in bfd_close () from /lib64/libbfd-2.34.0.20200522.so
> #6  0x0000559ee83994b6 in write_archive ()
> #7  0x0000559ee8396ac3 in main ()
> 
> I can't find any BZ for this.  Any ideas what it could be?
After banging my head on the wall for a few hours, I think I see what's happening
here.

So, at a high level, ar makes a call to lrealpath.  That naturally goes through
the PLT.  The PLT stub loads the value out of the GOT and jumps to it.  The
problem is that the entry in the GOT is *zero* when it should be pointing to the
resolver.
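For anyone who hasn't stared at PLT stubs, here is a minimal C model of the lazy-binding mechanism described above. It is only an illustration, not how glibc implements it: the names (got_lrealpath, resolver, plt_lrealpath) are hypothetical, a real stub is a few machine instructions, and the real resolver lives in ld.so. It shows why a zeroed GOT slot turns the call into a jump to address 0, which is what frame #0 in the backtrace looks like.

```c
#include <stddef.h>

/* Hypothetical model of lazy PLT/GOT binding for a single function. */
typedef const char *(*fn_t)(const char *);

/* Stand-in for libiberty's lrealpath; this model just echoes its input. */
static const char *real_lrealpath(const char *path)
{
  return path;
}

static const char *resolver(const char *path);

/* The GOT slot starts out pointing at the resolver, not at the target. */
static fn_t got_lrealpath = resolver;

/* First call: bind the symbol, patch the GOT slot, then forward the call. */
static const char *resolver(const char *path)
{
  got_lrealpath = real_lrealpath;
  return got_lrealpath(path);
}

/* The PLT stub: load the GOT slot and jump through it.  A real stub has no
   NULL check, so a zero slot means jumping to address 0; this model sets
   *crashed instead of faulting. */
static const char *plt_lrealpath(const char *path, int *crashed)
{
  fn_t target = got_lrealpath;
  if (target == NULL) {
    *crashed = 1;
    return NULL;
  }
  *crashed = 0;
  return target(path);
}
```

Setting got_lrealpath to NULL before a call reproduces the situation in the backtrace: the stub has nowhere valid to go.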

Now, lrealpath is provided by libiberty, a copy of it is in libbfd.so, and the
GOT entry in libbfd.so looked right.  But by the time the program has hit main,
the GOT entry has been reset to zero.  Naturally, that's happening inside the
dynamic linker (as expected, and confirmed with a watchpoint).  If you've ever
had to debug ld.so, you'll know it's an insanely painful experience, but it
proved fruitful.

The key was finding out that we were not using the libbfd.so link map to
resolve lrealpath; instead, we were using the link map for the main program (ar
in this case).  So naturally it's time to look a bit more closely at the symbol
table for ar.

The main symbol table for ar doesn't mention lrealpath.  But that's just a
confusing byproduct of having two symbol tables.  What matters to ld.so is the
*dynamic* symbol table.  And ar has lrealpath in its dynamic symbol table.  And
here's the kicker, it's an absolute symbol with the value 0:

0000000000000000 A lrealpath
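The dynamic symbol table can be listed with `nm -D`. As a sketch of what that check looks at, the C code below walks .dynsym in an ELF file using the <elf.h> definitions and prints every SHN_ABS symbol, roughly what `nm -D ar | grep ' A '` would show. The function names list_abs_dynsyms and scan_dynsym are hypothetical; the code assumes a 64-bit ELF in the host's byte order and does only basic bounds checking.

```c
#include <elf.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Scan a mapped 64-bit ELF image; print and count SHN_ABS dynamic symbols.
   Returns -1 if the image is not a 64-bit ELF or its headers are out of
   bounds. */
static int scan_dynsym(const unsigned char *base, size_t size)
{
  const Elf64_Ehdr *eh = (const Elf64_Ehdr *) base;
  if (size < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
      eh->e_ident[EI_CLASS] != ELFCLASS64)
    return -1;
  if (eh->e_shoff + (uint64_t) eh->e_shnum * sizeof(Elf64_Shdr) > size)
    return -1;

  const Elf64_Shdr *sh = (const Elf64_Shdr *) (base + eh->e_shoff);
  int count = 0;
  for (unsigned i = 0; i < eh->e_shnum; i++) {
    if (sh[i].sh_type != SHT_DYNSYM || sh[i].sh_link >= eh->e_shnum)
      continue;
    const Elf64_Shdr *str = &sh[sh[i].sh_link];   /* .dynstr */
    if (sh[i].sh_offset + sh[i].sh_size > size ||
        str->sh_offset + str->sh_size > size)
      continue;
    const Elf64_Sym *syms = (const Elf64_Sym *) (base + sh[i].sh_offset);
    const char *strtab = (const char *) (base + str->sh_offset);
    for (size_t j = 0; j < sh[i].sh_size / sizeof *syms; j++) {
      if (syms[j].st_shndx != SHN_ABS || syms[j].st_name >= str->sh_size)
        continue;
      /* Same layout as nm's output: value, type letter, name. */
      printf("%016" PRIx64 " A %s\n", (uint64_t) syms[j].st_value,
             strtab + syms[j].st_name);
      count++;
    }
  }
  return count;
}

/* Map PATH read-only and scan it; returns -1 if it cannot be read. */
static int list_abs_dynsyms(const char *path)
{
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  struct stat st;
  if (fstat(fd, &st) < 0 || st.st_size <= 0) {
    close(fd);
    return -1;
  }
  size_t size = (size_t) st.st_size;
  void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);
  if (map == MAP_FAILED)
    return -1;
  int count = scan_dynsym(map, size);
  munmap(map, size);
  return count;
}
```

Run against the broken ar, this would be expected to print the same line as above; against a healthy binary it usually prints nothing.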

A symbol in the main program takes precedence over a symbol in a DSO.  So the
dynamic linker was actually doing the right thing given the input it was
provided.
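A toy model of that precedence rule: ld.so searches its global scope in order, starting with the main program, and the first object that defines the name wins. Everything here (the struct names, the lookup helper, the load addresses) is hypothetical and far simpler than glibc's real lookup; it only shows why ar's bogus absolute entry shadows the good copy in libbfd.so. An absolute symbol's value is used as-is, without adding the object's load address, so it stays 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of ld.so's global lookup scope. */
struct model_sym {
  const char *name;
  uint64_t st_value;   /* value from the symbol table */
  int is_abs;          /* nonzero for SHN_ABS symbols */
};

struct model_object {
  const char *name;
  uint64_t load_base;  /* where the object was mapped */
  const struct model_sym *syms;
  size_t nsyms;
};

/* Walk SCOPE in order; the first object defining NAME wins.  Absolute
   symbols are not relocated by the load base.  Returns the defining
   object (storing the final address in *ADDR), or NULL if none. */
static const struct model_object *
lookup(const struct model_object *scope, size_t nobjs, const char *name,
       uint64_t *addr)
{
  for (size_t i = 0; i < nobjs; i++) {
    for (size_t j = 0; j < scope[i].nsyms; j++) {
      const struct model_sym *s = &scope[i].syms[j];
      if (strcmp(s->name, name) == 0) {
        *addr = s->is_abs ? s->st_value : scope[i].load_base + s->st_value;
        return &scope[i];
      }
    }
  }
  return NULL;
}

/* Illustrative scope with the main program (ar) first.  All addresses are
   made up except ar's bogus "0000000000000000 A lrealpath". */
static const struct model_sym ar_syms[] = { { "lrealpath", 0x0, 1 } };
static const struct model_sym bfd_syms[] = { { "lrealpath", 0xe0000, 0 } };
static const struct model_object scope[] = {
  { "ar", 0x559ee8390000ULL, ar_syms, 1 },
  { "libbfd-2.34.0.20200522.so", 0x7f15bd300000ULL, bfd_syms, 1 },
};
```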

Now why (*&@#$ does ar have lrealpath as an absolute symbol?  It's got to be
related to the fact that when we link ar we pull in another copy of libiberty.
In fact, ar links against libiberty twice: once via -liberty, then again against
libiberty.a (and kind of a third time indirectly via libbfd).  But even so, that
shouldn't be creating an absolute symbol.  That's just weird.

This smells like a linker bug to me.  Not surprisingly, if I force the system to
use ld.gold, I don't see the bogus absolute symbol and the resulting ar works
just fine.

It's late and I'll dig further over the weekend, but right now this looks like a
linker bug to me.  I may turn off LTO globally or in the various instances of
binutils -- I need to sleep on that.

Jeff

_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx