On Fri, 2020-07-24 at 22:29 +0100, Richard W.M. Jones wrote:
> Just upgraded a development machine to:
>
> binutils-2.34.0-10.fc33.x86_64
> gcc-10.1.1-2.fc33.x86_64
> glibc-2.31.9000-21.fc33.x86_64
>
> and a very simple C compile (non-LTO) is now segfaulting:
>
> make[3]: Entering directory '/home/rjones/d/nbdkit/common/protocol'
> /bin/sh ../../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c -o libprotocol_la-protostrings.lo `test -f 'protostrings.c' || echo './'`protostrings.c
> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -fPIC -DPIC -o .libs/libprotocol_la-protostrings.o
> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -o libprotocol_la-protostrings.o >/dev/null 2>&1
> mv -f .deps/libprotocol_la-protostrings.Tpo .deps/libprotocol_la-protostrings.Plo
> /bin/sh ../../libtool --tag=CC --mode=link gcc -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -O0 -g -Wp,-U_FORTIFY_SOURCE -o libprotocol.la libprotocol_la-protostrings.lo
> libtool: link: ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
> ../../libtool: line 1734: 2572327 Segmentation fault (core dumped) ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
>
> Core was generated by `ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x0000000000000000 in ?? ()
> binutils-2.34.0-10.fc33.x86_64
> (gdb) bt
> Missing separate debuginfos, use: dnf debuginfo-install#0 0x0000000000000000 in ?? ()
> #1 0x00007f15bd3e03d0 in make_relative_prefix_1.part ()
>    from /lib64/libbfd-2.34.0.20200522.so
> #2 0x00007f15bd3d22db in bfd_plugin_object_p.lto_priv ()
>    from /lib64/libbfd-2.34.0.20200522.so
> #3 0x00007f15bd3401ce in bfd_check_format_matches ()
>    from /lib64/libbfd-2.34.0.20200522.so
> #4 0x00007f15bd340e7a in _bfd_write_archive_contents ()
>    from /lib64/libbfd-2.34.0.20200522.so
> #5 0x00007f15bd348b2a in bfd_close () from /lib64/libbfd-2.34.0.20200522.so
> #6 0x0000559ee83994b6 in write_archive ()
> #7 0x0000559ee8396ac3 in main ()
>
> I can't find any BZ for this. Any ideas what it could be?

After banging my head on the wall for a few hours, I think I see what's
happening here.

So at a high level ar makes a call to lrealpath. That naturally goes
through the PLT. The PLT stub loads the value out of the GOT and jumps
to it. The problem is the entry in the GOT is *zero* when it should be
pointing to the resolver.

Now lrealpath is provided by libiberty and a copy is in libbfd.so, and
the GOT entry in libbfd.so looked right. But by the time the program has
hit main, the GOT entry has been reset to zero. Naturally that's
happening inside the dynamic linker (as expected, confirmed with a
watchpoint).

If you've ever had to debug ld.so, you'll know it's an insanely painful
experience, but it proved fruitful. The key was finding out that we were
not using the libbfd.so linker map to resolve lrealpath; instead we were
using the linker map for the main program (ar in this case).

So naturally it's time to look a bit more closely at the symbol table
for ar. The main symbol table for ar doesn't mention lrealpath. But
that's just a confusing byproduct of having two symbol tables. What
matters to ld.so is the *dynamic* symbol table. And ar has lrealpath in
its dynamic symbol table. And here's the kicker, it's an absolute symbol
with the value 0:

0000000000000000 A lrealpath

A symbol in the main program takes precedence over a symbol in a DSO.
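
For anyone who wants to check another binary for the same symptom, here is
a rough sketch (not what I actually ran) that scans `nm -D` output for
dynamic symbols reported as absolute with value 0. The function names are
made up for illustration, and it assumes the usual three-column
"value type name" layout that nm prints for defined symbols.

```python
import subprocess


def find_zero_absolute_symbols(nm_output):
    """Return names of symbols listed as absolute ('A'/'a') with value 0."""
    hits = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols print as "<hex value> <type> <name>"; undefined
        # ones have no value column, so they are skipped here.
        if len(fields) != 3:
            continue
        value, sym_type, name = fields
        try:
            is_zero = int(value, 16) == 0
        except ValueError:
            continue  # not a hex value, so not a symbol line we understand
        if sym_type in ("A", "a") and is_zero:
            hits.append(name)
    return hits


def zero_absolute_dynamic_symbols(path):
    """Run `nm -D` on path and report suspicious dynamic symbols (hypothetical helper)."""
    result = subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    )
    return find_zero_absolute_symbols(result.stdout)
```

Calling `zero_absolute_dynamic_symbols("/usr/bin/ar")` on an affected
system would be expected to list lrealpath, going by the nm line above.
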
So the dynamic linker was actually doing the right thing given the input
it was provided.

Now why (*&@#$ does ar have lrealpath as an absolute symbol? It's got to
be related to the fact that when we link ar we pull in another copy of
libiberty. In fact, ar links against libiberty twice: once via -liberty,
then again against libiberty.a (and kind of a third time indirectly via
libbfd). But even so, that shouldn't be creating an absolute symbol.
That's just weird. This smells like a linker bug to me.

Not surprisingly, if I force the system to use ld.gold, then I don't see
the bogus absolute symbol and the resultant ar works just fine.

It's late and I'll dig further over the weekend, but right now this
looks like a linker bug to me. I may turn off LTO globally or in the
various instances of binutils -- I need to sleep on that.

Jeff
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx