On 2 May 2017 at 16:04, Guilherme Quentel Melo <gqmelo@xxxxxxxxx> wrote:
> On 13 April 2017 at 15:06, Jonathan Wakely <jwakely.gcc@xxxxxxxxx> wrote:
>> On 13 April 2017 at 14:26, Guilherme Quentel Melo wrote:
>>> Thanks for replying Jonathan
>>>
>>>
>>>> It's not supported to mix C++11 code compiled with 4.x and GCC 5+, in
>>>> any way, whether linking dynamically or statically.
>>>
>>> OK, but this is true even when the API is C? In this case no c++ structure
>>> is ever passed to mesa. If mesa was compiled with the new ABI, I should
>>> still be fine, right?
>>
>> Right.
>>
>>>> If you're only using C++98 (and of course only using the old COW
>>>> std::string in the code compiled with GCC 5+)
>>>
>>> Yeah, my gcc 5.x has _GLIBCXX_USE_CXX11_ABI=0 in the specs
>>>
>>>> This of course assumes both GCC versions are configured to be
>>>> compatible, i.e. you're not using --enable-fully-dynamic-string
>>>
>>> I'm not using many configure options, only
>>> --enable-version-specific-runtime-libs --disable-multilib
>>>
>>>
>>> So if this should work I will try to investigate it further, but I'm not sure
>>> what else I can do.
>>> gdb did not help much because if I recompile mesa without optimizations
>>> the crash does not happen.
>>>
>>> Actually disabling only inline optimization also makes the crash go away.
>>> Given that all invalid free stacks shown by valgrind contain inline functions
>>> from basic_string.h does that ring you any bells?
>>>
>>> Any other tips for debugging this?
>>
>> I'm not sure what to check. If the symbols are equivalent then it
>> shouldn't matter whether a given symbol is inlined using the GCC 4.8.5
>> code or comes from the 5.4.0 shared library. But apparently it does,
>> so either the new library is not backwards compatible, or something
>> else is going on.
>
>
> So I finally got some time to further investigate this issue and I
> found (hopefully) the problem.
> In case someone finds a similar problem, this is what I've done:
>
> - Rebuilt stock gcc 4.8.5 and 5.1.0 on CentOS 6 without stripping binaries
> - Created a dummy FooEngineBuilder in llvm/ExecutionEngine/ExecutionEngine.h
> - Rebuilt both mesa and llvm-mesa-private on CentOS 7 with gcc 4.8.5 and debug
>   symbols
>
> FooEngineBuilder is just a class with a std::string member and two methods to
> set the string:
>
> class FooEngineBuilder {
> private:
>   std::string MCPU;
> public:
>   FooEngineBuilder &setMCPUFromHeader() {
>     std::string mymcpu;
>     mymcpu = "my_mcpu";
>     MCPU.assign(mymcpu.begin(), mymcpu.end());
>     return *this;
>   }
>   FooEngineBuilder &setMCPUFromSource();
> };
>
> Using this class in mesa, this crashes:
>
>   FooEngineBuilder foo_builder;
>   foo_builder.setMCPUFromHeader();
>
> and this does not:
>
>   FooEngineBuilder foo_builder;
>   foo_builder.setMCPUFromSource();
>
> What happens is that MCPU is an empty string pointing to
> std::string::_Rep::_S_empty_rep_storage defined in the static libstdc++
> (gcc 4.8.5). When assigning MCPU from the header, the _M_dispose method
> from the dynamic library (gcc 5.1.0) is called.
>
> _M_dispose only destroys the string if it's not a reference to
> std::string::_Rep::_S_empty_rep_storage:
>
>   if (__builtin_expect(this != &_S_empty_rep(), false))
>
> The problem is that *this* is pointing to a different
> std::string::_Rep::_S_empty_rep_storage than &_S_empty_rep(), making
> _M_dispose try to delete a static std::string member.
>
> In summary the problem is that static variables are being defined twice,
> which is exactly why STB_GNU_UNIQUE was created:
>
> https://www.redhat.com/archives/posix-c++-wg/2009-August/msg00002.html
>
> The llvm library is correctly defining the symbols as unique:
>
> $ objdump -C -T /usr/lib64/libLLVM-3.8-mesa.so | grep _S_empty_rep_st>
> 000000000405be20 u DO .bss 0000000000000020 Base
>   std::string::_Rep::_S_empty_rep_storage
> 000000000405bde0 u DO .bss 0000000000000020 Base
>   std::basic_string<wchar_t, std::char_traits<wchar_t>,
>   std::allocator<wchar_t> >::_Rep::_S_empty_rep_storage
>
> But the libstdc++ compiled on CentOS 6 is not:
>
> $ objdump -C -T $LIBSTDCXX5 | grep _S_empty_rep_storage
> 000000000038c300 g DO .bss 0000000000000020 GLIBCXX_3.4
>   std::string::_Rep::_S_empty_rep_storage
> 000000000038c320 g DO .bss 0000000000000020 GLIBCXX_3.4
>   std::basic_string<wchar_t, std::char_traits<wchar_t>,
>   std::allocator<wchar_t> >::_Rep::_S_empty_rep_storage
>
> So in conclusion when building gcc I need to make sure that libstdc++.so is
> defining STB_GNU_UNIQUE symbols.
>
> Maybe this should be mentioned in some gcc/libstdc++ docs related to binary
> compatibility?

Hi Jonathan,

Me again. So I thought I had solved the problem by making sure that my
libstdc++ was using STB_GNU_UNIQUE.

But now I'm facing another crash with an invalid pointer being freed, this
time related to std::locale.

The crash happens in locale::_Impl::_M_install_facet. After debugging I have
no idea what the right behaviour would be. I attached the whole gdb output,
but here are some highlights (I omitted some output so the lines don't break).

The locale stuff is first initialized in libstdc++.so.6 (gcc 5.4.0). This is
part of the stack with a breakpoint on "if (__index > _M_facets_size - 1)":

#0 std::locale::_Impl::_M_install_facet at locale.cc:321
#1 in std::locale::_Impl::_M_init_facet<... > at locale_classes.h:602
#2 in std::locale::_Impl::_Impl at locale_init.cc:479
#3 in std::locale::_S_initialize_once () at locale_init.cc:307
#4 in pthread_once () from /lib64/libpthread.so.0
#5 in __gthread_once at gthr-default.h:699
#6 in std::locale::_S_initialize () at locale_init.cc:316
#7 in std::locale::locale at locale_init.cc:250

All of this happens in libstdc++.so.6. Adding a breakpoint on
_M_install_facet to print some info:

b gcc-5.4.0/libstdc++-v3/src/c++98/locale.cc:321
command 1
print __index
print _M_facets_size
continue
end

shows _M_facets_size = 46 and __index going from 0 to 29.

But at some point, when a boost::lexical_cast function is used, std::locale
from libLLVM-3.8-mesa.so (gcc 4.8.5) is used, causing the locale stuff to be
initialized again:

#0 std::locale::_Impl::_M_install_facet at locale.cc:319
#1 in std::locale::_Impl::_M_init_facet<... > at locale_classes.h:564
#2 in std::locale::_Impl::_Impl at locale_init.cc:397
#3 in std::locale::_S_initialize_once at locale_init.cc:267
#4 in pthread_once () from /lib64/libpthread.so.0
#5 in __gthread_once gthr-default.h:699
#6 in std::locale::_S_initialize () at locale_init.cc:276
#7 in std::locale::locale at locale_init.cc:210
#8 in boost::detail::lcast_put_unsigned<std::char_traits<char>,
   unsigned long, char>::convert lcast_unsigned_converters.hpp:95

All of the above C++ execution happens in libLLVM-3.8-mesa.so. Adding a
breakpoint like the previous one:

b gcc-4.8.5/libstdc++-v3/src/c++98/locale.cc:319
command 2
print __index
print _M_facets_size
continue
end

shows _M_facets_size = 28 and __index going from 0 to 1 and then suddenly
jumping to 30. That's when the crash happens. It allocates a new facet vector
and tries to delete the old one. locale.cc:352 causes the crash:

delete [] __oldf;

So it seems a lot of things are going wrong:

1 - Should it be safe to call _S_initialize_once in both libraries?
2 - Is the "if (__index > _M_facets_size - 1)" branch executed under normal
    circumstances?
3 - In another test with only a main.cpp, linking only to libstdc++.so, I
    tried to force the code inside this "if" to be executed by doing
    "set __index = _M_facets_size" in gdb, and the result is the same crash.
    Should delete [] __oldf even work?

PS.: Unfortunately I couldn't come up with a simple example to reproduce.
All examples I tried were only executing locale code from libstdc++.so.
Attachment:
gdb_output
Description: Binary data
Attachment:
trace_locale.gdb
Description: Binary data