On Friday 19 September 2008, Eljay Love-Jensen wrote:
> Hi Mihai Donțu,
>
> > 1. I apologise. I'm unable to write emails that make clear sense outside
> > of my head, when I'm concentrated on something. :) At least, not the
> > first time.
>
> You and me both. :-)
>
> I meant to say "SSO" not "DSO".
>
> Where "SSO" means "static shared object" (loaded implicitly by the loader,
> ld).
>
> And "DSO" means "dynamic shared object" (loaded explicitly in the code by
> dlopen().)
>
> > 2. if I pass the flag RTLD_GLOBAL to dlopen() when loading both liba.so
> > and libb.so, then I can use dynamic_cast in libb.so for a C++ object
> > obtained from liba.so. Otherwise, if I pass only RTLD_LAZY (or RTLD_NOW)
> > the dynamic_cast fails across shared objects. I guess I'm looking for a
> > way to obtain the same effect as RTLD_GLOBAL (somehow make all the
> > symbols in liba.so and libb.so available for global symbol resolution).
> >
> > This problem appears to be specific to g++3.3.x and earlier. G++ 4.x
> > works like a charm (much has changed though).

I was wrong. G++ 4.x is affected too. I based my statement on something a
friend told me. My own test proved the contrary.

> I remember that there were RTTI issues like you describe, but I thought
> they were on Solaris. Drat, can't recall the details.
>
> > I need to look more into how g++ 3.x handles RTTI and why it doesn't
> > search the symbol spaces of other loaded shared objects ...
>
> And it appears that the behavior of the RTLD_GLOBAL is affecting things
> adversely. What happens if you load/access the RTTI structure from the
> liba.so of the class, explicitly with a dlsym()? Be careful when working
> with the mangled name of the RTTI structure of the class.
>
> I'm betting that the RTTI structure from the object produced in liba.so
> which is referenced in libb.so has a NULL pointer to the RTTI in the
> object's virtual function table.
>
> But maybe I'm mistaken, and it's not NULL but rather points to the liba.so
> RTTI structure rather than the class's RTTI structure in libb.so. (Since
> libb.so was DSO loaded first, and liba.so is SSO loaded in the context of
> libb.so, I'd expect liba.so to use the symbols from libb.so deferentially.)
>
> Could be useful to put in some hack code to peek at the pointers in the
> virtual function table, in particular the RTTI pointer (the address of the
> pointer itself moreso than the data pointed to... but a dump of the data
> pointed to could be an interesting diagnostic too). What that layout is,
> I'm not sure. Some research may be warranted.

I found out what the problem was: each shared object has its own symbol
table with pointers to a __class_type_info object. When the code in b.so
tries a dynamic_cast on a C++ object from a.so, the __dynamic_cast()
function from libstdc++.so is called from within b.so with the following
parameters:

    __dynamic_cast(object, src_type_info, dst_type_info, 0);

Somewhere deep inside libstdc++.so a pointer comparison is made between
'src_type_info', which is in b.so's address space, and the type info
associated with 'object'. But the object's pointer to its type info is
within a.so's address space (even though both describe the same thing)
=> match failed => dynamic_cast returns NULL.

What dlopen(... | RTLD_GLOBAL) does is put the two symbol tables together
and thus make 'src_type_info' and the object's type info point to the same
__class_type_info object.

Since the RTLD_GLOBAL flag is not an option for me, I wrote my own
__dynamic_cast() function which searches for the proper 'src_type_info' and
'dst_type_info', and passes them to libstdc++.so's __dynamic_cast(). I have
attached a small archive with a minimal PoC which should better illustrate
my solution. A more complex one can be found in the Qt toolkit (see
qobject_cast()).

Let it be noted that this is not a g++ bug nor an ld bug.
It's rather an unfortunate consequence of how dynamic linking is done on
POSIX-based OSes. :)

-- 
Mihai Donțu

Attachment: dynamic_cast.tar.gz (application/tgz)