Hello Eljay,

First of all, a big thank you for the very helpful analysis.

I want to confirm that both the exception handle and body classes have a
vtable and use virtual destructors. It might also be useful to know that
this is a Linux application.

> 3.
> The typeinfo structs are (probably) *NOT* compared by value. Rather they
> are simply compared by address. <-- this compare-by-address "optimization"
> is *probably* the underlying cause of all your woes. But not something
> that, itself, can be fixed. :-(

I have reason to believe this might not be the source of the problem. Let
me explain...

I've actually got two different main executable programs. One of them is
an HTTP client that links with DSO A (Foundation class library) and DSO B
(HTTP library). DSO A defines the TEventHandle and EventBody classes. If a
TEventHandle<EventBody> is thrown from DSO B, the HTTP client program
catches it just fine. Moreover, the typeinfo addresses are different.

From DSO A:
0000000000351d80 V typeinfo for Execution::TEventHandle<Execution::EventBody>

From DSO B:
0000000000327b20 V typeinfo for Execution::TEventHandle<Execution::EventBody>

From HTTP main program:
000000000062a350 V typeinfo for Execution::TEventHandle<Execution::EventBody>

-------------------------------------------------------------

I've got another executable program, which is an RPC client. It also uses
DSO A (Foundation class library), and it uses DSO C (RPC library). It's
this DSO C which defines the DOCF::UserExceptionBody class that derives
from EventBody. If DSO C throws a TEventHandle<EventBody>, it falls off
the stack and the terminate handler is called.

The thing that's driving me nuts is that my HTTP program is working and
the RPC program is not. The difference is DSO B vs DSO C. There is
definitely a difference in the nm output for these symbols in each of
these libraries.

DSO B (HTTP) - Exception throw / catch works
===========
0000000000327b20 V typeinfo for Execution::TEventHandle<Execution::EventBody>
00000000000f6880 V typeinfo name for Execution::TEventHandle<Execution::EventBody>

DSO C (RPC) - Exception throw / catch not working
==========
                 U typeinfo for Execution::TEventHandle<Execution::EventBody>
                 U typeinfo for Execution::EventBody

If DSO C is throwing this exception but its symbol is undefined, could
that be the problem?

00000000002e9200 V typeinfo for Execution::TEventHandle<DOCF::UserExceptionBody>
00000000000c7b80 V typeinfo name for Execution::TEventHandle<DOCF::UserExceptionBody>

Now why would there be such a surprising difference? Both DSO B and DSO C
have a dependency on DSO A, which defines these exception classes. I have
to think that this difference might be related to DSO C defining a
derivative class of EventBody, which is defined in DSO A.

All of this grief is making me wish I could just convert to SSOs and be
done with it. However, it's not that simple for me. Believe it or not,
both of these applications also happen to use a true plugin, DSO D (SSL
library). This DSO D also needs to throw exceptions that were defined in
DSO A.

> If you can make sure that the typeinfo for all
> the exception classes are also in your main executable, that will *probably*
> fix the problem. (It's a bit of a kluge. But just to work around the
> immediate concern, and get you going.)

Any advice on how I can do this? This problem was happening even when I
was using inline / auto template instantiation vs. explicit.

Thanks again for all the help!

Dallas

On Tue, May 18, 2010 at 9:12 AM, John (Eljay) Love-Jensen
<eljay@xxxxxxxxx> wrote:
> Hi Dallas,
>
> I presume this is the case, but worth double checking: the
> Execution::EventBody has a virtual destructor, correct?
>
> Anyway, I think this is the problem you are running into:
>
> 1.
> The exception is being thrown which has RTTI information which refers to the
> typeinfo in DSO A.
>
> 2.
> The exception is passed to a catch in your main executable, which refers to
> the typeinfo in the main executable.
>
> 3.
> The typeinfo structs are (probably) *NOT* compared by value. Rather they
> are simply compared by address. <-- this compare-by-address "optimization"
> is *probably* the underlying cause of all your woes. But not something
> that, itself, can be fixed. :-(
>
> 4.
> Since the addresses do not match, the catch does not trigger, and the
> exception continues propagating.
>
> 5.
> The stack unwind eventually falls off the top of the call stack and
> triggers a terminate().
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> How to fix it, fast?
>
> First, the templates muddy the waters. But I do not expect that the
> templates are part of the problem. The template inherits from a regular
> base class. And that regular base class has a virtual destructor [right?].
> So it's all good.
>
> Second, is the throw of a DOCF::UserExceptionBody, or of an
> Execution::TEventHandle<DOCF::UserExceptionBody>?
>
> Third, the problem is *probably* in the area of RTTI, and how the loader
> resolves the vague linkage. If you can make sure that the typeinfo for all
> the exception classes are also in your main executable, that will *probably*
> fix the problem. (It's a bit of a kluge. But just to work around the
> immediate concern, and get you going.)
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> The underlying problem?
>
> I *suspect* that your loader is resolving vague linkage from DSO A to main.
> And from DSO B to main. But not from DSO B to DSO A. Which results in two
> of the RTTI typeinfo structs being "live" which are supposed to be the same
> thing. Which is a kind of ODR violation.
>
> Since the RTTI compares simple addresses of the RTTI struct for "sameness",
> rather than compare-by-value, the catch in main fails to identify the
> derived DOCF::UserExceptionBody as being the-same-as the
> Execution::EventBody when it attempts to do the dynamic_cast.
>
> And why do I keep harping on "virtual destructor"? Because the RTTI
> information is referenced from a typeinfo pointer in the virtual function
> table struct, under the covers. If there is no virtual function table,
> then dynamic_cast and RTTI resolution for throw/catch will fail.
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> How to fix it, proper?
>
> Not sure. :-( You are now diving into the deep water of the intricacies of
> the run time loader (and perhaps linker) and how it does symbol resolution.
>
> And the loader intricacies of Darwin (OS X) differ from Windows, which
> differ from Linux, which differ from Solaris, which differ from AIX.
>
> There may be relevant compiler flags, or linker flags, which will affect the
> behavior. Is DSO B dependent on DSO A? If not, making DSO B dependent on
> DSO A could affect load order at load time. I'm not sure how to do that
> with DSOs; with SSOs it's easy, since you specify those at link time. But
> with a DSO, it's more of a plug-in situation.
>
> Visibility may be an issue (but doesn't appear to be, since those were
> marked with 'V').
>
> Since you mention that these are DSO A and DSO B, rather than SSO A and SSO
> B, you may be affected by the dlopen (or whatever the AIX equivalent is)
> flags which affect symbol resolution. And also affected by the order in
> which the DSO A and DSO B are loaded.
>
> HTH,
> --Eljay
>
>
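
The nm listings above show which typeinfo copies exist on disk, not which copy each module binds to once everything is loaded. One way to check that at run time is to print the typeinfo address from inside each module. The following is only a minimal sketch under stated assumptions: the class definitions are stand-ins for the real ones in DSO A, and `report_typeinfo`, `same_by_address` and `same_by_name` are hypothetical helper names, not part of any library. The helpers are `static` on purpose, so that every module that compiles them gets its own copy and resolves the typeinfo symbol for itself.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <typeinfo>

// Stand-ins for the real classes from DSO A (assumed shapes only).
namespace Execution {
struct EventBody {
    virtual ~EventBody() {}
};
template <class Body>
struct TEventHandle {
    virtual ~TEventHandle() {}
};
}  // namespace Execution

typedef Execution::TEventHandle<Execution::EventBody> BaseEvent;

// Hypothetical helper: compile it into each module (main, DSO A, DSO C, ...)
// and call it with a label. If two modules print different addresses for the
// same mangled name, more than one typeinfo copy is live in the process.
static const void* report_typeinfo(const char* module_label) {
    const std::type_info& ti = typeid(BaseEvent);
    std::printf("%s: typeinfo %s at %p\n", module_label, ti.name(),
                static_cast<const void*>(&ti));
    return &ti;
}

// The two notions of "sameness" discussed in this thread.
static bool same_by_address(const std::type_info& a, const std::type_info& b) {
    return &a == &b;
}
static bool same_by_name(const std::type_info& a, const std::type_info& b) {
    return std::strcmp(a.name(), b.name()) == 0;
}
```

Calling `report_typeinfo("main")` from the executable and, say, `report_typeinfo("DSO C")` from the library just before the throw would show whether both sides agree on one address. Separately, referencing `typeid(BaseEvent)` somewhere in the main program is one way to get the typeinfo emitted there; whether the DSOs then bind to that copy likely depends on the executable exporting its symbols (for example GCC's `-rdynamic`) and on any dlopen'ed plugin being loaded with `RTLD_GLOBAL`. That is an assumption to verify against this particular setup, not a confirmed fix.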