Michael,

Many thanks for your response; it is much appreciated. My responses are embedded below:

On Fri, 2005-09-16 at 17:10 -0600, Michael Fuhr wrote:
> On Fri, Sep 16, 2005 at 02:16:29PM -0700, Marc Munro wrote:
> > It is Postgres 7.3.6. The client is a multi-threaded C++ client. The
> > breakage was that one group of connections simply stopped. Others
> > continued without problem. It is not clear exactly what was going on.
>
> How did the connections "stop"? Were the connections broken, causing
> queries to fail? Or did queries block and never return? Or something
> else? What was happening that shouldn't happen, or what wasn't
> happening that should happen?

From the server side, there were simply one or two connections that appeared idle. From the client side, it looked as though a query had been initiated but the client thread was stuck in a library call (as near as we can tell). This, vague though it is, is as much as I know right now. We were unable to do much debugging, as it is a production system and the priority was to get it back up.

> If the connections were still active but not returning, did you do
> a process trace on the connection's postmaster or attach a debugger
> to it to see what it was doing?

No, time pressure prevented this.

> Could the timing of the problem have been coincidence? Have you
> ever seen the problem without a reload? How often do you see the
> problem after a reload? Do you know for certain that the application
> was working immediately before the reload and not working immediately
> after it?

It *could* be coincidence, but the problem began within 5 seconds of the reload. Coincidence is unlikely.

> What operating system are you using?

Linux 2.4.20 smp i686

> > Nothing in our application logs gives us any clue to this.
>
> What about the postmaster logs?

Ah, now there's another story. Unavailable, I'm afraid. Resolving that is also on my priority list.
> > As for reproducibility, it has happened before in test environments when
> > we have bounced the database. This is not too shocking as I would
> > expect the client to notice this :-) It is a little more shocking when
> > it's a reload. Or maybe I have simply misunderstood what reload does.
>
> Can you reproduce the problem with a reload? A stop and start will
> terminate client connections, but a reload shouldn't.

This is not currently seen as a priority (the work-around of "don't do that" is seen as sufficient). I'm simply hoping to get someone to say for sure that the client app should not be able to tell that a reload has happened. At that point I may be able to raise the priority of this issue. I would certainly like to do more investigation. If the PostgreSQL hackers are interested in this strange event (please tell me for sure that it *is* strange), that may also help me to get the necessary resources to run more tests.

Thanks again.

__
Marc Munro