Well - I know that my stored proc is segfaulting, based on an strace of
postgresql. I don't know how that affects trac, which isn't using that
stored proc... the mystery continues.

Either way, I didn't get a core file, and ulimit -a shows I have
unlimited core file size :(

Alex

On Fri, Mar 7, 2008 at 11:42 PM, Alex Turner <armtuk@xxxxxxxxx> wrote:
> Well - I think it might be that some of my servlets weren't closing
> their database connections properly.
>
> I do have some new evidence though:
>
> I did an strace of the tomcat processes, and I noticed something that
> might be odd, but I'm not really qualified to say. I notice that
> every time a socket sends a request to PostgreSQL it gets some kind of
> reply. This is true in all cases EXCEPT when the application crashes.
> Here is the segment of the strace right before it throws a wobbly:
>
> [pid 4565] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 156
> [pid 4565] bind(156, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> [pid 4565] getsockname(156, {sa_family=AF_INET, sin_port=htons(56550), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
> [pid 4565] connect(156, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> [pid 4565] setsockopt(156, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> [pid 4565] send(156, "\0\0\0W\0\3\0\0user\0postgres\0database\0t"..., 87, 0) = 87
> [pid 4565] recv(156, "R\0\0\0\10\0\0\0\0S\0\0\0\34client_encoding\0UN"..., 8192, 0) = 279
> [pid 4565] gettimeofday({1204948966, 386187}, NULL) = 0
> [pid 4565] send(156, "P\0\0\1\35\0\r\n \t\tselect"..., 334, 0) = 334
> [pid 4565] recv(156, "", 8192, 0) = 0
> [pid 4565] send(156, "X\0\0\0\4", 5, 0) = 5
> [pid 4565] dup2(11, 156) = 156
> [pid 4565] close(156) = 0
>
> Notice that the recv(156,... after sending the query comes back blank,
> which seems odd given that we just sent a query to the database.
>
> I'm really in a bind with this one. It started happening a couple of
> days ago at this point, and all our admin applications are basically
> down :(, people can't even log the bugs that this is generating
> because the bug tracker (trac) is running on this postgresql and is
> throwing errors too.
>
> I also caught something else that seemed weird on another trace:
>
> [pid 3553] send(28, "P\0\0\0H\0delete from result_cache w"..., 108, 0) = 108
> [pid 3553] recv(28, "N\0\0\1\202SWARNING\0C57P02\0Mterminatin"..., 8192, 0) = 387
> [pid 3553] gettimeofday({1204946902, 977641}, NULL) = 0
> [pid 3553] gettimeofday({1204946902, 977682}, NULL) = 0
> [pid 3553] gettimeofday({1204946902, 977766}, NULL) = 0
> [pid 3553] gettimeofday({1204946902, 977902}, NULL) = 0
> [pid 3553] gettimeofday({1204946902, 977973}, NULL) = 0
> [pid 3553] gettimeofday({1204946902, 978012}, NULL) = 0
> [pid 3553] gettimeofday({1204946902, 978053}, NULL) = 0
> [pid 3553] gettimeofday({1204946902, 978091}, NULL) = 0
> [pid 3553] recv(28, "", 8192, 0) = 0
> [pid 3553] send(28, "X\0\0\0\4", 5, 0) = -1 EPIPE (Broken pipe)
> [pid 3553] --- SIGPIPE (Broken pipe) @ 0 (0) ---
> [pid 3553] rt_sigreturn(0x9) = -1 EPIPE (Broken pipe)
>
> I couldn't reproduce this though. It just randomly throws a SIGPIPE
> after the query. The other weird thing is that this process also
> throws a SIGSEGV at another point. I wasn't expecting tomcat to
> crash, so alas I didn't capture a core file. I guess I should set the
> system default up.
>
> Alex
>
> On Fri, Mar 7, 2008 at 2:28 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
> > On Fri, Mar 7, 2008 at 11:17 AM, Alex Turner <armtuk@xxxxxxxxx> wrote:
> > > I didn't. And after the reboot, I still see 8 new sockets stuck in
> > > CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem...
> >
> > Having sockets in CLOSE_WAIT is actually pretty normal
> >
>

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
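
A note on the empty recv() in the first trace: on a stream socket, a
recv() that returns 0 bytes is how the kernel reports that the peer has
closed its end of the connection. So the blank reply would be consistent
with the backend going away (possibly the segfault mentioned at the top),
rather than with the server sending an odd but valid answer. The sketch
below is only an illustration in Python. It uses a local socketpair as a
stand-in for the tomcat/PostgreSQL connection, and the helper name
peer_closed is made up for the example.

```python
import socket


def peer_closed(sock: socket.socket) -> bool:
    """Return True if recv() reports end-of-stream (zero bytes read)."""
    # A blocking recv() returns b"" only when the peer has closed its end;
    # otherwise it returns whatever data is available.
    return sock.recv(8192) == b""


# Stand-in for the client/server connection (assumption: a local
# socketpair, not a real PostgreSQL backend).
client, server = socket.socketpair()

server.sendall(b"reply")
print(peer_closed(client))   # False: data was available

server.close()               # simulate the backend going away
print(peer_closed(client))   # True: recv() returned b""
client.close()
```

A later send() on such a connection can then fail with EPIPE, which
matches the second trace, where the empty recv() is followed by a
broken-pipe error on the next send().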