Hi Tom,

I think I've confirmed the fix. Using a dirty disconnect generator, I was
able to reliably recreate the problem within about 30-60 seconds. The
symptoms were the same as before, but the loop occurred around SSL_write
instead of SSL_read. I assume this was due to the artificial nature of the
dirty disconnect (it is easier for the client to artificially break the
connection while waiting/receiving than while sending).

The solution you proposed solved it for SSL_write (ran for 30 minutes, no
runaway processes), and I think it's safe to assume it solves it for
SSL_read too. So I suggest two additions:

====================================================
rloop:
+       errno = 0;
        n = SSL_read(port->ssl, ptr, len);
        err = SSL_get_error(port->ssl, n);
        switch (err)
        {
            case SSL_ERROR_NONE:
                port->count += n;
                break;
====================================================

And:

====================================================
wloop:
+       errno = 0;
        n = SSL_write(port->ssl, ptr, len);
        err = SSL_get_error(port->ssl, n);
        switch (err)
        {
            case SSL_ERROR_NONE:
                port->count += n;
                break;
====================================================

I'm not comfortable running my own compiled version in production (it was
rather difficult to get it working), so I'm interested to know when the
next release is planned. We can test beta copies on a non-critical load
balancing server if necessary.

Cheers,
-Brendan

-----Original Message-----
From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx]
Sent: Sunday, 27 September 2009 2:42 PM
To: Brendan Hill
Cc: 'Craig Ringer'; pgsql-general@xxxxxxxxxxxxxx
Subject: Re: Idle processes chewing up CPU?

"Brendan Hill" <brendanh@xxxxxxxx> writes:
> Makes sense to me. Seems to be happening rarely now.
> I'm not all that familiar with the open source process, is this likely to be
> included in the next release version?

Can you confirm that that change actually fixes the problem you're
seeing? I'm happy to apply it if it does, but I'd like to know that
the problem is dealt with.

            regards, tom lane

> -----Original Message-----
> From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx]
> Sent: Monday, 21 September 2009 5:25 AM
> To: Brendan Hill
> Cc: 'Craig Ringer'; pgsql-general@xxxxxxxxxxxxxx
> Subject: Re: Idle processes chewing up CPU?
>
> "Brendan Hill" <brendanh@xxxxxxxx> writes:
>> My best interpretation is that an SSL client dirty disconnected while
>> running a request. This caused an infinite loop in pq_recvbuf(), calling
>> secure_read(), triggering my_sock_read() over and over. Calling
>> SSL_get_error() in secure_read() returns 10045 (either connection reset, or
>> WSAEOPNOTSUPP, I'm not sure) - after this, pq_recvbuf() appears to think
>> errno=EINTR has occurred, so it immediately tries again.
>
> I wonder if this would be a good idea:
>
> #ifdef USE_SSL
>     if (port->ssl)
>     {
>         int err;
>
> rloop:
> +       errno = 0;
>         n = SSL_read(port->ssl, ptr, len);
>         err = SSL_get_error(port->ssl, n);
>         switch (err)
>         {
>             case SSL_ERROR_NONE:
>                 port->count += n;
>                 break;
>
> It looks to me like the basic issue is that pq_recvbuf is expecting
> a relevant value of errno when secure_read returns -1, and there's
> some path in the Windows case where errno doesn't get set, and if
> it just happens to have been EINTR then we've got a loop.
>
>             regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general