Search Postgresql Archives

Re: Idle processes chewing up CPU?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Craig, I've debugged the runaway process, though I'm not sure of the
solution yet.

My best interpretation is that an SSL client dirty disconnected while
running a request. This caused an infinite loop in pq_recvbuf(), calling
secure_read(), triggering my_sock_read() over and over. Calling
SSL_get_error() in secure_read() returns 10045 (either connection reset, or
WSAEOPNOTSUPP, I'm not sure) - after this, pq_recvbuf() appears to think
errno=EINTR has occurred, so it immediately tries again.

I'd appreciate any further interpretation and advice on this, how to
diagnose it further, or whether it's been addressed in a later version. Hope
you appreciate I've erred on the side of too much detail in the following
analysis.

As a side note, I had the postgres.exe process in debug mode for 2 hours,
and experienced no server downtime at all, which is good to know.

Regards,
-Brendan

=========================================
Main call stack:
=========================================

>	postgres.exe!pq_recvbuf()  Line 746 + 0x1a bytes
 	postgres.exe!pq_getbyte()  Line 795 + 0x5 bytes
 	postgres.exe!SocketBackend(StringInfoData * inBuf=0x00000000)  Line
317 + 0x5 bytes
 	postgres.exe!PostgresMain(int argc=4, char * * argv=0x0155db48,
const char * username=0x01064e20)  Line 3595 + 0x23 bytes
 	postgres.exe!BackendRun(Port * port=0x00bdfd40)  Line 3208
 	postgres.exe!SubPostmasterMain(int argc=3, char * * argv=0x01063f28)
Line 3674 + 0x8 bytes
 	postgres.exe!main(int argc=3, char * * argv=0x01063f28)  Line 165 +
0x7 bytes
 	postgres.exe!__tmainCRTStartup()  Line 597 + 0x17 bytes
 	kernel32.dll!_BaseProcessStart@4()  + 0x23 bytes

Relevant variables:

PqRecvLength: 0

PqRecvPointer: 0

PqRecvBuffer: Q...,SELECT COUNT(*) FROM.   (SELECT tgargs from pg_trigger
tr.      LEFT JOIN p-g_depend dep ON dep.objid=tr.oid AND deptype = 'i'.
LEFT JOIN pg_constraint co ON refobjid = co.oid AND contype = 'f'.     WHERE
tgisconstraint AND co.oid IS NULL.     GROUP BY tgargs.    HAVING count(1) =
3) AS foo.M pg_namespace nsp.  LEFT OUTER JOIN pg_description des ON
des.objoid=nsp.oid. WHERE NOT ((nspname = 'pg_catalog' and (SELECT count(*)
FROM pg_class WHERE relname = 'pg_class' AND relnamespace = nsp.oid) > 0)
OR.(nspname = 'pgagent' and (SELECT count(*) FROM pg_class WHERE relname =
'pga_job' AND relnamespace = nsp.oid) > 0) OR.(nspname =
'information_schema' and (SELECT count(*) FROM pg_class WHERE relname =
'tables' AND relnamespace = nsp.oid) >0x00789E10  20 30 29 20 4f 52 0a 28 6e
73 70 6e 61 6d 65 20   0) OR.(nspname = 'dbo' and (SELECT count(*) FROM
pg_class WHERE relname = 'systables' AND relnamespace = nsp.oid) > 0)
OR.(nspname = 'sys' and (SELECT count(*) FROM pg_class WHERE relname =
'all_tables' AND relnamespace = nsp.oid) > 0)). ORDER BY 1, nspname

MyProcPort:	0x00bdfd40
	sock	2044
	proto	196608
	laddr	{addr={...} salen=16 }
	raddr	{addr={...} salen=16 }
	remote_host	0x0106acf0 "<removed>"
	remote_port	0x010694a0 "3307"
	canAcceptConnections	CAC_OK
	database_name	0x01064df8 "<removed>"
	user_name	0x01064e20 "<removed>"
	cmdline_options	0x00000000 <Bad Ptr>
	guc_options	0x00000000 {type=??? length=??? head=??? ...}
	auth_method	uaMD5	
	auth_arg	0x00000000 <Bad Ptr>
	md5Salt	0x00bdfe7c "<removed>"
	cryptSalt	0x00bdfe80 "UA"
	SessionStartTime	303975637670000
	default_keepalives_idle	0
	default_keepalives_interval	0
	default_keepalives_count	0
	keepalives_idle	0
	keepalives_interval	0
	keepalives_count	0
	gss	0x010694e0 {outbuf={...} cred=0x00000000 ctx=0x00000000 ...}
	ssl	0x003fbb78 {version=769 type=8192 method=0x1002b298 ...}
	peer	0x00000000 {cert_info=??? sig_alg=??? signature=??? ...}
	peer_dn	0x00bdfeb4 "(anonymous)"
	peer_cn	0x00bdff35 "(anonymous)"
	count	5287	unsigned long

pq_recvbuf() calls "secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
PQ_BUFFER_SIZE - PqRecvLength);", which returns -1. I've inferred from
control flow that errno=EINTR (can't debug the value unfortunately), so it
continues the loop.


=========================================
Within secure_read()
=========================================

SSL_get_error() returns 10054 (connection reset or WSAEOPNOTSUPP, I'm not
sure)

The switch() command jumps to SSL_ERROR_SYSCALL, then immediately jumps to
SSL_ERROR_NONE !?, and port->count NOT incremented - this was strange,
suggesting a source code mismatch or debugger problem perhaps.

secure_read() returns -1, popping back to pq_recvbuf()


=========================================
Within my_sock_read()
=========================================

secure_read() causes a call to my_sock_read(), which has a separate stack
trace (waiting on single object, I assume):

>	postgres.exe!my_sock_read(bio_st * h=0x003fbcf0, char *
buf=0x0165bc98, int size=5)  Line 443	C
 	libeay32.dll!0026a78f() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for
libeay32.dll]	
 	mswsock.dll!_WSPEventSelect@16()  + 0x92 bytes	
 	ffffffff()	

Within prepare_for_client_read(): DoingCommandRead = 1

Within EnableNotifyInterrupt(): IsTransactionOrTransactionBlock() = 0,
notifyInterruptOccurred = 0

Within EnableCatchupInterrupt(): catchupInterruptOccurred = 0

After "res = std_sock_read(h, buf, size);" (can't debug result,
unfortunately), (buf, size, h) hadn't change at all

Within client_read_ended(): DoingCommandRead = 1

Exiting my_sock_read() and wandering through some assembly, it finds itself
back into socket_read(), then back in pg_recvbuf(), and the loop continues.



-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux