>>> On Mon, Apr 3, 2006 at 11:52 am, in message
<14779.1144083156@xxxxxxxxxxxxx>, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> "Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> writes:
>> Is there any way to tweak this in favor of more accurate information,
>> even if it has a performance cost?  We're finding that during normal
>> operations we're not seeing most connections added to the
>> pg_stat_activity table.  We would like to be able to count on accurate
>> information there.
>
> That's basically a non-starter because of the delay in reporting from
> the stats collector process (ie, even if the information was "completely
> accurate" it'd still be stale by the time that your code gets its hands
> on it).  I think you'd be talking about a complete redesign of the stats
> subsystem to be able to use it that way.

We want this for our monitoring software, to raise an alert when the
connection pool diverges from its nominal configuration beyond
prescribed limits or in excess of a prescribed duration.  What we're
looking for is not necessarily a table which is accurate immediately,
but one which won't entirely miss a connection.  Even then, if it only
misbehaves under extreme load, that would be OK; such extreme usage
might be worthy of note in and of itself.

Since we have converted to PostgreSQL we have not had this monitoring,
and folks are nervous that we will not detect a struggling middle tier
before it fails.  (Not something that happens often, but we really hate
having users tell us that something is broken, versus spotting the
impending failure and correcting it before it fails.)

> Having said that, though, I'd be pretty surprised if the stats subsystem
> was dropping more than a small fraction of messages --- I would think
> that could only occur under very heavy load, and if that's your normal
> operating state then it's time to upgrade your hardware ;-).

We have a pair of database servers for our transaction repository.
Each has four Xeon processors.
One of these is Windows, one is Linux.  On the Windows machine, I see
10% CPU utilization.  On the Linux machine I see a load average of 0.30.
The Linux machine seems to be very reliable about showing the
connections.  On the Windows machine, when I refresh a 20-connection
pool, I see either no connections or only a few.

> Maybe you
> should investigate a bit more closely to find out why it's dropping so
> much.

It is probably related to something we've been seeing in the PostgreSQL
logs on the Windows servers:

[2006-04-03 08:28:25.990 ] 2072 FATAL:  could not read from statistics collector pipe: No error
[2006-04-03 08:28:26.068 ] 2012 LOG:  statistics collector process (PID 3268) was terminated by signal 1

We're going to patch to try to capture more info from WinSock.  In
src/port/pipe.c we plan to add, just before "return ret" in piperead():

    if (ret == SOCKET_ERROR)
    {
        ereport(LOG,
                (errmsg_internal("SOCKET ERROR: %d", WSAGetLastError())));
    }

I hope to post more info, and possibly a patch, tomorrow.

-Kevin
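
For anyone wondering what the monitoring check described above amounts
to, here is a minimal sketch under stated assumptions.  It alerts when
the observed connection count differs from the pool's nominal size by
more than a limit, or when any difference at all persists longer than a
set duration.  The struct, the function name and the thresholds are all
hypothetical and are not part of PostgreSQL; the observed count is
assumed to come from something like a count of rows in
pg_stat_activity, which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical monitor state; none of these names exist in PostgreSQL. */
typedef struct PoolMonitor
{
    int  nominal;        /* configured pool size */
    int  limit;          /* deviation that raises an alert immediately */
    long max_seconds;    /* how long a smaller deviation may persist */
    long diverged_since; /* time the current deviation began, -1 if none */
} PoolMonitor;

/*
 * Feed one sample (observed connection count at time "now", in seconds).
 * Returns true if an alert should be raised for this sample.
 */
bool
pool_monitor_sample(PoolMonitor *m, int observed, long now)
{
    int deviation;

    assert(m != NULL);
    deviation = abs(observed - m->nominal);

    if (deviation == 0)
    {
        /* Back at the nominal configuration: reset the clock. */
        m->diverged_since = -1;
        return false;
    }

    /* Remember when this run of divergent samples started. */
    if (m->diverged_since < 0)
        m->diverged_since = now;

    /* Beyond the prescribed limit: alert right away. */
    if (deviation > m->limit)
        return true;

    /* Within the limit: alert only once it has lasted too long. */
    return (now - m->diverged_since) > m->max_seconds;
}
```

With a nominal size of 20, a limit of 5 and a duration of 60 seconds, a
sample of 18 connections would only alert after it had persisted for
more than a minute, while a sample of 3 would alert at once.  A check
like this is only as good as its input, which is why a connection that
never shows up in the table at all is the case we care about.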