On 12/29/10 6:28 AM, Julian v. Bock wrote:
> On our servers, under a certain workload, all backend processes regularly (several times per minute) receive a SIGUSR1 and spend several seconds in ProcessCatchupEvent(). At 100-200 connections (most of them idle) this causes the system load to skyrocket. I am not really familiar with the code, but my wild guess is that the processes spend most of their time waiting for spinlocks. We have reduced the number of connections as much as possible for now, but it still accounts for roughly 50% of the total CPU time.
>
> Has anyone experienced a similar problem?
>
> I can reproduce the issue on a test system with production data, but it is not so easy to pinpoint what exactly causes the problem. The queries are basically tsearch2 full text searches over moderately big tables (~35GB). The queries are performed by functions which aggregate data from partitions in temporary tables, cache some data, and perform calculations before returning it to the user.
>
> The PostgreSQL version is 8.3.12; the test server has 8 amd64 cores and 16GB of RAM. I experimented with shared_buffers between 1GB and 4GB, but it doesn't make much of a difference. Disk I/O doesn't seem to be an issue here.

This sounds like the exact same problem I had on Postgres 8.3 and 8.4:

http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php

Updating to Postgres version 9 fixed it.

Here is what appeared to be the best analysis of what was happening, but we never confirmed it.

http://archives.postgresql.org/pgsql-performance/2010-06/msg00464.php

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance