Re: pl/pgsql function spikes CPU 100%

Jeff Frost <jeff@xxxxxxxxxxxxxxxxxxxxxx> · Fri, 16 Mar 2007 08:24:08 -0700 (PDT)

On Fri, 16 Mar 2007, Tom Lane wrote:

Jeff Frost <jeff@xxxxxxxxxxxxxxxxxxxxxx> writes:
... Interestingly, when you
strace the backend, it doesn't appear to be doing much... here's some
sample output:

select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
semop(3932217, 0x7fbfffd150, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
semop(3997755, 0x7fbfffd170, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0

This looks suspiciously like the sort of trace we saw in the various
"context swap storm" threads.  The test cases for those generally
involved really tight indexscan loops, i.e., the backends were spending
all their time trying to access shared buffers.  If you haven't changed
the function or the data, then I concur with the nearby worry about
autovacuuming (large buildup of dead tuples could result in this symptom).
Or maybe you've got an old open transaction that is blocking cleanup?

Tom,

I doubt it's a problem with autovacuum as the data in this server was just 
loaded an hour before the strace above was taken, so there should have been 
almost no dead tuples, especially since these tables are nearly write-once. 
I.e. they get written to once, then the populate function updates them, then 
months later they get archived off.

There did not appear to be any high context-switch activity or significant 
I/O wait during the time I was watching the postmaster. If it's worth 
mentioning, it's running CentOS 4.4 with the 2.6.9-34.EL kernel.
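For anyone wanting to quantify a longer capture than the short sample above, here is a minimal Python sketch (not from the original thread) that tallies syscall names and the semaphore-set IDs passed to semop(). It assumes the trace was saved as plain strace text with one call per line; the names SAMPLE, tally_syscalls and tally_semop_ids are illustrative only.

```python
import re
from collections import Counter

# Sample lines copied from the strace output quoted above.
SAMPLE = """\
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
semop(3932217, 0x7fbfffd150, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
semop(3997755, 0x7fbfffd170, 1)         = 0
semop(3932217, 0x7fbfffd150, 1)         = 0
"""

# Matches "name(args)   = ret ...": group 1 is the syscall name,
# group 2 the raw argument list, group 3 the return value.
LINE_RE = re.compile(r"^(\w+)\((.*)\)\s+=\s+(-?\d+)")


def tally_syscalls(trace: str) -> Counter:
    """Count how many times each syscall name appears in strace output."""
    counts: Counter = Counter()
    for line in trace.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            counts[m.group(1)] += 1
    return counts


def tally_semop_ids(trace: str) -> Counter:
    """Count semop() calls per semaphore-set ID (the first argument)."""
    ids: Counter = Counter()
    for line in trace.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group(1) == "semop":
            ids[int(m.group(2).split(",")[0])] += 1
    return ids


if __name__ == "__main__":
    print(tally_syscalls(SAMPLE).most_common())
    print(tally_semop_ids(SAMPLE).most_common())
```

On the sample this reports 7 semop() calls (6 on set 3932217, 1 on 3997755) and 2 one-millisecond select() timeouts. Over a longer capture, a mix dominated by semop() and short select() sleeps with little else would match the pattern Tom describes, but the counts alone do not identify the cause.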

--
Jeff Frost, Owner 	<jeff@xxxxxxxxxxxxxxxxxxxxxx>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954