Unfortunately, I've tried your advice, and I think that we're still in a CPU-bound situation even after following the re-indexing and re-vacuuming.
Fortunately, though, I've just learned about a poor-man's profiler under Linux named pstack, and it's telling me that the vacuum process is spending most of its time in the enough_space() function in vacuum.c (somewhere around line 2574 in the most recent version in CVS I believe).
When I run pstack, I almost always see the execution stack like this:
% pstack 12158
#0 0x0000000000527478 in enough_space ()
#1 0x0000000000528b60 in repair_frag ()
#2 0x000000000052b130 in full_vacuum_rel ()
#3 0x000000000052bab8 in vacuum_rel ()
#4 0x000000000052bf12 in vacuum ()
#5 0x00000000005d69b9 in PortalRunUtility ()
#6 0x00000000005d79d5 in PortalRunMulti ()
#7 0x00000000005d80e5 in PortalRun ()
#8 0x00000000005d3e0b in exec_simple_query ()
#9 0x00000000005d4a6b in PostgresMain ()
#10 0x00000000005abe0b in ServerLoop ()
#11 0x00000000005ac9bd in PostmasterMain ()
#12 0x000000000055ae7e in main ()
Every now and then, I can see a call to ReadBufferExtended from repair_frag(), so the CPU-intensive part does eventually find what it's looking for, but I'm hopeful that this could somehow be done more efficiently and this process could be limited by I/O only.
So...I guess my question is this: is it a known issue that this code path can get very cpu-bound for large numbers of blob garbage? Looking at the code for enough_space(), it is almost hard to believe that a lot of time is being spent here, but perhaps it's just getting called a LOT from repair_frag(). I'm no stranger to hacking code if I have to....thanks!
Sam
On Fri, Jan 22, 2010 at 4:06 PM, PG User 2010 <pguser2010@xxxxxxxxx> wrote:
Hi Tom,
As always, your insight is VERY helpful. We'll try your suggestions and see if that helps things out... Thanks!
SamOn Fri, Jan 22, 2010 at 4:01 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
PG User 2010 <pguser2010@xxxxxxxxx> writes:The only part of VAC FULL that seems like it could be CPU-bound is index
> 1) is there any easy way to fiddle with the vacuum process so that it is not
> CPU bound and doing very little I/O? Why would vacuum full be CPU bound
> anyway???
cleanup. If the table is sufficiently bloated with dead tuples, that
could take awhile. It might be useful to try this:
1. REINDEX TABLE pg_largeobject;
2. VACUUM pg_largeobject;
3. VACUUM FULL pg_largeobject;
I have never tried this in a serious bloat situation, but in principle
I think it should improve matters. The idea is to get rid of as many dead
index and heap entries as you can before letting VAC FULL loose on it, and
also do as much of the work as possible with a less-than-exclusive lock.
Don't forget that large maintenance_work_mem will help the first two
steps, as long as you don't set it so high as to drive the machine into
swapping.
NO. Doing that will in fact make things worse --- a failed VAC FULL
> 2) is it possible to interrupt VACUUM FULL, then re-start it later on and
> have it pick up where it was working before?
results in even more dead entries to be cleaned up.
regards, tom lane