Re: Curious run-away index build on upgrade to 8.1.3

Jerry Sievers <jerry@xxxxxxxxxxxxxxxx> · 16 Mar 2006 11:00:10 -0500

Hi Tom, thanks for responding on this. More below;

Tom Lane <tgl@xxxxxxxxxxxxx> writes:

> Jerry Sievers <jerry@xxxxxxxxxxxxxxxx> writes:
> > What happened was that for a couple minutes the CPU load would
> > steadily increase and disk activity decrease at the same time.  Before
> > long, one CPU is 100% busy and we let this continue for 2 hours, a
> > 100x longer than this index usually takes to build.  Disk IO dropped
> > to nothing and remained so.
> 
> Hm, we were just doing some work in this area a week or two ago, and
> 8.2 should be materially faster than current releases ... but offhand
> I don't know why 8.1 would be any worse than earlier versions.  Looking
> in the CVS history shows that the sort logic didn't change at all
> between 8.0 and 8.1.  Are you sure index build on this index really
> performs differently than it did in 8.0?  What platform is this on?

Ok, I just did re-check this on an 8.0.3 install and got proper
results.  That is; the index builds in about 1 minute.

Platform is Solaris 2.9

FWIW, there is at least one difference in compile time options and
that is; on our 8.1.3 installs, we are now using
'enable-thread-safety' as Slony asks for this.  Not sure if this may
be related to the problem though.

These are on machines with a Gig of RAM.

On 8.1.3 the index will not build with maintenance_work_mem set to our
v8.0 setting of 64k.  It will however build with it at 32k and even
builds fine with the default value for maintenance_work_mem whcih
which is 16k for this machine.

Were the index to now build in say 10 minutes of actual work, we'd be
looking at a performance problem.

On the contrary, the failure mode is a complete run-away process doing
nothing productive.

> 
> > Worse is that the backend that was spinning would not respond to a
> > cancel nor SIGTERM.  Stopping this activity required a -m immediate
> > shutdown of Postgres.
> 
> Yeah, we also noticed last week that some of the major loops in btree
> index build were free of any CHECK_FOR_INTERRUPTS calls :-(.  This is
> fixed for 8.1.4.

Great!

The problem we saw suggests the backend being in an endless loop of
some kind.

> > If it would be of interest to someone that I truss one of the spinning
> > processes, I can redo this in an R&D setting and submit the results.
> 
> truss probably wouldn't show anything interesting; a gprof or oprofile
> profile could be useful though.

Ok I did a bit of reading on gprof and note where it says the program
has to call exit in order for the profiling data file to be written.

My limited experience with the problem here is that we had to stop it
with "Sig Niner".  Tried standard cancel and also OS level SIGTERM, to
no avail.

Any idea if there was a signal that this might have responded to shy
of -9?

Great big thanks.

> 			regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
> 

-- 
-------------------------------------------------------------------------------
Jerry Sievers   305 854-3001 (home)     WWW ECommerce Consultant
                305 321-1144 (mobile	http://www.JerrySievers.com/