Re: TPC-H Scaling Factors X PostgreSQL Cluster Command

Heikki Linnakangas <heikki@xxxxxxxxxxxxxxxx> · Tue, 24 Apr 2007 09:52:38 +0100

Greg Smith wrote:
On Sat, 21 Apr 2007, Nelson Kotowski wrote:

I identified that the cluster command over the lineitem table (cluster 
idx_lineitem on lineitem) is the responsible. I got to this conclusion 
because when i run it in the 1GB and 2GB database i am able to 
complete this script in 10 and 30 minutes each. But when i run this 
command over the 5GB database, it simply seems as it won't end.

Have you looked in the database log files for messages?  Unless you 
changed some other parameters from the defaults that you didn't mention, 
I'd expect you've got a constant series of "checkpoint occuring too 
frequently" errors in there, which would be a huge slowdown on your 
index rebuild.  Slowdowns from checkpoints would get worse with an 
increase of shared_buffers, as you report.

Index builds don't write WAL, unless archive_command has been set. A 
higher shared_buffers setting can hurt index build performance, but for 
a different reason: the memory spent on shared_buffers can't be used for 
sorting and caching the sort tapes.

The default setting for checkpoint_segments of 3 is extremely low for 
even a 1GB database.  Try increasing that to 30, restart the server, and 
rebuild the index to see how much the 1GB case speeds up.  If it's 
significantly faster (it should be), try the 5GB one again.

A good advice, but it's unlikely to make a difference at load time.

BTW: With CVS HEAD, if you create the table in the same transaction (or 
TRUNCATE) as you load the data, the COPY will skip writing WAL which can 
give a nice speedup.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com