10+hrs vs 15min because of just one index

Aaron Turner <synfinatic@xxxxxxxxx> · Fri, 10 Feb 2006 00:16:49 -0800

So I'm trying to figure out how to optimize my PG install (8.0.3) to
get better performance without dropping one of my indexes.

Basically, I have a table of 5M records with 3 columns:

pri_key (SERIAL)
data char(48)
groupid integer

there is an additional unique index on the data column.

The problem is that when I update the groupid column for all the
records, the query takes over 10hrs (after that I just canceled the
update).  Looking at iostat, top, vmstat shows I'm horribly disk IO
bound (for data not WAL, CPU 85-90% iowait) and not swapping.

Dropping the unique index on data (which isn't used in the query),
running the update and recreating the index  runs in under 15 min. 
Hence it's pretty clear to me that the index is the problem and
there's really nothing worth optimizing in my query.

As I understand from #postgresql, doing an UPDATE on one column causes
all indexes for the effected row to have to be updated due to the way
PG replaces the old row with a new one for updates.  This seems to
explain why dropping the unique index on data solves the performance
problem.

interesting settings:
shared_buffers = 32768
maintenance_work_mem = 262144
fsync = true
wal_sync_method = open_sync
wal_buffers = 512
checkpoint_segments = 30
effective_cache_size = 10000
work_mem = <default> (1024 i think?)

box:
Linux 2.6.9-11EL (CentOS 4.1)
2x Xeon 3.4 HT
2GB of RAM (but Apache and other services are running)
4 disk raid 10 (74G Raptor) for data
4 disk raid 10 (7200rpm) for WAL

other then throwing more spindles at the problem, any suggestions?

Thanks,
Aaron

--
Aaron Turner
http://synfin.net/