On 2/10/06, hubert depesz lubaczewski <depesz@xxxxxxxxx> wrote:
> On 2/10/06, Aaron Turner <synfinatic@xxxxxxxxx> wrote:
> > So I'm trying to figure out how to optimize my PG install (8.0.3) to
> > get better performance without dropping one of my indexes.
> > Basically, I have a table of 5M records with 3 columns:
> >   pri_key (SERIAL)
> >   data char(48)
> >   groupid integer
> > There is an additional unique index on the data column.
> > The problem is that when I update the groupid column for all the
> > records, the query takes over 10hrs (after that I just canceled the
> > update).  Looking at iostat, top, vmstat shows I'm horribly disk IO
> > bound (for data not WAL, CPU 85-90% iowait) and not swapping.
> > Dropping the unique index on data (which isn't used in the query),
>
> for such a large update i would suggest going with a different scenario:
> split the update into packets (10000 or 50000 rows at a time)
> and do:
>   update packet
>   vacuum table
> for all packets. and then reindex the table. should work much nicer.

The problem is that all 5M records are being updated by a single
UPDATE statement, not 5M individual statements.  Also, vacuum can't
run inside a transaction.

On a side note, is there any performance information on updating
indexes (via insert/update) relative to the size of the column?
Obviously, char(48) is larger than most for indexing purposes, but I
wonder whether performance drops linearly or exponentially as the
column width increases.  Right now my column is hexadecimal... if I
stored it as a binary representation it would be smaller.

Thanks,
Aaron
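
P.S. For the record, the batching depesz describes would look roughly like
the SQL below. The table name, groupid value, and key ranges are placeholders
(I haven't posted the real ones), and each UPDATE/VACUUM pair has to be issued
as separate statements outside a transaction block, since vacuum won't run
inside one:

    -- sketch only: "mytable", groupid = 42 and the key ranges are placeholders
    UPDATE mytable SET groupid = 42 WHERE pri_key BETWEEN 1 AND 50000;
    VACUUM mytable;

    UPDATE mytable SET groupid = 42 WHERE pri_key BETWEEN 50001 AND 100000;
    VACUUM mytable;

    -- ...and so on for each slice of pri_key, then rebuild the indexes once:
    REINDEX TABLE mytable;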
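
P.P.S. On the binary idea: 48 hex characters decode to 24 bytes, so storing
the column as bytea should roughly halve the size of each index entry. A
sketch of the conversion, again with made-up table, column, and index names:

    -- hypothetical: add a bytea copy of the hex-encoded data column
    ALTER TABLE mytable ADD COLUMN data_bin bytea;
    UPDATE mytable SET data_bin = decode(data, 'hex');
    CREATE UNIQUE INDEX mytable_data_bin_key ON mytable (data_bin);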