On Thu, Sep 25, 2008 at 1:24 PM, John Huttley <John@xxxxxxxxxxxxxxxxxx> wrote:
> I've just had an interesting encounter with the slow full-table update
> problem that is inherent in MVCC.
>
> The system is 64-bit Linux with a 2.6.25 kernel feeding SCSI disks.
>
> The table is:
>
> CREATE TABLE file (
>     fileid integer NOT NULL,
>     fileindex integer DEFAULT 0 NOT NULL,
>     jobid integer NOT NULL,
>     pathid integer NOT NULL,
>     filenameid integer NOT NULL,
>     markid integer DEFAULT 0 NOT NULL,
>     lstat text NOT NULL,
>     md5 text NOT NULL,
>     perms text
> );
>
> ALTER TABLE ONLY file
>     ADD CONSTRAINT file_pkey PRIMARY KEY (fileid);
>
> CREATE INDEX file_fp_idx ON file USING btree (filenameid, pathid);
> CREATE INDEX file_jobid_idx ON file USING btree (jobid);
>
> There are 2.7M rows.
>
> Running update file set perms='0664' took about 10 mins.

So, how many rows would already be set to 0664? Would adding a where
clause speed it up?

update file set perms='0664' where perms <> '0664';

> During this period, vmstat reported Blocks Out holding in the 4000 to
> 6000 range.
>
> When I dropped the indexes, this query ran in 48 sec, with Blocks Out
> peaking at 55000.
>
> So there is a double whammy: MVCC requires more work to be done when
> indexes are defined, and that work results in much lower IO,
> compounding the problem.

That's because the IO becomes more random and less sequential. If you
had a large enough drive array you could get that kind of performance
while updating indexes, since the accesses would tend to hit different
drives most of the time. Under heavy load on the production servers at
work we see 30 to 60 MB/s of random access with 12 drives, i.e. 2.5 to
5 MB/s per drive. Sequential throughput is about 5 to 10 times higher.
What you're seeing is likely the effect of running a db on insufficient
drive hardware.
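
To see how much the narrowed update would actually save, something like
this should give a quick breakdown (just a sketch; note that perms is
nullable, so rows with NULL perms are counted separately here because
perms <> '0664' won't match them):

    -- Count how many rows the narrowed update would actually touch.
    SELECT count(*) AS total,
           sum(CASE WHEN perms = '0664' THEN 1 ELSE 0 END) AS already_set,
           sum(CASE WHEN perms IS NULL THEN 1 ELSE 0 END) AS null_perms
    FROM file;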
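
And if this really is a one-off bulk update, the drop-and-rebuild
approach you stumbled onto can be made deliberate. A sketch, assuming
you can afford exclusive use of the table for the duration (the index
names are the ones from your schema; file_pkey stays, since dropping it
would mean dropping the constraint):

    BEGIN;
    DROP INDEX file_fp_idx;
    DROP INDEX file_jobid_idx;
    -- IS DISTINCT FROM also catches rows where perms is NULL,
    -- which perms <> '0664' would skip.
    UPDATE file SET perms = '0664' WHERE perms IS DISTINCT FROM '0664';
    -- Rebuild inside the same transaction so other sessions never
    -- see the table without its indexes.
    CREATE INDEX file_fp_idx ON file USING btree (filenameid, pathid);
    CREATE INDEX file_jobid_idx ON file USING btree (jobid);
    COMMIT;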