On Thursday 22 March 2007 19:46, Michael Stone wrote:
> On Thu, Mar 22, 2007 at 07:24:38PM +0100, Dimitri wrote:
> >you're right until you're using a single disk :)
> >Now, imagine you have more disks
>
> I do have more disks. I maximize the I/O performance by dedicating
> different sets of disks to different tables. YMMV. I do suggest watching
> your I/O rates and wallclock time if you try this to see if your
> aggregate is actually substantially faster than the single case. (I
> assume that you haven't yet gotten far enough to actually do performance
> testing.) You may also want to look into tuning your sequential I/O
> performance.
>
> Mike Stone

Mike, especially for you :)

Parallel Vacuum Test
====================

- Database 'db_OBJ'
      PgSQL 8.2.3
      tables: object1, object2, ... object8 (all identical)
      volume: 10.000.000 rows in each table, 22GB in total

- Script: Mono Vacuum

$ cat vac_mono.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8
$

- Script: Parallel Vacuum

$ cat vac_pll.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8 &
wait
$

Test 1: Cold, clean database (previously vacuumed)
==================================================

Scenario:
  - stop database
  - flush FS cache (umount/mount)
  - start database
  - execute vacuum script

$ time sh vac_mono.sh
real 4m24.23s
user 0m0.00s
sys  0m0.01s

$ time sh vac_pll.sh
real 1m9.36s
user 0m0.00s
sys  0m0.01s

Test 2: Hot, dirty database (modified and not vacuumed)
=======================================================

Scenario:
  - stop database
  - flush FS cache (umount/mount)
  - start database
  - execute 200.000 updates against each of the 8 object tables
    (a sketch of a comparable update run is attached below)
  - execute vacuum script

$ time sh vac_mono.sh
real 9m36.90s
user 0m0.00s
sys  0m0.01s

$ time sh vac_pll.sh
real 2m10.41s
user 0m0.00s
sys  0m0.02s

The x4 speed-up is obtained simply because a single vacuum process
reaches at most 80MB/sec of throughput, while with 8 parallel vacuum
processes I jump to 360MB/sec...

And speaking about sequential I/O: while you are reading, the file
system may prefetch the incoming data, so that by the time you issue
the next read, your data is already in the FS cache. However, the file
system cannot 'pre-write' data for you - so having more concurrent
writers helps a lot! (Of course, only if your storage is configured to
handle concurrent I/O :))

Well, why all this stuff?... Imagine that you need more performance
and buy a storage box that is 10x faster - will you still be able to
saturate it with single-process I/O activity? No... :) To scale well
you need to be able to split your work into several tasks executed in
parallel, as in the sketches below.
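By the way, the parallel script above does not need the table names
hardcoded; a minimal sh sketch of the same thing as a loop (same
binaries, port, and table naming scheme assumed):

$ cat vac_pll_loop.sh
#!/bin/sh
# Start one vacuumdb per table in the background, then wait for all of them.
for i in 1 2 3 4 5 6 7 8
do
    /usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object$i &
done
wait
$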
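And for the update step in Test 2: I did not include the exact
statements, but a run along the following lines produces a comparable
number of dead rows (the column and key names 'note' and 'id' are
purely illustrative - your tables will differ):

$ cat upd_obj.sh
#!/bin/sh
# Hypothetical sketch: touch 200.000 rows in each of the 8 tables with a
# single UPDATE per table, creating dead tuples for vacuum to clean up.
for i in 1 2 3 4 5 6 7 8
do
    /usr/local/pgsqlFF/bin/psql -p 5464 -d db_OBJ \
        -c "UPDATE object$i SET note = note WHERE id <= 200000;"
done
$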
And personally, I'm very happy we can now do this with vacuum - one of
the most critical parts of PostgreSQL...

Best regards!
-Dimitri
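P.S. Mike, regarding watching the I/O rates: if anyone wants to check
numbers like 80MB/sec vs 360MB/sec on their own box, watching the disks
during the two runs with something like the following is enough
(assuming iostat is available; exact flags and column names vary
between platforms):

$ iostat -x 5
# extended per-device statistics, reported every 5 seconds - compare the
# aggregate read/write KB/sec between the mono and parallel runs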