Hi,

I want to get some statistical information from people who have experience with vacuuming big tables. I was a little confused about why some vacuums take much longer than others, so I ran some tests against a test database. All tests were made with PostgreSQL 8.0.3.

In total I made three tests, using table structures and indexes that are problematic in some databases I have built. I took the structure of the tables, anonymized the columns and filled the tables with random data generated by a script. In this test scenario the tables are filled with data and NO CHANGES are made to them afterwards, because I just wanted to find out how long it takes VACUUM to simply go through the table and its indexes.

First table
-----------

CREATE TABLE "public"."tbl1" (
  "id" BIGSERIAL,
  "col1" BIGINT NOT NULL,
  "col2" VARCHAR NOT NULL,
  "col3" BIGINT NOT NULL,
  "col4" INTEGER NOT NULL,
  "col5" VARCHAR NOT NULL,
  "col6" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  "col7" "public"."tsvector",
  "col8" VARCHAR NOT NULL,
  "col9" BOOLEAN NOT NULL,
  "col10" BOOLEAN NOT NULL,
  "col11" "public"."tsvector",
  "col12" INTEGER,
  CONSTRAINT "tbl1_pkey" PRIMARY KEY("id")
) WITH OIDS;

CREATE INDEX "idx_tbl1_col1" ON "public"."tbl1" USING btree ("col1");
CREATE INDEX "idx_tbl1_col11" ON "public"."tbl1" USING gist ("col11");
CREATE INDEX "idx_tbl1_col8" ON "public"."tbl1" USING btree ("col8");
CREATE INDEX "idx_tbl1_col7" ON "public"."tbl1" USING gist ("col7");
CREATE INDEX "idx_tbl1_col3" ON "public"."tbl1" USING btree ("col3");

=> Filled with 400,000 rows

LOG: duration: 148187.934 ms statement: VACUUM VERBOSE tbl1;

Second table
------------

CREATE TABLE "public"."tbl2" (
  "id" BIGSERIAL,
  "col1" BIGINT,
  "col2" VARCHAR NOT NULL,
  "col3" INTEGER NOT NULL,
  "col4" BIGINT NOT NULL,
  "col5" TIMESTAMP WITHOUT TIME ZONE,
  "col6" BOOLEAN NOT NULL,
  "col7" BIGINT,
  CONSTRAINT "tbl2_pkey" PRIMARY KEY("id")
) WITH OIDS;

CREATE INDEX "idx_tbl2_col7" ON "public"."tbl2" USING btree ("col7");
CREATE INDEX "idx_tbl2_col1" ON "public"."tbl2" USING btree ("col1");

=> Filled with 6,000,000 rows

LOG: duration: 138934.027 ms statement: VACUUM VERBOSE tbl2;

Third table
-----------

CREATE TABLE "public"."tbl3" (
  "id" BIGSERIAL,
  "col1" BIGINT NOT NULL,
  "col2" BIGINT,
  "col3" BIGINT NOT NULL,
  "col4" VARCHAR,
  "col5" VARCHAR,
  "col6" INTEGER,
  "col7" TIMESTAMP WITHOUT TIME ZONE,
  CONSTRAINT "tbl3_pkey" PRIMARY KEY("id")
) WITH OIDS;

CREATE INDEX "idx_tbl3_col2" ON "public"."tbl3" USING btree ("col2");
CREATE INDEX "idx_tbl3_col3" ON "public"."tbl3" USING btree ("col3");
CREATE INDEX "idx_tbl3_col1_col3" ON "public"."tbl3" USING btree ("col1", "col3");

=> Filled with 100,000,000 rows

The VACUUM could not be completed - it ran longer than 24 hours and I then aborted it manually.

Postgresql.conf
---------------

listen_addresses = '*'
shared_buffers = 32768
maintenance_work_mem = 1048576
max_stack_depth = 81920
max_fsm_pages = 30000000
max_fsm_relations = 1000
vacuum_cost_delay = 0
bgwriter_percent = 0
bgwriter_maxpages = 0
fsync = false
enable_hashagg = true
enable_hashjoin = true
enable_indexscan = true
enable_mergejoin = true
enable_nestloop = true
enable_seqscan = true
enable_sort = true
enable_tidscan = true
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.001
cpu_operator_cost = 0.0025
effective_cache_size = 102400
geqo = false
default_statistics_target = 1
from_collapse_limit = 8
join_collapse_limit = 8
log_destination = 'stderr'
redirect_stderr = true
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = false
log_rotation_age = 1440
client_min_messages = log
log_min_messages = log
log_error_verbosity = default
log_min_duration_statement = 1
log_connections = false
log_disconnections = false
log_line_prefix = ''
stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true

Problem
-------

In my opinion the VACUUM of the third table takes far too long.
The table is less complex than the other ones - so starting from the worst case, namely that every row takes as long as in the most complex table (tbl1), I end up with this calculation:

148 seconds for 400,000 rows = 2702 rows per second
100,000,000 rows => 37,009 seconds (= 616 minutes = ~10 hours)

But the VACUUM of tbl3 runs for more than 24 hours, even though no changes were made to the table.

Hardware + Software configuration
---------------------------------

SCSI RAID5 with usable space of 120 GB
2 x Intel(R) Xeon(TM) CPU 3.20GHz
RAM: 4148840 kB (4 GB)
Kernel: 2.6.12.2
PGDATA is on a separate partition
Nothing else is running on this server.

Question
--------

The duration of the VACUUM of tbl1 and tbl2 is okay - so how can I speed up the VACUUM of tbl3?

---
Regards, Aldor
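P.S. For anyone who wants to reproduce the worst-case extrapolation in the Problem section, here it is as a throwaway Python sketch (integer division to match the rounding used above; the constants come from the tbl1 measurement):

```python
# Worst-case extrapolation: assume every tbl3 row costs as much
# VACUUM time as a row of the most complex table (tbl1).
TBL1_VACUUM_SECONDS = 148      # measured: 148187.934 ms
TBL1_ROWS = 400_000
TBL3_ROWS = 100_000_000

rows_per_second = TBL1_ROWS // TBL1_VACUUM_SECONDS   # 2702 rows/s
expected_seconds = TBL3_ROWS // rows_per_second      # 37,009 s
expected_minutes = expected_seconds // 60            # 616 min
expected_hours = expected_seconds // 3600            # ~10 h

print(rows_per_second, expected_seconds, expected_minutes, expected_hours)
```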