Hello,

we are experiencing some performance degradation on a database where the main table is running towards the 100M record mark. Together with the slowness of the queries I notice these symptoms:

- size bloat of partial indexes
- very bad planning estimates

I'd appreciate any hint to get a better picture of what is going on and to understand how much the symptoms are correlated.

The most noticeable problems are with queries such as:

    select * from foos where <condition>

where there is a very selective condition (about 10K records out of 100M) and a partial index on them. The index is correctly taken into consideration for the scan, but with an extremely wrong estimate and painful performance, e.g.:

    # explain select count(*), sum(x) from foos where rcon IS NULL AND is_settled = true;
                                         QUERY PLAN
    ----------------------------------------------------------------------------------------
     Aggregate  (cost=4774842.01..4774842.02 rows=1 width=8)
       ->  Bitmap Heap Scan on foos  (cost=218211.50..4674496.17 rows=20069167 width=8)
             Recheck Cond: ((rcon IS NULL) AND is_settled)
             ->  Bitmap Index Scan on i_rcon3  (cost=0.00..213194.21 rows=20069167 width=0)

(I don't have an EXPLAIN ANALYZE output anymore for this, but the rows actually returned were about 7K at the time.)

This query used to run in sub-second time: recently it started taking several minutes or, if run shortly after a previous run, around 10 seconds. pg_stat_all_indexes showed over 400MB of size for this index: way too much to index fewer than 10K records.

Trying to solve this bloat problem, I have tried the following:

1. Manually running VACUUM on the table (autovacuum had not touched it for a while, probably because other tables are updated more often). The verbose output concerning the above index was:

    ...
    INFO:  scanned index "i_rcon3" to remove 22369332 row versions
    DETAIL:  CPU 0.84s/5.20u sec elapsed 50.18 sec.
    ...
    INFO:  "foos": removed 22369332 row versions in 1009710 pages
    DETAIL:  CPU 34.38s/27.01u sec elapsed 2226.51 sec.
    ...
    INFO:  scanned index "i_rcon3" to remove 15330597 row versions
    DETAIL:  CPU 0.48s/2.14u sec elapsed 15.42 sec.
    ...
    INFO:  "foos": removed 15330597 row versions in 569208 pages
    DETAIL:  CPU 9.40s/8.42u sec elapsed 732.17 sec.
    ...
    INFO:  index "i_rcon3" now contains 43206 row versions in 53495 pages
    DETAIL:  9494602 index row versions were removed.
    53060 index pages have been deleted, 20032 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    ...
    WARNING:  relation "public.foos" contains more than "max_fsm_pages" pages with useful free space
    HINT:  Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".

Although the output was promising, the planner estimate and the index size didn't change much (I assumed the performance didn't change either, so I didn't run an EXPLAIN ANALYZE).

2. I rebuilt an index concurrently with exactly the same properties (sketched below): this produced an index of a much more reasonable size (now, after a busy weekend of running, it is about 12MB) and it solved the performance problem. It didn't fix the bad estimate, though.

3. I increased the statistics target from the default 10 to 100 and analyzed, expecting to see some change in the estimated number of rows: apart from a small fluctuation, the estimate remained around 20M (more on the estimate below).

4. The index is not on a distinct field but on a fkey with no more than 4K distinct values and an extremely uneven distribution. I also created an index with the same condition but on the pkey: the estimate didn't change, stable at around 20M records even after increasing the stats target to 100 for the pkey field too.
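For reference, the concurrent rebuild in point 2 boils down to the usual create-new/swap sequence, roughly like this (the column name "con_id" is only a placeholder for the real fkey field; CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

    -- build a fresh copy of the bloated index without blocking writes
    CREATE INDEX CONCURRENTLY i_rcon3_new ON foos (con_id)
        WHERE rcon IS NULL AND is_settled = true;

    -- once it is ready, swap it in (DROP INDEX takes a short exclusive lock)
    DROP INDEX i_rcon3;
    ALTER INDEX i_rcon3_new RENAME TO i_rcon3;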
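As far as I can tell, the 20M should just be pg_class.reltuples multiplied by the per-column fractions for the two clauses (rcon IS NULL and is_settled = true), with the clauses treated as independent; these numbers can be checked directly in the catalogs, e.g.:

    -- per-column statistics the planner uses for the two clauses
    SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
      FROM pg_stats
     WHERE tablename = 'foos' AND attname IN ('rcon', 'is_settled');

    -- table-level row count estimate
    SELECT relname, reltuples, relpages
      FROM pg_class
     WHERE relname = 'foos';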
Does anybody have some information about where the bloat is coming from and what is the best way to get rid of it? Would a VACUUM FULL fix this kind of problem? Is there a way to fix it without taking the system offline?

The indexed condition is a state in the evolution of the records in the table: many records assume that state for some time, then move to a different state that is no longer indexed. Is the continuous addition and deletion of records to the index what is causing the bloat (which could then be considered limited to the indexes with a similar usage pattern)? Is a reindex/concurrent rebuild the best answer?

Any idea where the 20M record estimate is coming from? Isn't the size of the partial index taken into account in the estimate?

We are running PG 8.3 and are planning a migration to new hardware and, at the same time, to a newer PG version in the near future. Are our problematic behaviours known to be fixed in later releases?

Thank you very much. Regards.

--
Daniele