Hi, I have somewhat of a quandary with a large table in my database; PostgreSQL is choosing the 'wrong' index for a certain kind of query; causing performance to become an order of magnitude slower (query times usually measured in milliseconds now become seconds/minutes!). It's not that it isn't using any index (so enable_seqscan=off doesn't help), it's that the index it picks is suboptimal. The query is based on 3 of the table columns - there is an index on all three, but it prefers to use an index on just two of them, then filtering by the 3rd (EXPLAIN output is below - showing the query as well). I've tried everything I can find to make it not do this (config parameters, statistics target, vacuuming more...) - what am I missing? It happened a couple of times before - after some unusual activity patterns in a client script - but repeatedly vacuuming until the planner changed its mind made it go away. We then made config changes (random_page_cost and effective_cache_size) thinking that fixed the reason why it made the bad decision in the first place...but it would appear not The only thing remaining I can think of is rebuilding the 'preferred' index; that should reduce its size from 27Gb down to something more manageable; maybe that will let the database want to use it more because there'll be less I/O? I've considered upgrading to 8.3.5 as well, but I can't see anything in the changelogs that would fix this issue (I appreciate upgrading is a good idea anyway, that will be done at some point soon anyway). Hopefully all relevant info is listed below - if anyone's got any ideas I'd appreciate any help or pointers anyone can give, thanks... The server is PostgreSQL 8.3.0 on Linux with 32Gb RAM. /var/lib/pgsql/ is on a fibre-channel SAN. This table has around 680 million rows - and has been reasonably regularly vacuumed, but is probably in dire need of a VACUUM FULL and REINDEX to reclaim dead space (see disk space info at the bottom of the post). emystery=# \d answers Table "public.answers" Column | Type | Modifiers --------+---------+----------------------------------------------------- --------------- aid | integer | not null default nextval(('seq_answers_aid'::text)::regclass) rid | integer | not null nid | integer | not null iid1 | integer | iid2 | integer | iid3 | integer | text | text | extra | bigint | Indexes: "answers_pkey" PRIMARY KEY, btree (aid) "index_answers_iid1" btree (iid1) WHERE iid1 > 0 "index_answers_iid2" btree (iid2) WHERE iid2 > 0 "index_answers_iid3" btree (iid3) WHERE iid3 > 0 "index_answers_nidiid1" btree (nid, iid1) "index_answers_ridnidiid1" btree (rid, nid, iid1) This is what it is doing [slowly] - but for many values of rid/nid/iid1 there are a lot of rows to filter through (tens/hundreds of thousands) so this can take many seconds or minutes: emystery=# explain select * from answers where rid=1 and nid=2 and iid1=3; ------------------------------------------------------------------------ ------------------------- Index Scan using index_answers_nidiid1 on answers (cost=0.00..28.74 rows=1 width=62) Index Cond: ((nid = 2) AND (iid1 = 3)) Filter: (rid = 1) This is the pattern it *should* use (and does use on other installations of similar/older data). When this pattern is used the query always completes in <1 second, usually ~0.2 seconds! emystery20080821=# explain select * from answers where rid=1 and nid=2 and iid1=3; ------------------------------------------------------------------------ ---------------------------- Index Scan using index_answers_ridnidiid1 on answers (cost=0.00..99.04 rows=1 width=67) Index Cond: ((rid = 1) AND (nid = 2) AND (iid1 = 3)) The PostgreSQL configuration has been altered to help prefer random lookups [via an index], and to indicate to the database how much data the OS #seq_page_cost = 1.0 # measured on an arbitrary scale random_page_cost = 1.0 # same scale as above #cpu_tuple_cost = 0.01 # same scale as above #cpu_index_tuple_cost = 0.005 # same scale as above #cpu_operator_cost = 0.0025 # same scale as above effective_cache_size = 31GB The table has been VACUUM ANALYSE'd (repeatedly!) to no avail. We've also increased the statistics target for the columns in the table to 200 (from 10) which still isn't making any difference (I thought because the table is so large and the dataset is quite uneven that this should help...it's now at 1000 and vacuuming again so we might be lucky...) alter table answers alter column rid set statistics 200; alter table answers alter column nid set statistics 200; alter table answers alter column iid1 set statistics 200; Here is the [full] output from the most recent VACUUM: emystery=# vacuum verbose analyse answers; INFO: vacuuming "public.answers" INFO: scanned index "index_answers_nidiid1" to remove 38 row versions DETAIL: CPU 39.09s/64.46u sec elapsed 1077.90 sec. INFO: scanned index "index_answers_iid3" to remove 38 row versions DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: scanned index "index_answers_iid2" to remove 38 row versions DETAIL: CPU 19.72s/37.15u sec elapsed 404.70 sec. INFO: scanned index "index_answers_iid1" to remove 38 row versions DETAIL: CPU 28.92s/59.76u sec elapsed 414.62 sec. INFO: scanned index "index_answers_ridnidiid1" to remove 38 row versions DETAIL: CPU 50.99s/71.36u sec elapsed 742.03 sec. INFO: scanned index "answers_pkey" to remove 38 row versions DETAIL: CPU 28.45s/65.63u sec elapsed 376.34 sec. INFO: "answers": removed 38 row versions in 3 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec. INFO: index "index_answers_nidiid1" now contains 671965676 row versions in 2646699 pages DETAIL: 38 index row versions were removed. 2497 index pages have been deleted, 2497 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "index_answers_iid3" now contains 0 row versions in 1 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "index_answers_iid2" now contains 392408456 row versions in 1302348 pages DETAIL: 24 index row versions were removed. 9760 index pages have been deleted, 9760 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "index_answers_iid1" now contains 626231453 row versions in 2044032 pages DETAIL: 38 index row versions were removed. 1576 index pages have been deleted, 1576 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "index_answers_ridnidiid1" now contains 671965676 row versions in 3604651 pages DETAIL: 38 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "answers_pkey" now contains 671965676 row versions in 2022030 pages DETAIL: 38 index row versions were removed. 18670 index pages have been deleted, 18670 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "answers": found 38 removable, 671965676 nonremovable row versions in 4481083 pages DETAIL: 192913 dead row versions cannot be removed yet. There were 39603896 unused item pointers. 380386 pages contain useful free space. 0 pages are entirely empty. CPU 235.00s/366.10u sec elapsed 4100.83 sec. INFO: vacuuming "pg_toast.pg_toast_1118174" INFO: index "pg_toast_1118174_index" now contains 4479 row versions in 33 pages DETAIL: 0 index row versions were removed. 1 index pages have been deleted, 1 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.25 sec. INFO: "pg_toast_1118174": found 0 removable, 4479 nonremovable row versions in 1232 pages DETAIL: 14 dead row versions cannot be removed yet. There were 1971 unused item pointers. 441 pages contain useful free space. 0 pages are entirely empty. CPU 0.01s/0.00u sec elapsed 1.18 sec. INFO: analyzing "public.answers" INFO: "answers": scanned 60000 of 4481083 pages, containing 9003061 live rows and 2964 dead rows; 60000 rows in sample, 672391060 estimated total rows VACUUM This information might help tell you a bit more about the contents of the table: emystery=# SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE '%answers%'; relname | relkind | reltuples | relpages -----------------------------------+---------+-------------+---------- answers | r | 6.71966e+08 | 4481083 seq_answers_aid | S | 1 | 1 index_answers_nidiid1 | i | 6.71966e+08 | 2646699 index_answers_iid3 | i | 0 | 1 index_answers_iid2 | i | 3.92408e+08 | 1302348 index_answers_iid1 | i | 6.26231e+08 | 2044032 index_answers_ridnidiid1 | i | 6.71966e+08 | 3604651 answers_pkey | i | 6.71966e+08 | 2022030 emystery=# SELECT attname,null_frac, n_distinct,correlation FROM pg_stats WHERE tablename = 'answers'; attname | null_frac | n_distinct | correlation ---------+-----------+------------+------------- aid | 0 | -1 | 0.78259 rid | 0 | 187498 | -0.00133284 nid | 0 | 27179 | 0.0342285 iid1 | 0.0668 | 49520 | 0.0306873 iid2 | 0.4187 | 5140 | 0.317481 iid3 | 1 | 0 | text | 0.933117 | 2573 | 0.112144 extra | 0.99995 | -1 | 1 Disk space usage, as calculated from pg_class.relpages * 8 (in kilobytes) is: Table data = 34.19Gb Toast = 64Kb index_ecos_answers_ridnidiid1 27.50Gb index_ecos_answers_nidiid1 20.19Gb index_ecos_answers_iid1 15.59Gb ecos_answers_pkey 15.43Gb index_ecos_answers_iid2 9.94Gb index_ecos_answers_iid3 8.00Kb -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general