Tom,

> You need to look into what the estimated vs actual rowcounts are for
> just the two-column condition (ie, where nid = something AND iid1 =
> something). Getting those estimates to match reality is the key to
> getting a sane plan choice here.

Many thanks - I agree in principle it isn't automatically the worst
decision; however, for this data set it unfortunately is :(

Improving the accuracy of the cost estimates is exactly what I'm trying
to achieve. So far I've tried:

- increasing the statistics target to the maximum setting with SET
  STATISTICS 1000 on columns rid, nid and iid1 of answers, then
  re-vacuuming
- adjusting random_page_cost downwards even further to try to make it
  prefer index IO
- increasing cpu_tuple_cost to 1.0 (from 0.01) to try to discourage it
  from processing data rows
- decreasing cpu_index_tuple_cost to 0.0005 (from 0.005) to encourage
  it to spend more time in index rows

None of the above had any effect :(

Interestingly, I didn't re-vacuum between changing the cost figures in
the config file (I did reload). The cost figures from EXPLAIN did
change, but in the queries below the 'wrong' index always comes out
just over a third more expensive.

Here's a query that matches actual data in the table; the planner sees
the cost of the two-column index as much lower, but the query takes an
order of magnitude longer to run (using nid & iid1 means filtering
through around 22,000 rows).

The three queries: the first is the one used in practice, against all
three columns.
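For completeness, the statistics change was done roughly like this (a
sketch from memory, on the answers table as above):

```sql
-- raise the per-column statistics target, then refresh the stats
ALTER TABLE answers ALTER COLUMN rid  SET STATISTICS 1000;
ALTER TABLE answers ALTER COLUMN nid  SET STATISTICS 1000;
ALTER TABLE answers ALTER COLUMN iid1 SET STATISTICS 1000;
VACUUM ANALYZE answers;

-- and the planner cost knobs tried in postgresql.conf (then reload):
--   random_page_cost     = (lowered further from its previous value)
--   cpu_tuple_cost       = 1.0      -- default 0.01
--   cpu_index_tuple_cost = 0.0005   -- default 0.005
```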
The next two queries use the two different combinations of just two
columns:

emystery=# explain analyse select * FROM answers
             WHERE rid=668332334 AND nid=253869913 AND iid1=535292129;
                                 QUERY PLAN
----------------------------------------------------------------------
 Index Scan using index_answers_nidiid1 on answers
     (cost=0.00..114.14 rows=1 width=60)
     (actual time=0.168..790.262 rows=1 loops=1)
   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
   Filter: (rid = 668332334)
 Total runtime: 790.305 ms
(4 rows)

emystery=# explain analyse select * FROM answers
             WHERE rid=668332334 AND nid=253869913;
                                 QUERY PLAN
----------------------------------------------------------------------
 Index Scan using index_answers_ridnidiid1 on answers
     (cost=0.00..152.49 rows=1 width=60)
     (actual time=0.120..0.141 rows=18 loops=1)
   Index Cond: ((rid = 668332334) AND (nid = 253869913))
 Total runtime: 0.207 ms
(3 rows)

emystery=# explain analyse select * FROM ecos_answers
             WHERE nid=253869913 AND iid1=535292129;
                                 QUERY PLAN
----------------------------------------------------------------------
 Index Scan using index_ecos_answers_nidiid1 on ecos_answers
     (cost=0.00..114.14 rows=1 width=60)
     (actual time=0.206..2398.645 rows=21891 loops=1)
   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
 Total runtime: 2424.769 ms
(3 rows)

What setting should I be tweaking to make the two-column index more
expensive? I thought I would need to either:

- inform the database that there are likely to be a lot of rows that
  need filtering (SET STATISTICS [maximum]???), or
- tell the database that filtering rows is expensive (cpu_tuple_cost++,
  random_page_cost-- ???)
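For reference, the per-column statistics the planner works from can be
inspected directly via the pg_stats view, which exposes the estimated
distinct counts and most common values for each analyzed column:

```sql
-- what the planner currently believes about these columns
SELECT attname, n_distinct, null_frac, most_common_vals
FROM pg_stats
WHERE tablename = 'answers'
  AND attname IN ('rid', 'nid', 'iid1');
```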
...but I've already tried both of those. It would seem from the EXPLAIN
output that the first one is key: it shouldn't think (for the last
example) that it's only going to get 1 row back! But aside from
increasing SET STATISTICS, what other setting is there to inform it of
this?

The only remaining thing I can think of is to rebuild/replace the
preferred index; that should make it smaller, and *if* the index cost
estimates are partly based on the number of disk pages, that should
reduce the perceived cost of using it. Is this the case? Even if it
works, that still doesn't fix the underlying problem.

Any help appreciated,

Many thanks

Shane

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general