On 4/2/04 12:33 PM, "Alvaro Herrera" <alvherre@dcc.uchile.cl> wrote: >> -> Index Scan using message_recipients_i_recipient on >> message_recipients r (cost=0.00..30569.25 rows=30622 width=21) (actual >> time=83.146..31609.149 rows=312741 loops=1) > > This estimate is off by an order of magnitude. Maybe you want to > increase the statistic target for this column ... Please explain. The only thing I can find about this is "For scalar data types, attstattarget is both the target number of ³most common values² to collect, and the target number of histogram bins to create." Why would this affect the row count estimations? With 10 being the default, what would an appropriate value be? I changed it to 100 for both message_key and reciepent (in message_recipients table). That used the same plan and made the discrepancy worse 13000 estimated, 312000 actual: > -> Index Scan using message_recipients_i_recipient on > message_recipients r (cost=0.00..13324.75 rows=13249 width=21) (actual > time=4751.701..34373.808 rows=312741 loops=1) And, it still only uses the compound index if I set random_page_cost to < 1. Analyze is run regularly by auto-vacuum (multiple times a day - I need to reduce this). If I read this right, before changing the statistic target: db=# select relname, relfilenode from pg_class where relname like 'message_recip%'; relname | relfilenode --------------------------------+------------- message_recipients_i_recipient | 17234 message_recipients_i_message | 17235 message_recipients | 17162 (3 rows) db=> select attname, attrelid, attstattarget from pg_attribute where attname='recipient'; attname | attrelid | attstattarget -----------+----------+--------------- recipient | 17162 | -1 recipient | 17234 | 0 db=# select staattnum, stadistinct from pg_statistic where starelid=17162; staattnum | stadistinct -----------+------------- 1 | 1.11264e+06 2 | 3598 3 | 2 (3 rows) db=# select attname, attstattarget, attnum from pg_attribute where attrelid=17162; attname | attstattarget | attnum ----------------+---------------+-------- tableoid | 0 | -7 cmax | 0 | -6 xmax | 0 | -5 cmin | 0 | -4 xmin | 0 | -3 oid | 0 | -2 ctid | 0 | -1 message_key | -1 | 1 recipient | -1 | 2 recipient_type | -1 | 3 There are 20+ million (unique) message_key records, so stadistinct for column 1 is off by a factor of 20. There are just under 1.2 million unique recipients (60 million total records), so column 2 (recipient) is off by a factor of over 300. In the messages table, message_key (which is unique) has statdistinct = -1 db=# alter table message_recipients alter column recipient set statistics 100; db=# alter table message_recipients alter column message_key set statistics 100; db=# analyze; db=# select distinct relname, attname, attnum, attstattarget, stadistinct from pg_class c, pg_attribute a, pg_statistic s where c.relfilenode=a.attrelid and c.relfilenode=starelid and a.attnum=s.staattnum and relname='message_recipients' order by attnum; relname | attname | attnum | attstattarget | stadistinct --------------------+----------------+--------+---------------+------------- message_recipients | message_key | 1 | 100 | 2.19256e+06 message_recipients | recipient | 2 | 100 | 8672 message_recipients | recipient_type | 3 | -1 | 2 Stadistinct is about twice what it was before, but is still way off the number of distinct values for that column. Manually setting stadistinct closer to the number of unique values makes the discrepancy still larger. Wes ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster