I'm having a problem with analyze in 8.0.2 that is really bothering me. I have a table that contains address, city and state with about 7.8m rows in it. On that table I have two non-unique indexes for city and zipcode respectively. This table was loaded in a manner such that it is essentially sorted by zipcode. Sorting by zipcode implies that there should exist a pretty strong correlation on the city column as well. With a statistics target of 50 on city I'm getting good estimates for row counts for arbitrary cities (ie: explain select count(*) from addresses where city = 'DALLAS' estimates 474k rows out of 500k actual) but a poor estimate for the correlation (0.13 according to pg_stats). This seems to be causing the planner to pick a table scan for "select count(*) from test_zipc where city = 'DALLAS' (est 474k rows)" vs picking an index scan for "select count(*) from test_zipc where zipcode like '75%' (est 2m rows)". Increasing the statistics target on the city column and re-analyzing the table seems to make the correlation estimate better (shows about 0.5) but the row count estimates are thrown off by 2 orders of magnitude in some cases. Repeating the above queries I get a row estimate of 8k for "select count(*) from test_zipc where city = 'DALLAS'" and a row estimate of 6m for the "select count(*) from test_zipc where zipcode like '75%'". In this case, the planner picked an index scan for the city = 'X' condition but for what I feel are the wrong reasons because it under estimated the row count. Re-analyzing the table multiple times always shows about an 8k estimate. Is my data set that promblematic? Has anyone seen similar behavior? Any suggestions on how to improve these stats? Regards, Shelby Cain __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings