Tyler Reese <jukey91@xxxxxxxxx> wrote:
> Kevin Grittner <kgrittn@xxxxxxxxx> wrote:
>> Tyler Reese <jukey91@xxxxxxxxx> wrote:

>>> mydb=> explain analyze SELECT * FROM "cdr" WHERE lower("CallingPartyNumber") = '9725551212' order by "key" limit 100;
>>>
>>>  Limit  (cost=0.00..72882.05 rows=100 width=757) (actual time=20481.083..30464.960 rows=11 loops=1)
>>>    ->  Index Scan using cdr_pkey on cdr  (cost=0.00..30036152.32 rows=41212 width=757) (actual time=20481.049..30464.686 rows=11 loops=1)
>>>          Filter: (lower("CallingPartyNumber") = '9725551212'::text)
>>>  Total runtime: 30465.246 ms
>>
>> It thinks that it will only need to read 1/412th of the table to
>> find 100 matching rows, and using that index it will be able to
>> skip the sort.  Since there aren't 100 matching rows, it has to
>> read the whole table through the index.

> So, since it thinks it needs to read 1/412th of the table is the
> reason why the query planner chooses to use the primary key index
> instead of the callingpartynumber index, like it does in the
> first 3 cases?

The optimizer compares the estimated cost of reading all matching
rows (but only matching rows) out of order and then sorting them to
the estimated cost of reading the rows in order and filtering out
the ones that don't match (stopping when the limit is reached).
Since it thought a lot of rows would match, that made the sort look
more expensive and also made it look as though the ordered index
scan would not need to read a very large percentage of the table.

> I'm curious as to why it says "rows=41212".  Is that the estimate
> of the number of rows that meet the filter condition?  Where does
> that come from?

That's based on the distribution observed in the last random sample
taken when ANALYZE was run (as a command or by autovacuum).  When
the estimate is off by that much, either autovacuum is not
configured to be aggressive enough about analyzing or the sample
size was not large enough.

>> Raising the statistics target and running ANALYZE might allow
>> it to use a more accurate estimate, and thereby make a better
>> choice.
>
> I haven't heard of raising the statistics target, so I'll read up
> on that.

http://www.postgresql.org/docs/9.2/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER
http://www.postgresql.org/docs/9.2/interactive/sql-altertable.html

> A few days ago, all 4 cases were responding equally fast.  I had
> been messing around with the postgres settings, and I went and
> dropped all of the indexes and recreated them just to see what
> would happen.  I wouldn't think that recreating the indexes would
> cause case 4 to go slow, but that's the symptom I am seeing now.
> Should I be running analyze on a table after it has been
> reindexed?

Only if you have indexes on expressions rather than simple column
names.  If you have an index on lower("CallingPartyNumber"), that
would qualify as an expression.

The normal reason to need fresh statistics is a change in the
distribution of values in a column, particularly after a bulk load.
Also, columns with a large number of distinct values tend to
benefit from a higher statistics target.  The downsides of higher
statistics targets are a longer time to ANALYZE and increased
planning time, so it's generally best to use the default except
where a particular problem has been observed, like in this case.
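Concretely, that might look something like the sketch below.  This
assumes the "cdr" table and "CallingPartyNumber" column from the
plan above, and the target of 1000 is only an illustration; pick a
value appropriate for your data and be prepared to back it off if
ANALYZE or planning gets too slow.

  -- Raise the per-column target for the base column (1000 is an
  -- arbitrary example value).
  ALTER TABLE "cdr" ALTER COLUMN "CallingPartyNumber" SET STATISTICS 1000;

  -- Statistics for the lower(...) expression index are gathered at
  -- the default target, so raising it for this session before
  -- running ANALYZE affects that sample as well.
  SET default_statistics_target = 1000;

  -- Re-sample the table; the planner only sees the new statistics
  -- once this completes.
  ANALYZE "cdr";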
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company