On Sun, Aug 18, 2013 at 10:33 PM, Kevin Grittner <kgrittn@xxxxxxxxx> wrote:
Tyler Reese <jukey91@xxxxxxxxx> wrote:
> Kevin Grittner <kgrittn@xxxxxxxxx> wrote:
>> Tyler Reese <jukey91@xxxxxxxxx> wrote:
>>> mydb=> explain analyze SELECT * FROM "cdr" WHERE lower("CallingPartyNumber") = '9725551212' order by "key" limit 100;
>>>
>>> Limit (cost=0.00..72882.05 rows=100 width=757) (actual time=20481.083..30464.960 rows=11 loops=1)
>>> -> Index Scan using cdr_pkey on cdr (cost=0.00..30036152.32 rows=41212 width=757) (actual time=20481.049..30464.686 rows=11 loops=1)
>>> Filter: (lower("CallingPartyNumber") = '9725551212'::text)
>>> Total runtime: 30465.246 ms
>>
>> It thinks that it will only need to read 1/412th of the table to
>> find 100 matching rows, and using that index it will be able to
>> skip the sort. Since there aren't 100 matching rows, it has to
>> read the whole table through the index.
> So, since it thinks it needs to read 1/412th of the table is the
> reason why the query planner chooses to use the primary key index
> instead of the callingpartynumber index, like it does in the
> first 3 cases?

The optimizer compares the estimated cost of reading all matching
rows (but only matching rows) out of order and then sorting them to
the estimated cost of reading them in order and filtering out the
rows that don't match (and stopping when the limit is reached).
Since it thought a lot of rows would match, that made the sort look
more expensive and also made it look like it would not need to read
a very large percentage of the table.
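As a back-of-the-envelope check (plain arithmetic, not PostgreSQL's actual costing code), the Limit node's cost in the slow plan is just the full index-scan cost scaled by the fraction of expected matches needed to satisfy the limit:

```python
# Numbers taken from the slow EXPLAIN ANALYZE output above.
full_scan_cost = 30036152.32   # total estimated cost of the cdr_pkey index scan
estimated_matches = 41212      # planner's row estimate for the filter
limit = 100

# Assuming matches are spread evenly through the index, the scan is
# expected to stop after reading limit/estimated_matches of the table.
limit_cost = full_scan_cost * limit / estimated_matches
print(round(limit_cost, 2))   # 72882.05, the Limit node's estimated cost
```

With only 11 actual matches instead of 41212, the scan never hits the limit and ends up walking the whole index, which is why the estimate and the 30-second reality diverge so badly.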
> I'm curious as to why it says "rows=41212". Is that the estimate
> of the number of rows that meet the filter condition? Where does
> that come from?

That's based on the distribution observed in the last random sample
taken when ANALYZE was run (as a command or by autovacuum). When
there is an error in the estimate that bad, either autovacuum is
not configured to be aggressive enough in analyzing or the sample
size was not large enough.
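The sample size is tied directly to the statistics target: per the PostgreSQL documentation, ANALYZE examines roughly 300 × target rows, so raising the target enlarges the sample proportionally (a rule of thumb, not an exact guarantee):

```python
def analyze_sample_rows(statistics_target):
    # ANALYZE samples about 300 * statistics_target rows (per the
    # PostgreSQL docs for default_statistics_target).
    return 300 * statistics_target

print(analyze_sample_rows(100))    # default target of 100  -> 30000 rows sampled
print(analyze_sample_rows(1000))   # raised target of 1000 -> 300000 rows sampled
```

On a large table, a 30,000-row sample can easily miss a value that occurs only 11 times, leaving the planner to fall back on much cruder estimates.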
>> Raising the statistics target and running ANALYZE might allow
>> it to use a more accurate estimate, and thereby make a better
>> choice.
>
> I haven't heard of raising the statistics target, so I'll read up
> on that.

http://www.postgresql.org/docs/9.2/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER
http://www.postgresql.org/docs/9.2/interactive/sql-altertable.html

> A few days ago, all 4 cases were responding equally fast. I had
> been messing around with the postgres settings, and I went and
> dropped all of the indexes and recreated them just to see what
> would happen. I wouldn't think that recreating the indexes would
> cause case 4 to go slow, but that's the symptom I am seeing now.
> Should I be running analyze on a table after it has been
> reindexed?

Only if you have indexes on expressions rather than simple column
names. If you have an index on lower("CallingPartyNumber"), that
would qualify as an expression.
The normal reason to need to get fresh statistics is because of
changes in the distribution of values in a column, particularly
after a bulk load. Also, columns with a large number of distinct
values tend to benefit from a higher statistics target. The
downsides of higher statistics targets are a longer time to ANALYZE and
increased planning time; so it's generally best to use the default
except where a particular problem has been observed, like in this
case.
I performed ANALYZE on the table and now case 4 is fast again:
mydb=> explain analyze SELECT * FROM "cdr" WHERE lower("CallingPartyNumber") = '9725551212' order by "key" limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=958.72..958.97 rows=100 width=758) (actual time=0.251..0.310 rows=11 loops=1)
-> Sort (cost=958.72..959.32 rows=240 width=758) (actual time=0.244..0.266 rows=11 loops=1)
Sort Key: key
Sort Method: quicksort Memory: 30kB
-> Bitmap Heap Scan on cdr (cost=7.30..949.55 rows=240 width=758) (actual time=0.105..0.162 rows=11 loops=1)
Recheck Cond: (lower("CallingPartyNumber") = '9725551212'::text)
-> Bitmap Index Scan on callingpartynumber_lower (cost=0.00..7.24 rows=240 width=0) (actual time=0.085..0.085 rows=11 loops=1)
Index Cond: (lower("CallingPartyNumber") = '9725551212'::text)
Total runtime: 0.517 ms
(9 rows)
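For scale, comparing the measured runtimes of the two plans (simple arithmetic on the "Total runtime" lines above):

```python
slow_ms = 30465.246   # total runtime walking cdr_pkey with a filter
fast_ms = 0.517       # total runtime after ANALYZE, via the expression index
print(round(slow_ms / fast_ms))   # fresh statistics bought a ~59000x speedup
```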
Thanks for the help, Kevin.