"Jim C. Nasby" <jnasby@xxxxxxxxxxxxx> writes: > On Fri, Jan 20, 2006 at 12:35:36PM +0800, K C Lau wrote: > Here's the problem... the estimate for the backwards index scan is *way* > off: >> -> Limit (cost=0.00..1.26 rows=1 width=4) (actual >> time=200032.928..200032.931 rows=1 loops=1) >> -> Index Scan Backward using pk_log on >> log (cost=0.00..108047.11 rows=86089 width=4) (actual >> time=200032.920..200032.920 rows=1 loops=1) >> Filter: (((create_time)::text < '2005/10/19'::text) AND >> (logsn IS NOT NULL)) >> Total runtime: 200051.701 ms It's more subtle than you think. The estimated rowcount is the estimated number of rows fetched if the indexscan were run to completion, which it isn't because the LIMIT cuts it off after the first returned row. That estimate is not bad (we can see from the aggregate plan that the true value would have been 106708, assuming that the "logsn IS NOT NULL" condition isn't filtering anything). The real problem is that it's taking quite a long time for the scan to reach the first row with create_time < 2005/10/19, which is not too surprising if logsn is strongly correlated with create_time ... but in the absence of any cross-column statistics the planner has no very good way to know that. (Hm ... but both of them probably also show a strong correlation to physical order ... we could look at that maybe ...) The default assumption is that the two columns aren't correlated and so it should not take long to hit the first such row, which is why the planner likes the indexscan/limit plan. regards, tom lane