Re: Cost of sort/order by not estimated by the query planner

On Wed, Dec 2, 2009 at 8:01 AM, Laurent Laborde <kerdezixe@xxxxxxxxx> wrote:
> On Wed, Dec 2, 2009 at 1:47 PM, Greg Stark <gsstark@xxxxxxx> wrote:
>> On Wed, Dec 2, 2009 at 11:13 AM, Laurent Laborde <kerdezixe@xxxxxxxxx> wrote:
>>>>                                           QUERY PLAN
>>>> -------------------------------------------------------------------------------------------------
>>>>  Limit  (cost=0.00..2042.87 rows=5 width=1114)
>>>>   ->  Index Scan using _article_pkey on _article
>>>> (cost=0.00..7066684.46 rows=17296 width=1114)
>>>>         Filter: (bitfield && B'1'::bit varying)
>>>
>>
>> Ah, I missed this the first time around. It's scanning _article_pkey
>> here, i.e. it's scanning the table from the oldest to the newest
>> article, assuming that the values which satisfy that constraint are
>> evenly distributed and that it'll find five of them pretty quickly. In
>> reality there's a correlation between this bit being set and the value
>> of _article.id: all the rows with it set are towards the end.
>> Postgres doesn't have any statistics yet on how multiple columns are
>> related, so it can't know this.
>>
>> If this is an important query you might try having an index on
>> <bitfield,id> or a partial index on "id where bitfield && B'1'". The
>> latter sounds like what you really need.
>
> There are, indeed, a lot of tricks and hacks.
> Maybe my question was too confusing.
>
> The question is: why is a LIMIT 5 much, much slower than a LIMIT 500?
>
> The problem is in the "order by", not in finding enough data that
> matches the filter.
> Even if the data isn't evenly distributed, the queries without
> "order by" are much, much faster, EVEN when using the "pkey query plan".
>
> without "order by" using the bitmap -> fast
> without "order by" using the pkey index -> fast
> with "order by" using the bitmap -> fast
> with "order by" using the pkey index -> slowwwwwwwwwwwww

I'm confused.  I think you've only shown us two query plans, so it's
hard to judge what's going on in the two cases you haven't shown.
Also, you haven't shown the EXPLAIN ANALYZE output, so it's a bit
tricky to judge what is really happening.
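
For what it's worth, posting EXPLAIN ANALYZE (rather than plain
EXPLAIN) output for the slow and fast variants would show the actual
row counts and timings next to the estimates.  Something along these
lines -- a sketch, assuming the ORDER BY is on the id column that the
pkey plan implies, with the filter copied from the posted plan:

    -- slow case: small LIMIT with ORDER BY (hypothetical query shape)
    EXPLAIN ANALYZE
    SELECT * FROM _article
    WHERE bitfield && B'1'::bit varying
    ORDER BY id
    LIMIT 5;

    -- fast case: larger LIMIT, otherwise the same query
    EXPLAIN ANALYZE
    SELECT * FROM _article
    WHERE bitfield && B'1'::bit varying
    ORDER BY id
    LIMIT 500;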

However... as a general rule, the usual reason why the planner makes
bad decisions with small LIMITs is that it overestimates the impact of
the startup cost.  If one plan has a startup cost of 1 and a run cost
of 100, and another plan has a startup cost of 0 and a run cost of
1000000, the planner will pick the latter plan if a sufficiently small
fraction of the rows are being fetched (less than a millionth of
them).  It's easy for the estimates to be off by enough to make this
a bad decision, especially if using operations that the planner
doesn't have good estimates for (&& may be one such).
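
To put numbers on that example: with a fraction f of the rows
fetched, the first plan costs roughly 1 + f * 100 and the second
0 + f * 1000000, so the second only wins while f * 999900 < 1,
i.e. below about one row in a million.  If the estimate of how
common (or how evenly spread in id order) the matching rows are is
off, that choice goes badly wrong.

On the practical side, the partial index Greg suggested upthread
would let the planner walk the matching rows directly in id order.
A rough, untested sketch, using the names from the posted plan (the
index name is made up):

    -- hypothetical partial index: only rows matching the filter,
    -- kept in id order
    CREATE INDEX _article_bit1_id_idx
        ON _article (id)
        WHERE bitfield && B'1'::bit varying;

With that in place, the ORDER BY id ... LIMIT 5 query should be able
to scan the partial index and stop after five matching rows instead
of wading through most of the primary key.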

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

