Re: plan question - query with order by and limit not choosing index depends on size of limit, table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks Robert, this is what I was looking for.  I will try these suggestions and follow up if any of them are the silver bullet.

On Fri, Jan 14, 2011 at 7:11 AM, Robert Haas wrote:
On Thu, Jan 6, 2011 at 4:36 PM, Mike Broers <mbroers@xxxxxxxxx> wrote:
> Thanks for the assistance.
> Here is an explain analyze of the query with the problem limit:
> production=# explain analyze select * from landing_page.messages where
> ((messages.topic = E'x') AND (messages.processed = 'f'))  ORDER BY
> messages.created_at ASC limit 10;
>
>    QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------
> ------------------------------------------------------------
>  Limit  (cost=0.00..2891.06 rows=10 width=1340) (actual
> time=207922.586..207922.586 rows=0 loops=1)
>    ->  Index Scan using idx_landing_page_messages_created_at on messages
>  (cost=0.00..449560.48 rows=1555 widt
> h=1340) (actual time=207922.581..207922.581 rows=0 loops=1)
>          Filter: ((NOT processed) AND ((topic)::text = 'x'::text))
>  Total runtime: 207949.413 ms
> (4 rows)

You're not the first person to have been bitten by this.  The
optimizer thinks that rows WHERE NOT processed and topic = 'x' are
reasonably common, so it figures that it can just index scan until it
finds 10 of them.  But when it turns out that there are none at all,
it ends up having to scan the entire index, which stinks big-time.

The alternative plan is to use a different index to find ALL the
relevant rows, sort them, and then take the top 10.   That would suck
if there actually were tons of rows like this, but there aren't.

So the root of the problem, in some sense, is that the planner's
estimate of the selectivity of "NOT processed and topic = 'x'" is not
very good.  Some things to try:

- increase the statistics target for the "processed" and "topic"
columns even higher
- put the processed rows in one table and the not processed rows in
another table
- do something like SELECT * FROM (SELECT .. LIMIT 200 OFFSET 0) LIMIT
10 to try to fool the planner into planning based on the higher, inner
limit
- create a partial index on messages (topic) WHERE NOT processed and
see if the planner will use it

...Robert


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux