Re: Bad query plan with high-cardinality column

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Friday, February 22, 2013 at 21:33 , Tom Lane wrote:
> The reason is that the LIMIT may stop the query before it's scanned all
> of the index. The planner estimates on the assumption that the desired
> rows are roughly uniformly distributed within the created_at index, and
> on that assumption, it looks like this query will stop fairly soon ...
> but evidently, that's wrong. On the other hand, it knows quite well
> that the other plan will require pulling out 5000-some rows and then
> sorting them before it can return anything, so that's not going to be
> exactly instantaneous either.
> 
> In this example, I'll bet that conversation_id and created_at are pretty
> strongly correlated, and that most or all of the rows with that specific
> conversation_id are quite far down the created_at ordering, so that the
> search through the index takes a long time to run. OTOH, with another
> conversation_id the same plan might run almost instantaneously.


That's right. So I created a composite index, and not only does this make the plan correct, but the planner now chooses a much more efficient plan than the previous index that indexed only on "conversation_id":

    Limit  (cost=0.00..30.80 rows=13 width=12) (actual time=0.042..0.058 rows=13 loops=1)                                                                                   
      Buffers: shared hit=8                                                                                                                                                 
      ->  Index Scan using index_comments_on_conversation_id_and_created_at on comments  (cost=0.00..14127.83 rows=5964 width=12) (actual time=0.039..0.054 rows=13 loops=1)
            Index Cond: (conversation_id = 3975979)                                                                                                                         
            Buffers: shared hit=8                                                                                                                                           
    Total runtime: 0.094 ms                                                                                                                                                 


Is this because it can get the value of "created_at" from the index, or is it because it can know that the index is pre-sorted, or both?

Very impressed that Postgres can use a multi-column index for this. I just assumed, wrongly, that it couldn't. I will have to go review my other tables now and see if they can benefit from multi-column indexes.

Thanks!


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux