Re: Order by (for 15 rows) adds 30 seconds to query time

Dear Kevin,

Thanks for a very helpful reply.

Kevin Grittner wrote:
> Richard Neill <rn214@xxxxxxxxx> wrote:
>> Am I wrong in thinking that ORDER BY is always applied after the
>> main query is run?
>
> Yes, you are wrong to think that.  It compares the costs of various
> plans, and when it has an index with the high order portion matching
> your ORDER BY clause, it may think that it can scan that index to
> generate the correct sequence.  If the sort is estimated to be
> expensive enough compared to the index scan, it will use the index
> scan and skip the sort.  Sorting hundreds of billions of rows can be
> expensive.

That makes sense now.
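
For the archives, this is easy to check directly with EXPLAIN ANALYZE on
the real query (the table, column, and value below are the ones from this
thread):

  explain analyze
  select * from h.inventory where demand_id = 289276563 order by id;

If I follow your explanation, an index whose leading column matches the
filter and whose trailing column matches the ORDER BY ought to satisfy
both at once. Something like the following - just a sketch, the index
name is made up and we haven't tried it yet:

  create index inventory_demand_id_id on h.inventory (demand_id, id);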



>> Even if I run it this way:
>>
>>   select * from (select * from h.inventory where demand_id
>>   =289276563) as sqry order by id;
>>
>> which should(?) surely force it to run the first select, then
>> sort,
>
> I wouldn't necessarily assume that.  You can EXPLAIN that form of
> the query and find out easily enough.  Does it say:
>
>   ->  Index Scan using inventory_demand_id on inventory
>         (cost=0.00..22.36 rows=28 width=56)
>         (actual time=0.025..0.053 rows=15 loops=1)
>           Index Cond: (demand_id = 289276563)
>
> or:
>
>   ->  Index Scan using inventory_pkey on inventory
>         (cost=0.00..879728.20 rows=28 width=56)
>         (actual time=31738.956..32860.738 rows=15 loops=1)
>           Filter: (demand_id = 289276563)
>
>> it's quick if I do order by id+1
>
> You don't have an index on id+1.

Your explanation is validated by the EXPLAIN output - it only does the sort last if I use "order by id+1", for which there is no index.

[Aside: using "id+0" also forces a sort.]
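
For reference, the work-around in its simplest form is just the following
(a sketch: the "+ 0" stops the planner treating the ORDER BY expression as
a match for the index on id, so it filters on demand_id first and sorts
the few resulting rows afterwards):

  select * from h.inventory
  where demand_id = 289276563
  order by id + 0;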


> The real problem to solve here is that it's estimating the row
> count for the result so badly.  If you need a short-term
> work-around, you've already discovered that you can keep it from
> using the index on id for ordering by creating an expression using
> id which causes it not to consider the index a match.  That's kind
> of ugly to keep long term, though.

We seem to have a more general problem with bad query plans: in other cases too, EXPLAIN ANALYZE shows that the query planner's row estimates are miles adrift.

Others have said that this is symptomatic of not running ANALYZE often enough; however, we analyze quite a lot (both through autovacuum and a manual "vacuum verbose analyze" every night), and our underlying statistical distribution isn't that changeable.
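
(One standard knob for bad row estimates - just a sketch on my part, not
something suggested in this thread - is to raise the per-column statistics
target for the column in question and re-analyze, so that the planner
samples more values:

  alter table h.inventory alter column demand_id set statistics 1000;
  analyze h.inventory;

Whether that actually helps depends on whether the mis-estimates come from
the sample size or from something else, such as cross-column correlation.)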

Thanks,

Richard









