On 1/22/08, Richard Huxton <dev@xxxxxxxxxxxx> wrote:
> Although the row-estimate still seems quite high. You might want to
> increase it even further (maximum is 1000). If this is a common query,
> I'd look at an index on (user,id) rather than just (user) perhaps.

Actually that index (with the same statistics setting as before) yields
slightly worse performance:

# explain analyze select max(id) from user_messages where user_id = 13604;
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=3.86..3.87 rows=1 width=0) (actual time=0.051..0.052 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..3.86 rows=1 width=4) (actual time=0.045..0.045 rows=0 loops=1)
           ->  Index Scan Backward using user_messages_user_id_id on user_messages  (cost=0.00..1486.79 rows=385 width=4) (actual time=0.042..0.042 rows=0 loops=1)
                 Index Cond: (user_id = 13604)
                 Filter: (id IS NOT NULL)
 Total runtime: 0.128 ms

Compare with the plain index on the one attribute:

# explain analyze select max(id) from user_messages where user_id = 13604;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1388.34..1388.35 rows=1 width=4) (actual time=0.034..0.035 rows=1 loops=1)
   ->  Index Scan using user_messages_user on user_messages  (cost=0.00..1387.40 rows=374 width=4) (actual time=0.030..0.030 rows=0 loops=1)
         Index Cond: (user_id = 13604)
 Total runtime: 0.085 ms

> > But it's worrying that PostgreSQL should be so off in planning the
> > query. Does this behaviour qualify as a bug, or is this -- that is,
> > the need to tweak statistics parameters -- just your garden-variety
> > application-specific optimization?
>
> Well, it's data-specific rather than application specific I suppose.
> The issue is that there is a cost to tracking 100 values and you don't
> want to pay that on every column in every table. If user 13604 isn't in
> the list of most-common users then all it can really do is fix an upper
> bound on how many matches it can have. Of course you and I can reason
> outside of the data and guess that many users won't have more than a
> handful of messages, but that's not something PG can do.

Absolutely. Thanks for the pointers.

Alexander.
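For anyone following along, the two knobs discussed in this thread can be reproduced roughly as below. This is a sketch, not the exact commands from the thread: the table and column names are taken from the plans above, the index name matches the one the planner reports, and 1000 is the statistics-target maximum Richard mentions (on the PostgreSQL versions of that era).

```sql
-- Raise the per-column statistics target so ANALYZE tracks more
-- most-common values and histogram buckets for user_id.
ALTER TABLE user_messages ALTER COLUMN user_id SET STATISTICS 1000;
ANALYZE user_messages;

-- Composite index so max(id) for a single user can be answered by a
-- backward index scan on (user_id, id), as in the first plan above.
CREATE INDEX user_messages_user_id_id ON user_messages (user_id, id);

-- Re-check the plan and row estimate afterwards.
EXPLAIN ANALYZE SELECT max(id) FROM user_messages WHERE user_id = 13604;
```

As the plans above show, better statistics shrink the row estimate, but for a user with no rows at all either index answers the query in well under a millisecond.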