Re: Selecting max(pk) is slow on empty set

"Alexander Staubo" <alex@xxxxxxxxxxxxxxx> · Tue, 22 Jan 2008 14:16:33 +0100

On 1/22/08, Richard Huxton <dev@xxxxxxxxxxxx> wrote:
> Hmm, but with an estimated cost of 3646 (vs.633 for the max(*) which
> uses the wrong index). That explains why it's walking backwards through
> the pkey index, it thinks that it's 8 times cheaper.
[...]
> Have a look at most_common_vals,most_common_freqs in pg_stats for
> tbl=user_messages, att=user perhaps.

# select histogram_bounds from pg_stats where
tablename='user_messages' and attname='user_id';
                   histogram_bounds
-------------------------------------------------------
 {1,489,1097,1824,2555,3452,4488,5679,6879,8637,13448}

# select null_frac, n_distinct, most_common_vals, most_common_freqs
from pg_stats where tablename='user_messages' and attname='user_id';
 null_frac | n_distinct |                 most_common_vals
    |                                           most_common_freqs
-----------+------------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------
         0 |       2652 |
{5826,484,1206,780,823,4085,4157,5852,1962,6453} |
{0.00933333,0.00766667,0.00666667,0.00633333,0.006,0.00566667,0.00566667,0.00533333,0.005,0.00466667}

> Then see if an ALTER TABLE SET
> STATISTICS 100 makes a difference.

So it does:

# explain analyze select max(id) from user_messages where user_id = 13604;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1320.52..1320.53 rows=1 width=4) (actual
time=13.640..13.640 rows=1 loops=1)
   ->  Index Scan using user_messages_user on user_messages
(cost=0.00..1319.62 rows=358 width=4) (actual time=13.631..13.631
rows=0 loops=1)
         Index Cond: (user_id = 13604)
 Total runtime: 13.712 ms

Thank you! That solves my performance problem, at least.

But it's worrying that PostgreSQL should be so off in planning the
query. Does this behaviour qualify as a bug, or is this -- that is,
the need to tweak statistics parameters -- just your garden-variety
application-specific optimization?

Alexander.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match