On 1/22/08, Richard Huxton <dev@xxxxxxxxxxxx> wrote:
> Alexander Staubo wrote:
> > On 1/22/08, Richard Huxton <dev@xxxxxxxxxxxx> wrote:
> >> Although the row-estimate still seems quite high. You might want to
> >> increase it even further (maximum is 1000). If this is a common query,
> >> I'd look at an index on (user,id) rather than just (user) perhaps.
> >
> > Actually that index (with the same statistics setting as before)
> > yields slightly worse performance:
> >
> > # explain analyze select max(id) from user_messages where user_id = 13604;
> >   Total runtime: 0.128 ms
> >
> > Compare with the plain index on the one attribute:
> >
> > # explain analyze select max(id) from user_messages where user_id = 13604;
> >   Total runtime: 0.085 ms
>
> Ah, but:
> 1. Those times are so small, I'm not sure you can reliably separate
> them. Certainly not from one run.
> 2. For a range of different user-ids I'd expect user_id_id index to
> maintain a near-constant time regardless of the number of messages for
> that user.
> 3. You might be able to reduce your statistics on the user column and
> still keep the fast plan.

Actually, I wasn't looking at the time, but at the cost and estimated
number of rows, which are both lower for the original index, and at the
complexity of the plan, which looks (at least to me) simpler than the
backwards scan.

But you're right. With the combined index I can set the granularity
back to 1000, and both empty and non-empty queries perform well. The
row estimate is still way off, though.

What are the drawbacks of making the statistics buckets finer-grained?

Alexander.
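
P.S. For reference, a minimal sketch of the changes discussed above,
assuming the table is user_messages with columns id and user_id as in
the query; the index name is made up:

  -- combined index on (user_id, id), as suggested above
  CREATE INDEX user_messages_user_id_id_idx
      ON user_messages (user_id, id);

  -- finer-grained statistics on the user_id column (1000 is the maximum)
  ALTER TABLE user_messages ALTER COLUMN user_id SET STATISTICS 1000;
  ANALYZE user_messages;

  -- the query under discussion
  EXPLAIN ANALYZE
      SELECT max(id) FROM user_messages WHERE user_id = 13604;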