On Mon, 2005-10-24 at 16:57, felix@xxxxxxxxxxx wrote:
> Having been surprised a few times myself by EXPLAIN showing a
> sequential scan instead of using an index, and having seen so many
> others surprised by it, I hope I am not asking a similar question.
>
> We recently upgraded our db servers, both old and new running 8.0, and
> one casualty was forgetting to add the nightly VACUUM ANALYZE.
> Inserts were down to 7-8 seconds apiece, but are now back to normal
> under a second since the tables were vacuumed.
>
> However, in the process of investigating this, my boss found something
> which we do not understand. A table with a primary key 'id' takes 200
> seconds to SELECT MAX(id), but is as close to instantaneous as you'd
> want for SELECT ID ORDER BY ID DESC LIMIT 1. I understand why
> count(*) has to traverse all records, but why does MAX have to? This
> table has about 750,000 rows, rather puny.
>
> I suspect there is either a FAQ which I missed, or no one can answer
> without EXPLAIN printouts. I'm hoping there is some generic answer to
> something simple I have overlooked.

It may be, but it's a complex enough question that I'll toss out the answer as I understand it.

The problems with aggregates stem from two design decisions made in PostgreSQL.

1: Generic aggregation

PostgreSQL uses a generic aggregation system, i.e. there are no shortcuts in the query planner for one aggregate or another. If you write an aggregate function called std_dev() and implement it, it gets treated by the query planner pretty much the same as the ones that are built in. So, to the query planner, max(fieldname) is about the same as sum(fieldname). Now, you and I can tell by looking at them that one of those could be answered pretty quickly with an index, but how is the planner supposed to know?

2: MVCC

PostgreSQL's implementation of an "in store" Multi-Version Concurrency Control system means that an index entry only tells you where a tuple is in the table, not whether or not that tuple is visible to this particular transaction. Depending on when the tuple was last updated, when our transaction started, and our transaction isolation level, a given version of a given tuple may or may not be visible to our transaction. So, while PostgreSQL can use an index to find things, it always has to go back to the actual table to find out whether the value is visible to the current transaction.

Put simply, reads cost more, but writes don't make the performance of the db plummet. Put more simply, everyone gets medium-slow read performance for certain ops, but writes keep streaming right along. There Ain't No Such Thing As A Free Lunch (TANSTAAFL).

Notice that you CAN use an index for most aggregates, as long as the WHERE clause you're using is limiting enough.

select count(*) from sometable where somefield > 22

can use an index if "somefield > 22" is selective enough. But again, if the db is going to read more than some base percentage of the pages in the main table, it's cheaper to just switch to a sequential scan than to use the index, since it HAS TO READ the matching rows in the table anyway to see if they're visible.

The good news is that, generally speaking, everything else in PostgreSQL is quite fast, and concurrency is very good.

Hope that's not too involved, and Tom, hope I didn't make too many mistakes there...
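P.S. To make the first point concrete: the rewrite Felix already stumbled on is the usual workaround on 8.0. A rough sketch, using a hypothetical table name "mytable" to stand in for his 750,000-row table:

-- max() is treated as a generic aggregate, so this ends up as a
-- sequential scan over the whole table
SELECT max(id) FROM mytable;

-- same answer, but the planner can walk the primary key index
-- backwards and stop at the first visible row
SELECT id FROM mytable ORDER BY id DESC LIMIT 1;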
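The "generic aggregation" point shows up in how you define your own aggregate. This is only a sketch (my_sum is a made-up name, and it reuses the built-in int4pl addition function as its state transition function); the point is that the planner treats it the same way it treats the built-in sum() or max():

CREATE AGGREGATE my_sum (
  BASETYPE = int4,   -- input type
  SFUNC    = int4pl, -- state transition function (plain int4 addition)
  STYPE    = int4,   -- state type
  INITCOND = '0'
);

-- planned just like sum(id) or max(id): no index shortcut
SELECT my_sum(id) FROM mytable;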
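And a minimal sketch of the MVCC visibility point, assuming two concurrent psql sessions against the same hypothetical table:

-- session A
BEGIN;
INSERT INTO mytable (id) VALUES (1000001);
-- no COMMIT yet

-- session B, at the same time
SELECT max(id) FROM mytable;
-- session A's new row is already in the table (and in the index), but
-- it is not visible to session B's snapshot, so the index entry alone
-- cannot answer the query; the heap tuple has to be checked.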
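Finally, the selective-WHERE case from above, spelled out (the index name is made up, and the cutoff is just the example value):

CREATE INDEX sometable_somefield_idx ON sometable (somefield);
ANALYZE sometable;

-- if "somefield > 22" matches only a small fraction of the rows, the
-- planner will usually pick the index; if it matches a large fraction,
-- it switches to a sequential scan, since every matching row has to be
-- fetched from the table to check visibility anyway
SELECT count(*) FROM sometable WHERE somefield > 22;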