Re: PostgreSQL 8.3.3 chooses wrong query plan when LIMIT 1 added?

Gregory Stark <stark@xxxxxxxxxxxxxxxx> · Tue, 28 Oct 2008 08:47:56 +0000

This looks like another form of the cross-column dependency problem. Postgres
is assuming that the revisions for all files will be evenly spread throughout
the date range and apparently there's a larger variety of dates than files so
it expects to find the last revision for that file fairly quickly scanning
backwards through the dates. 

In fact of course files tend to be hot for a period of time and then mostly
idle, so depending on which file you pick that may work well if it's currently
hot or be absolutely terrible if it's a file that hasn't been touched
recently.

With the LIMIT Postgres favours the plan it thinks will return one row quickly
without sorting. Without it it's favouring the plan that will return all the
rows for that file_id most quickly.

I'm not sure what to suggest for this case if you can't change the data model
except perhaps increasing the statistics target.

One thing that comes to mind though, I would have defined one of those two
indexes to include both columns. Probably the file_id index, so you would have
an index on <revision_id> and an index on <file_id,revision_id>. That would
be a huge win for this query.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general