Re: What is the right way to deal with a table with rows that are not in a random order?

Douglas Alan <darkwater42@xxxxxxxxx> · Wed, 27 May 2009 22:43:20 -0400

On Wed, May 27, 2009 at 8:54 PM, Jeff Davis <pgsql@xxxxxxxxxxx> wrote:

If you're putting a LIMIT on it, why does it return millions of results?

It doesn't return millions of results with the LIMIT on it.  It just does a sequential scan of the table and doesn't find any results until it gets to the last quarter of the table. Sequentially scanning through 3/4 of the huge table before it gets a single match takes a very long time.

As I said, in my original post, Postgres's approach would be completely reasonable in this case, if the rows that it was looking for were sprinkled randomly throughout the table.  But they're not in this case -- they're all at the end.

Can you pick out an interesting query and give some specifics, like:

* the query

* the EXPLAIN ANALYZE output (or EXPLAIN without ANALYZE if it takes too

long to even run once)

* EXPLAIN ANALYZE output if you force the index scan

* the statistics for the relevant columns, such as histogram and

correlation

As I mentioned, the situation is very simple, and easy to understand what is going on.  There's absolutely no mystery as to why Postgres is doing what it's doing.  25% of the table matches the query.  Postgres knows this due to the statistics histogram for the column.  Postgres is deciding to do a sequential scan because it knows that 25% of the rows match the query.

Unfortunately, in this case, that's a poor approach.

|>ouglas