Re: Unexpected sequential scan on an indexed column

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks, Tom. I had discarded the possibility of data type mismatch already, which was your first guess, but was wondering if the lopsided distribution of location values would lead the planner to make a decision that is good on average but bad for this particular query, as you point out in your second guess.
 
I'll try populating the test users with a more evenly distributed location field, which will be more realistic anyway, and see if that works out better.
 
BTW, the -1 is not really a dummy value, but it's just a value that we have been using in tests for "fake test location ID". I just started performance measurement for my application and so far had measured performance with every user being in the same default location and things seemed to be going well, so I tried to switch a couple users to a different location and see what happened, and that made performance drop significantly.
(even more detail: my queries also limit results to 10 approx, so DB quickly found 10 rows that match location -1, but it took a while to discover there weren't more than 2 rows with the other value).
 
Thanks!
Eddy

On Sun, Nov 15, 2009 at 3:33 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Eddy Escardo-Raffo <eescardo@xxxxxxxxxx> writes:
> The table used in this query is called "users", and it has columns "userid"
> (primary key) and "location".
> The "location" column is indexed.
> The users table has 1 million rows, and all rows have integer typed value
> '-1' for  "location" column, except for 2 rows that have the integer value
> '76543'.

Oh, after poking at it a bit more, I realize the problem: the planner
doesn't want to use an indexscan because it assumes there's a
significant probability that the search will be for -1 (in which case
the indexscan would be slower than a seqscan, as indeed your results
prove).  Even though it could know in this particular case that the
comparison value isn't -1, I doubt that teaching it that would help your
real queries where it will probably be impossible to determine the
comparison values in advance.

I would suggest considering using NULL rather than inventing a dummy
value for unknown locations.  The estimation heuristics will play a
lot nicer with that choice.

                       regards, tom lane


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux