Thanks, Tom. I had discarded the possibility of data type mismatch already, which was your first guess, but was wondering if the lopsided distribution of location values would lead the planner to make a decision that is good on average but bad for this particular query, as you point out in your second guess.
I'll try populating the test users with a more evenly distributed location field, which will be more realistic anyway, and see if that works out better.
BTW, the -1 is not really a dummy value, but it's just a value that we have been using in tests for "fake test location ID". I just started performance measurement for my application and so far had measured performance with every user being in the same default location and things seemed to be going well, so I tried to switch a couple users to a different location and see what happened, and that made performance drop significantly.
(even more detail: my queries also limit results to 10 approx, so DB quickly found 10 rows that match location -1, but it took a while to discover there weren't more than 2 rows with the other value).
Thanks!
Eddy
On Sun, Nov 15, 2009 at 3:33 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Eddy Escardo-Raffo <eescardo@xxxxxxxxxx> writes:> The table used in this query is called "users", and it has columns "userid"Oh, after poking at it a bit more, I realize the problem: the planner
> (primary key) and "location".
> The "location" column is indexed.
> The users table has 1 million rows, and all rows have integer typed value
> '-1' for "location" column, except for 2 rows that have the integer value
> '76543'.
doesn't want to use an indexscan because it assumes there's a
significant probability that the search will be for -1 (in which case
the indexscan would be slower than a seqscan, as indeed your results
prove). Even though it could know in this particular case that the
comparison value isn't -1, I doubt that teaching it that would help your
real queries where it will probably be impossible to determine the
comparison values in advance.
I would suggest considering using NULL rather than inventing a dummy
value for unknown locations. The estimation heuristics will play a
lot nicer with that choice.
regards, tom lane