Tom Lane wrote:
The right way to do it is to adjust the planner cost parameters.
The standard values of those are set on the assumption of
tables-much-bigger-than-memory, a situation in which the planner's
preferred plan probably would be the best. What you are testing here
is most likely a situation in which the whole of both tables fits in
RAM. If that pretty much describes your production situation too,
then you should decrease seq_page_cost and random_page_cost. I find
setting them both to 0.1 produces estimates that are more nearly in
line with true costs for all-in-RAM situations.
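For example, a minimal way to try this per-session before making the
change permanent in postgresql.conf, then re-running EXPLAIN ANALYZE
on the query in question to compare plans:

    SET seq_page_cost = 0.1;     -- default is 1.0
    SET random_page_cost = 0.1;  -- default is 4.0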
I know I can do it by adjusting cost parameters, but I was really
curious as to why adding a "LIMIT 5000" onto a SELECT from a table with
only 530 rows in it would affect matters at all. The plan the planner
uses when the LIMIT 5000 is present is the one I want, without adjusting any
cost parameters. It doesn't seem to matter what the limit is -- LIMIT
99999 also produces the desired plan, whereas no LIMIT produces the
undesirable plan.
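For concreteness, the comparison looks something like this (table and
column names here are hypothetical stand-ins -- "staff" plays the role
of the 530-row table, and the real query joins it to others; only the
LIMIT clause differs between the two):

    EXPLAIN ANALYZE
    SELECT s.* FROM staff s JOIN staffrole r ON r.staffid = s.staffid;
    -- produces the undesirable plan

    EXPLAIN ANALYZE
    SELECT s.* FROM staff s JOIN staffrole r ON r.staffid = s.staffid
    LIMIT 5000;
    -- produces the desired plan, even though staff has only 530 rows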
--Colin McGuigan