On 5/24/06, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> <kynn@xxxxxxxxx> writes:
> > Limit  (cost=19676.75..21327.99 rows=6000 width=84)
> >   ->  Hash Join  (cost=19676.75..1062244.81 rows=3788315 width=84)
> >         Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
> >         ->  Seq Scan on huge_table h  (cost=0.00..51292.43 rows=2525543 width=46)
> >         ->  Hash  (cost=19676.00..19676.00 rows=300 width=38)
> >               ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)
>
> Um, if huge_table is so much bigger than tiny_table, why are the cost
> estimates for seqscanning them only about 2.5x different?  There's
> something wacko about your statistics, methinks.
You mean there's a bug in EXPLAIN? I agree that it makes no sense that the costs don't differ as much as one would expect, but you can see right there the row counts for the two tables, which are exactly as I described. At any rate, how would one go about finding an explanation for these strange stats?
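In case it helps, here is roughly what I was planning to run next to see what the planner thinks it knows about the two tables (I'm assuming pg_class and a fresh ANALYZE are the right place to start; corrections welcome):

    -- what the planner currently has recorded for the two tables
    SELECT relname, relpages, reltuples
      FROM pg_class
     WHERE relname IN ('huge_table', 'tiny_table');

    -- refresh the statistics in case they are stale
    ANALYZE huge_table;
    ANALYZE tiny_table;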
More bewildering still (and infuriating as hell--because it means that all of my work from yesterday has been wasted) is that I can no longer reproduce the best query plan, even though the tables have not changed at all. (Hence I can't post the EXPLAIN ANALYZE for the best query plan.) No matter what value I use for LIMIT, the query planner now insists on sequentially scanning huge_table and ignoring the available index. (If I turn off enable_seqscan, I get the second-worst query plan I posted yesterday.)
Anyway, I take it that there is no way to bypass the optimizer and instruct PostgreSQL exactly how one wants the query executed?
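(The only knob I know of is the family of enable_* planner settings, which is what I was fiddling with above. For the archives, this is the sort of thing I mean -- the SELECT list and the join expression below are just my reconstruction from the plan, not the actual query:)

    BEGIN;
    -- discourage sequential scans for this transaction only
    SET LOCAL enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT h.*
      FROM huge_table h
      JOIN tiny_table t ON upper(h.id::text) = upper(t.id::text)
     LIMIT 6000;
    COMMIT;

But that only nudges the cost model; it doesn't let me spell out the join order or access method directly, which is what I was really asking about.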
Thanks!
kj