But the two queries don't return the same results. Of course the
second one will be faster.
The equivalent of your first query is to take the result sets from
these two queries
(...)
it's not
too surprising that the planner can't come up with the optimal
plan; you've posed quite a challenge for it.
The point that i was trying to make by doing 2 queries and unioning them is, that it is faster to use 2 index scans than to use sequential scans.
I can't quite recognize the challenge that i'm posing the query planner, but i am willing/hoping to learn more about it.
AFAIK, the planner has some statistics about the frequencies in which values in the columns occur. That way, it can calculate the approx number of records that will have to be fetched and considering the latency of a rotating hard disk, it can calculate what is likely to be faster: a sequential scan or using the index for random reads.
In this case, the planner can calculate the number of records that need to be fetched from B, in my case it says it expects 4 of them in both cases. Combined, it would fetch max 8 records from B, in contrast to 40K or even twice that.
I can't understand what is confusing the planner.
Cheers,
Willy-Bas