Re: DIfferent plans for explicit versus implicit join using link table

Tom Lane <tgl@xxxxxxxxxxxxx> · Thu, 26 Apr 2007 16:44:22 -0400

"John D. Burger" <john@xxxxxxxxx> writes:
> I have two queries for looking up related words which I think should  
> be equivalent, but 7.4.8 comes up with very different plans.

They're not at all equivalent:

> explain analyze select w2.word from allwords w1 join allwordrelations  
> as r on (w1.wordid = r.word1id) join allwords w2 on (w2.wordid =  
> r.word2id) where w1.word = 'dogging';

> explain analyze select w2.word from allwords w1, allwords w2 where  
> (w1.wordid, w2.wordid) in (select word1id, word2id from  
> allwordrelations ) and w1.word = 'dogging';

If there are duplicate word1id,word2id entries in allwordrelations, the
first query will produce duplicate outputs; the second will not.

If there were a unique constraint on (word1id, word2id), in theory
the planner could prove that the IN form could be simplified to a plain
join, but there is no such logic in HEAD let alone 7.4, and in any case
you've not got such a constraint.

The plan that gets chosen is to forcibly unique-ify the (word1id,
word2id) data (via a "sort | uniq"-like pipeline) and then do a normal
join with that.  Which is expensive because allwordrelations is big.
But the alternative is probably even worse: without that
allwordrelations has to be joined to w1 and w2 simultaneously, meaning
that the unconstrained cartesian product of w1 and w2 has to be formed
first.

			regards, tom lane