Re: plan variations: join vs. exists vs. row comparison

Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> · Mon, 7 Mar 2011 14:06:56 -0600

On Mon, Mar 7, 2011 at 2:00 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> writes:
>> I was hoping that somebody could help me understand the differences
>> between three plans.
>> All of the plans are updating a table using a second table, and should
>> be logically equivalent.
>> Two of the plans use joins, and one uses an exists subquery.
>> One of the plans uses row constructors and IS NOT DISTINCT FROM. It is
>> this plan which has really awful performance.
>> Clearly it is due to the nested loop, but why would the planner choose
>> that approach?
>
> IS NOT DISTINCT FROM pretty much disables all optimizations: it can't be
> an indexqual, merge join qual, or hash join qual. ÂSo it's not
> surprising that you get a sucky plan for it. ÂPossibly somebody will
> work on improving that someday.
>
> As for your other questions, what PG version are you using? ÂBecause I
> do get pretty much the same plan (modulo a plain join versus a semijoin)
> for the first two queries, when using 9.0 or later. ÂAnd the results of
> ANALYZE are only approximate, so you shouldn't be surprised at all if a
> rowcount estimate is off by a couple percent. ÂMost of the time, you
> should be happy if it's within a factor of 2 of reality.

Sorry - I had stated in the original post that I was using 8.4.5 on 64
bit openSUSE and CentOS 5.5, and had forgotten to carry that
information over into the second post.

What is the difference between a plain join and a semi join?

-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance