Re: demystifying nested loop vs. merge join query plan choice

Jeff Janes <jeff.janes@xxxxxxxxx> · Fri, 2 Aug 2013 12:08:47 -0700

On Thu, Aug 1, 2013 at 4:50 PM, BladeOfLight16 <bladeoflight16@xxxxxxxxx> wrote:
> On Thu, Aug 1, 2013 at 10:25 AM, Sandeep Gupta <gupta.sandeep@xxxxxxxxx>
> wrote:
>>
>> @Jeff : Thanks for pointing this out. Turns out that was the case.
>>
>> @Tom: Thank you for the reference to random_page_cost parameters. It would
>> be very useful for us. Would go through the rest of the documentation as
>> well.
>
>
> I can't see what Jeff mentioned; maybe he didn't reply to the user list.
> Anyhow, sorry if this is repeating information.

I see that I accidentally didn't reply on list. Sorry.  I had just
pointed out that the tables are in vastly different vacuum states
between instances, based on the different number of heap fetches needed
for the index-only scan (IOS).  (Presumably this means the rest of the
stats used for estimates are all out of tune as well.)

>
> I cannot help but point something glaring out in the EXPLAIN, though:
>
> database 1:
>
> Index Only Scan using tc_did_idx on tc  (cost=0.00..1298125.32 rows=49987616 width=4)
>
> database 2:
>
> Index Only Scan using tc_did_idx on tc  (cost=0.00..70.44 rows=3 width=4)
>
> Maybe I just don't know how to read EXPLAIN plans, but it would appear that
> the estimated row counts from the index only scan in the two plans differ by
> a factor of about 16.7 million.

The IOS on database 2 is inside a nested loop, so rows=3 is the estimate
for a single execution, and the whole thing is executed 500,384 times.
So the error is only a factor of about 30, not a factor of 16 million.
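
To spell out that arithmetic, here is a rough sketch in Python. The
numbers are the ones quoted in this thread; the variable names are made
up for illustration, and it assumes rows=3 is a per-loop estimate that
should be multiplied by the number of executions:

```python
# Figures quoted from the two EXPLAIN plans in this thread.
# Variable names are illustrative only.
db1_estimated_rows = 49_987_616  # database 1: one index-only scan over tc
db2_rows_per_loop = 3            # database 2: estimate for ONE execution of the inner scan
db2_loops = 500_384              # database 2: number of times the inner scan is executed

# Naive comparison: ignores that the inner scan runs once per loop.
naive_factor = db1_estimated_rows / db2_rows_per_loop

# Loop-adjusted comparison: total rows expected from the inner scan overall.
db2_total_rows = db2_rows_per_loop * db2_loops
adjusted_factor = db1_estimated_rows / db2_total_rows

print(f"naive factor:    {naive_factor:,.0f}")
print(f"db2 total rows:  {db2_total_rows:,}")
print(f"adjusted factor: {adjusted_factor:.1f}")
```

The naive ratio lands near 16.7 million, while the loop-adjusted ratio
is a little over 30, in line with the figures above.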

> database 1 also processes about 7.7 million
> rows before the aggregate,

It thinks it will process 7.7 million rows; it actually processes fewer
than 1 million.

> where database 2 only processes about 1.3
> million. For some reason, it appears that database 2 is able to eliminate
> far more rows more quickly, resulting in a faster query. Have both databases
> had VACUUM ANALYZE run on them? Are the statistics collection settings the
> same?

Yeah, that is the key.  I'm not sure what Sandeep meant by "that was
the case"--it looks like the one with the freshest stats was the one
that was using the slower plan, so hopefully the problem was not fixed
by converting the fast plan to look like the slow one!

Cheers,

Jeff

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)