Re: Planning performance problem (67626.278ms)

Manuel Weitzman <manuelweitzman@xxxxxxxxx> · Tue, 29 Jun 2021 17:35:38 -0400

> On 29-06-2021, at 15:43, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> 
> Manuel Weitzman <manuelweitzman@xxxxxxxxx> writes:
>>> On 20-06-2021, at 17:06, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>>> So ... the reason why there's not caching of get_actual_variable_range
>>> results already is that I'd supposed it wouldn't be necessary given
>>> the caching of selectivity estimates that happens at the RestrictInfo
>>> level.  I don't have any objection in principle to adding another
>>> caching layer if that one's not working well enough, but I think it'd
>>> be wise to first understand why it's needed.
> 
>> For what I could make out from the code, the caching done at the
>> RestrictInfo level is already saving a lot of work, but there's a
>> different RestrictInfo instance for each alternative path created by
>> make_one_rel().
> 
> That seems a bit broken; a given WHERE clause should produce only one
> RestrictInfo.  Can you provide a more concrete example?
> 

I added some logging to see hits and misses on cached_scansel() for
this query
> explain (analyze, buffers)
> select * from a
> join b b1 on (b1.a = a.a)
> join b b2 on (b2.a = a.a)
> where b1.a in (1,100,10000,1000000,1000001);

Apparently  there's a RestrictInfo for each possible way of doing merge
join (are those created dynamically for planning?), for example:
- a join (b1 join b2)
- b1 join (a join b2)
- b2 join (a join b1)

When the cost of a possible mergejoin path hasn't been computed yet,
then mergejoinscansel() would have to check the bloated index again.

I attached a patch so you can see the hits and misses on cached_scansel().
Each time there's a miss logged, there's also a different RestrictInfo
pointer involved.

Best regards,
Manuel
Attachment:
cached_scansel_hitmiss.patch

Description: Binary data