Re: Strange runtime partition pruning behaviour with 11.4

Tom Lane <tgl@xxxxxxxxxxxxx> · Sat, 03 Aug 2019 12:05:53 -0400

Andreas Kretschmer <andreas@xxxxxxxxxxxxxxx> writes:
> Am 03.08.19 um 16:06 schrieb Thomas Kellerer:
>> But I'm more confused (or concerned) by the fact that the (original) 
>> query works correctly *without* statistics.

> can't reproduce that :-(  (PG 11.4 Community)

Yeah, I get the same plan with or without ANALYZE, too.  In this example,
having the ANALYZE stats barely moves the rowcount estimates for
foo_bar_baz at all, so it's not surprising that the plan doesn't change.
(I do wonder how Thomas got a different outcome...)

Given the shape of the preferred plan:

 Finalize Aggregate  (cost=15779.59..15779.60 rows=1 width=8) (actual time=160.329..160.330 rows=1 loops=1)
   ->  Gather  (cost=15779.38..15779.59 rows=2 width=8) (actual time=160.011..161.712 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=14779.38..14779.39 rows=1 width=8) (actual time=154.675..154.675 rows=1 loops=3)
               ->  Hash Join  (cost=1.09..14612.90 rows=66590 width=0) (actual time=86.814..144.793 rows=100500 loops=3)
                     Hash Cond: (fbb_1.foo_id = foo.foo_id)
                     ->  Parallel Append  (cost=0.00..12822.21 rows=399537 width=4) (actual time=0.019..95.644 rows=318950 loops=3)
                           ->  Parallel Seq Scan on foo_bar_baz_1 fbb_1  (cost=0.00..3403.53 rows=177353 width=4) (actual time=0.012..18.881 rows=100500 loops=3)
                           ->  Parallel Seq Scan on foo_bar_baz_2 fbb_2  (cost=0.00..3115.53 rows=162353 width=4) (actual time=0.018..51.716 rows=276000 loops=1)
                           ->  Parallel Seq Scan on foo_bar_baz_3 fbb_3  (cost=0.00..2031.82 rows=105882 width=4) (actual time=0.011..16.854 rows=90000 loops=2)
                           ->  Parallel Seq Scan on foo_bar_baz_4 fbb_4  (cost=0.00..1584.00 rows=82500 width=4) (actual time=0.011..26.950 rows=140250 loops=1)
                           ->  Parallel Seq Scan on foo_bar_baz_5 fbb_5  (cost=0.00..667.65 rows=34765 width=4) (actual time=0.014..11.896 rows=59100 loops=1)
                           ->  Parallel Seq Scan on foo_bar_baz_0 fbb  (cost=0.00..22.00 rows=1200 width=4) (actual time=0.001..0.001 rows=0 loops=1)
                     ->  Hash  (cost=1.07..1.07 rows=1 width=4) (actual time=0.038..0.038 rows=1 loops=3)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on foo  (cost=0.00..1.07 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=3)
                                 Filter: ((foo_name)::text = 'eeny'::text)
                                 Rows Removed by Filter: 5

it's obvious that no pruning can happen, run-time or otherwise,
because the partitioned table is being scanned on the outside
of the join --- so the target value of foo_id isn't available.

We can force the planner to its second best choice with
set enable_hashjoin to 0;

and then we get

 Aggregate  (cost=31954.09..31954.10 rows=1 width=8) (actual time=420.158..420.158 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..31554.55 rows=159815 width=0) (actual time=0.058..389.974 rows=301500 loops=1)
         Join Filter: (fbb.foo_id = foo.foo_id)
         Rows Removed by Join Filter: 655350
         ->  Seq Scan on foo  (cost=0.00..1.07 rows=1 width=4) (actual time=0.025..0.028 rows=1 loops=1)
               Filter: ((foo_name)::text = 'eeny'::text)
               Rows Removed by Filter: 5
         ->  Append  (cost=0.00..19567.35 rows=958890 width=4) (actual time=0.026..280.510 rows=956850 loops=1)
               ->  Seq Scan on foo_bar_baz_0 fbb  (cost=0.00..30.40 rows=2040 width=4) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Seq Scan on foo_bar_baz_1 fbb_1  (cost=0.00..4645.00 rows=301500 width=4) (actual time=0.022..57.836 rows=301500 loops=1)
               ->  Seq Scan on foo_bar_baz_2 fbb_2  (cost=0.00..4252.00 rows=276000 width=4) (actual time=0.019..51.834 rows=276000 loops=1)
               ->  Seq Scan on foo_bar_baz_3 fbb_3  (cost=0.00..2773.00 rows=180000 width=4) (actual time=0.016..31.951 rows=180000 loops=1)
               ->  Seq Scan on foo_bar_baz_4 fbb_4  (cost=0.00..2161.50 rows=140250 width=4) (actual time=0.015..24.392 rows=140250 loops=1)
               ->  Seq Scan on foo_bar_baz_5 fbb_5  (cost=0.00..911.00 rows=59100 width=4) (actual time=0.012..10.252 rows=59100 loops=1)

This is a good deal slower, and the planner correctly estimates that it's
a good deal slower, so that's why it didn't get picked.

But ... why didn't any run-time pruning happen?  Because the shape of the
plan is still wrong: the join condition is being applied at the nestloop
node.  If we'd pushed down the foo_id condition to the foo_bar_baz scan
then there'd be hope of pruning.

I think the reason that that isn't happening is that the planner has
not been taught that run-time pruning is a thing, so it's not giving
any cost preference to doing things in a way that would enable that.
It's not entirely clear what the cost estimate adjustments should be,
but obviously somebody had better work on that.

			regards, tom lane