Re: Hash Right join and seq scan

David Rowley <dgrowleyml@xxxxxxxxx> · Sat, 6 Jul 2024 12:32:47 +1200

On Sat, 6 Jul 2024 at 02:43, James Pang <jamespang886@xxxxxxxxx> wrote:
>            for nest loop path, since the first one estimated only "8" rows , and they use partitionkeyid as joinkey and all are hash partitions , is it better to estimate cost to  8 (loop times) * 1600 = 12800 (each one loop  map to only 1 hash partition bitmap scan ,avg one partition cost), that's much less than 398917.29 of all partitions ?

I'm not really sure where you're getting the numbers from here. The
outer side of the deepest nested loop has an 8 row estimate, not the
nested loop itself.  I'm unsure where the 1600 is from. I only see
1669.

As of now, we don't do a great job of costing for partition pruning
that will happen during execution.  We won't be inventing anything to
fix that in existing releases of PostgreSQL, so you'll need to either
adjust the code yourself, or find a workaround.

You've not shown us your schema, but perhaps enable_partitionwise_join
= on might help you. Other things that might help are further lowering
random_page_cost or raising effective_cache_size artificially high.
It's hard to tell from here how much random I/O is being costed into
the index scans.  You could determine this by checking if the nested
loop plan costs change as a result of doing further increases to
effective_cache_size. You could maybe nudge it up enough for it to win
over the hash join plan. It is possible that this won't work, however.

>  for secondary Nest Loop Anti join could be rows 299118 rows *  15.78(avg index scan cost of one partition) = 4,720,082 that still much less than  132168227.57 ?
>          for Hash Right join, is it possible to estimate by  8 seq partition scan instead of all 32 hash partitions since the first query estimated 8 rows only ?
>          extend statistics may help estimate count(partitionkeyid) based on other columns bind variables, but looks like that did not help table join case.

I can't quite follow this. You'll need to better explain where you're
getting these numbers for me to be able to understand.

David