On Tue, 26 Nov 2024 at 09:55, Ba Jinsheng <bajinsheng@xxxxxxxxx> wrote: > TPC-DS query 95: > Its execution time is nearly 1 min: > CTE ws_wh > -> Hash Join (cost=37772.14..198810.77 rows=7242361 width=12) (actual time=211.161..1443.926 rows=6644004 loops=1) > If applying this patch: > - if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel, > + if (!is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel, > The execution time is reduced to 6 seconds: > CTE ws_wh > -> Hash Join (cost=37772.14..74062.53 rows=7095248 width=12) (actual time=203.407..560.264 rows=719205 loops=1) > The difference between both query plans is the second one uses Materialize instead of Memoize. From the code, it seems that changing the usage of the cache brings performance improvement unexpectedly. What's going on here is that you've introduced a new bug which has the Hash Join effectively perform a semi-join rather than an inner join so that it only joins the first matching inner-side row instead of all matching inner-side rows for each outer-side row. The bug is fairly evident when to see that the Hash Join produces 6644004 rows without your change and only 719205 rows after you added the bug. I'm not sure what you expect us to do here, but just in case you're not aware, we're not going to prioritise performance over correct results. I expect the problem you've introduced will be more evident if you use EXPLAIN ANALYZE VERBOSE. The verbose option will show which joins are marked "Inner Unique". If you want to learn more about the optimisation you've broken, see [1]. David [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9c7f5229ad68d7e0e4dd149e3f80257893e404d4