On 3/22/22 13:57, Prajna Shetty wrote:
> ++ _pgsql-performance@postgresql.org_
> <https://www.postgresql.org/list/pgsql-performance/>
>
> Hello Team,
>
> There is a change in query plan between version 12.4 and version 13,
> resulting in performance slowness after the upgrade.
>
> * In version 12.4, a Sort operation and GroupAggregate are selected,
>   which leads to a Merge Join. The query takes ~5 seconds.
> * In version 13.5, the optimizer misestimates and, due to the new
>   disk-based Hash Aggregate feature, prefers a Hash Aggregate instead
>   of a Sort, which blocks the merge join and chooses a Nested Loop
>   Left Join. The query takes ~5 minutes.
>
> NOTE: Disabling hash aggregation at the instance level forces the
> optimizer to choose the merge operation, but such an instance-level
> change is not possible in terms of application functionality.
>
> This performance issue affects most of our queries. Attached is one of
> the queries and its plan on both versions, in case that helps to
> recreate the issue.
>

It's impossible to comment on those other queries, but chances are the
root cause is the same.

> The version 13 query plan has a lower estimated cost than that of
> 12.4, which implies the 13.5 planner thought it found a better plan,
> but it runs slower and the actual cost is higher.
>
> 12.4 version:
> "Merge Right Join (cost=*202198.78..295729.10* rows=1 width=8) (actual
> time=1399.727..*5224.574* rows=296 loops=1)"
>
> 13.5 version:
> "Nested Loop Left Join (cost=*196360.90..287890.45* rows=1 width=8)
> (actual time=3209.577..*371300.693* rows=296 loops=1)"
>

This is not a costing issue. The problem is that we expect 1 row and
calculate the cost for that, but then get 296. And unfortunately a
nested loop degrades much faster than a merge join.

I'm not sure why exactly 12.4 picked a merge join; chances are the
costing formula changed a bit somewhere. But as I said, the problem is
in bogus row cardinality estimates - 12.4 is simply lucky.
The problem most likely stems from this part:

  ->  GroupAggregate  (cost=0.43..85743.24 rows=1830 width=72) (actual time=1.621..3452.034 rows=282179 loops=3)
        Group Key: student_class_detail.aamc_id
        Filter: (max((student_class_detail.class_level_cd)::text) = '4'::text)
        Rows Removed by Filter: 76060
        ->  Index Scan using uk_student_class_detail_aamcid_classlevelcd on student_class_detail  (cost=0.43..74747.61 rows=1284079 width=6) (actual time=1.570..2723.014 rows=1272390 loops=3)
              Filter: (class_level_start_dt IS NOT NULL)
              Rows Removed by Filter: 160402

The filter is bound to be misestimated, and the error then snowballs.
Try replacing this part with a temporary table (with pre-aggregated
results) - you can run ANALYZE on it, etc. I'd bet that'll make the
issue go away.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
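A minimal sketch of the temporary-table rewrite suggested above. The
table and column names are taken from the quoted plan; the full query
is not shown in the thread, so the final join is only an assumed shape:

```sql
-- Materialize the pre-aggregated result in a temporary table so the
-- planner gets real statistics for it instead of a bogus estimate.
CREATE TEMPORARY TABLE student_max_level AS
SELECT aamc_id
FROM   student_class_detail
WHERE  class_level_start_dt IS NOT NULL
GROUP  BY aamc_id
HAVING max(class_level_cd::text) = '4';

-- Collect statistics so the row count of the aggregate is known.
ANALYZE student_max_level;

-- Then join the rest of the original query against the temp table
-- instead of the GroupAggregate subquery, e.g. (hypothetical outer
-- query):
-- SELECT ...
-- FROM   some_table t
--        LEFT JOIN student_max_level s ON s.aamc_id = t.aamc_id;
```

With accurate row counts on `student_max_level`, the planner should no
longer expect 1 row where 296 come back, and can pick the merge join on
its own.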