The tables are analyzed, though I would love to find a way to increase the accuracy of the statistics. I tried raising the statistics target up to 100, but it did not help; should I bump it even more? (A sketch of raising the per-column targets further is at the bottom of this message.)

However, I found that if I add depth to the GROUP BY clauses, it somehow tells the optimizer that it will get more than one row, and it switches to a Hash Join. For this query only rows with one value of depth are accessed, so we are okay, but I would like to see if there is some other way to get a better approximation of the costs.

 Sort  (cost=25214.36..25214.39 rows=10 width=958) (actual time=9798.860..9815.670 rows=6115 loops=1)
   Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
   ->  Hash Left Join  (cost=25213.83..25214.19 rows=10 width=958) (actual time=8526.248..9755.721 rows=6115 loops=1)
         Hash Cond: ((("outer".objsig)::text = ("inner".objsig)::text) AND (("outer".objtype)::text = ("inner".objtype)::text) AND (("outer".fieldname)::text = ("inner".fieldname)::text))
         ->  Subquery Scan o1  (cost=18993.48..18993.66 rows=10 width=990) (actual time=6059.880..6145.223 rows=6115 loops=1)
               ->  HashAggregate  (cost=18993.48..18993.56 rows=10 width=46) (actual time=6059.871..6094.897 rows=6115 loops=1)
                     ->  Nested Loop  (cost=0.00..18993.22 rows=15 width=46) (actual time=45.510..5980.807 rows=6338 loops=1)
                           ->  Index Scan using jam_heaprel_n1 on jam_heaprel ar  (cost=0.00..18932.01 rows=10 width=19) (actual time=45.374..205.520 rows=6338 loops=1)
                                 Index Cond: ((heap_id = 1) AND (parentaddr = 0))
                                 Filter: ((fieldname)::text = 'K'::text)
                           ->  Index Scan using jam_heapobj_u1 on jam_heapobj ao  (cost=0.00..6.10 rows=2 width=43) (actual time=0.885..0.890 rows=1 loops=6338)
                                 Index Cond: ((ao.heap_id = 1) AND (ao.objaddr = "outer".childaddr))
         ->  Hash  (cost=6220.34..6220.34 rows=2 width=982) (actual time=2466.178..2466.178 rows=0 loops=1)
               ->  Subquery Scan o2  (cost=6220.30..6220.34 rows=2 width=982) (actual time=2225.242..2433.744 rows=6038 loops=1)
                     ->  HashAggregate  (cost=6220.30..6220.32 rows=2 width=46) (actual time=2225.233..2366.890 rows=6038 loops=1)
                           ->  Nested Loop  (cost=0.00..6220.27 rows=2 width=46) (actual time=0.449..2149.257 rows=6259 loops=1)
                                 ->  Index Scan using jam_heaprel_n1 on jam_heaprel br  (cost=0.00..6202.89 rows=4 width=19) (actual time=0.296..51.310 rows=6259 loops=1)
                                       Index Cond: ((heap_id = 0) AND (parentaddr = 0))
                                       Filter: ((fieldname)::text = 'K'::text)
                                 ->  Index Scan using jam_heapobj_u1 on jam_heapobj bo  (cost=0.00..4.33 rows=1 width=43) (actual time=0.294..0.300 rows=1 loops=6259)
                                       Index Cond: ((bo.heap_id = 0) AND (bo.objaddr = "outer".childaddr))
 Total runtime: 9950.192 ms

Regards,

Virag

----- Original Message -----
From: "Tom Lane" <tgl@xxxxxxxxxxxxx>
To: "Virag Saksena" <virag@xxxxxxxxxxx>
Cc: <pgsql-performance@xxxxxxxxxxxxxx>
Sent: Monday, February 20, 2006 9:35 PM
Subject: Re: [PERFORM] Cost Issue - How do I force a Hash Join

> "Virag Saksena" <virag@xxxxxxxxxxx> writes:
> > The individual queries run in 50-300 ms. However the optimizer is
> > choosing a nested loop to join them rather than a Hash join,
> > causing the complete query to take 500+ seconds. It expects that it will
> > get 1 row out from each of the sources, but here it gets
> > several thousand rows.
>
> The best approach is to see if you can't fix that estimation error.
> Are the stats up to date on these tables?  If so, maybe raising the
> statistics targets would help.
>
> regards, tom lane
>
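
P.S. For completeness, here is a minimal sketch of the per-column route I understand Tom to be suggesting. Which columns actually matter is an assumption on my part (I picked the jam_heaprel columns that appear in the Index Cond / Filter of the plan above), and 1000 is just an example value, not a recommendation:

    -- Raise the per-column statistics targets well past 100 on the columns
    -- used in the misestimated predicates (assumed: heap_id, parentaddr, fieldname).
    ALTER TABLE jam_heaprel ALTER COLUMN heap_id    SET STATISTICS 1000;
    ALTER TABLE jam_heaprel ALTER COLUMN parentaddr SET STATISTICS 1000;
    ALTER TABLE jam_heaprel ALTER COLUMN fieldname  SET STATISTICS 1000;

    -- Re-gather statistics so the new targets take effect, then re-run EXPLAIN ANALYZE.
    ANALYZE jam_heaprel;

If the planner still estimates rows=10 after that, the problem is presumably the combination of predicates (heap_id, parentaddr and fieldname together) rather than any single column, which per-column statistics cannot capture.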