On Tue, 2007-09-11 at 14:23 -0400, Tom Lane wrote: > El-Lotso <el.lotso@xxxxxxxxx> writes: > > sorry.. I sent this as I was about to go to bed and the explain analyse > > of the query w/ 4 tables joined per subquery came out. > > It's those factor-of-1000 misestimates of the join sizes that are > killing you, eg this one: > > > -> Hash Join (cost=249.61..512.56 rows=1 width=87) (actual time=15.139..32.858 rows=969 loops=1) > > Hash Cond: (((test_db.ts.id)::text = (test_db.d.id)::text) AND (test_db.ts.start_timestamp = test_db.trd.start_timestamp) AND (test_db.ts.ttype = test_db.trd.ttype)) > > -> Seq Scan on ts (cost=0.00..226.44 rows=3244 width=40) (actual time=0.135..6.916 rows=3244 loops=1) > > -> Hash (cost=235.00..235.00 rows=835 width=47) (actual time=14.933..14.933 rows=1016 loops=1) > > The single-row-result estimate persuades it to use a nestloop at the > next level up, and then when the output is actually 969 rows, that > means 969 executions of the other side of the upper join. Yep.. that's consistent with the larger results output. more rows = more loops > > The two input size estimates are reasonably close to reality, so > the problem seems to be in the estimate of selectivity of the > join condition. First off, do you have up-to-date statistics > for all the columns being joined here? It might be that > increasing the statistics targets for those columns would help. I've already upped the stats level to 1000, reindex, vacuum, analysed etc but nothing has basically changed. The issue here is mainly because for each id, there is between 2 to 8 hid. eg: table d seq : 1234567 / code : CED89 table trh seq : 123456 hid : 0/1/2/3/4/5/6/7 and the prob is also compounded by the different ttypes available which causes the use of the subqueries. end of the day.. this data output is desired ID HID =========== 1234567 |0 1234567 |1 1234567 |2 1234567 |3 1234567 |4 1234567 |5 1234567 |6 1234567 |7 the d table has the unique id whereas the other tables has all the subsets. Like a family tree.. Starts at 2, (mom/pop) then to children + children's grandchildren (pair1) children's grandchildren(pair2) d to trh is a one to many relationship > But what I'm a bit worried about is the idea that the join > conditions are correlated or even outright redundant; the > planner will not know that, and will make an unrealistic > estimate of their combined selectivity. If that's the > case, you might need to redesign the table schema to > eliminate the redundancy before you'll get good plans. I'm not I understand (actually, i don't) the above comment. I've already made then from subqueries to actual joins (collapse it) and still no dice. btw, this same schema runs fine on SQL server. (which I'm pulling data from and pumping into PG) I'm downgrading to 8.1.9 to see if it helps too. appreciate any pointers at all. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings