2008/1/30, Tom Lane <tgl@xxxxxxxxxxxxx>: > "Dmitry Potapov" <fortune.fish@xxxxxxxxx> writes: > > 2008/1/25, Tom Lane <tgl@xxxxxxxxxxxxx>: > >> It's hard to tell whether the planner is just being overoptimistic > >> about the results of ANDing the three join conditions, or if one or > >> more of the basic condition selectivities were misestimated. > > > I wonder if it will ever finish the work :( So, for now, the only > > thing I can show is: > > > =# explain select 1 from t1, t2 where t1.m1=t2.m1 ; > > QUERY PLAN > > -------------------------------------------------------------------------------- > > Nested Loop (cost=0.00..162742993234.25 rows=57462242912003 width=0) > > -> Seq Scan on t2 (cost=0.00..690784.54 rows=30820054 width=4) > > -> Index Scan using i_t1_m1 on t1 (cost=0.00..3080.74 rows=175973 width=4) > > Index Cond: (t1.m1 = t2.m1) > > (4 rows) > > So this is estimating the join selectivity for m1 alone as > > regression=# select 57462242912003 / (30820054.0 * 21292688); > ?column? > ------------------------ > 0.08756260792962293676 > (1 row) > > which would suggest that there are only 11 or so distinct values for m1; > is that about right? No, that's wrong. There's 155 distinct values in t1, and 157 in t2. By the way, due to the nature of the data stored in m1 (see below), the selectivity for m1, imho, should be bigger than 0.08756, something about 0,80 or so. Unfortunally I can't be sure about that, the query in question is still running, so the real rowcount is still not available :( > There's something a bit funny here --- looking at the plan components, > the output row estimate should be about 30820054 * 175973, but it's ten > times that. I'm not sure if this is just a presentation artifact or > if it points to a real bug. That looks strange, indeed. > Could you show us the pg_stats rows for t1.m1 and t2.m1? I've attached them as text files to one of my previous letters. I'm sure that email formatting will kill their readability. I attached them once again to this letter, pgstats_t1_m,txt is for t1, pgstats_t2_m.txt for t2. To save your time, some fields: t1.m1: n_distinct=121 correlation=0.0910695 t2.m2: n_distinct=111 correlation=0.148101 > The m2 estimate seems nearly dead on, and the m3 estimate is within 10% > or so which is certainly close enough. Barring the possibility that > there's something wrong underneath that m1 estimate, the products of > the selectivities do indeed work out to give us about 1 row out of the > join with all three conditions. So my conclusion is that these three > values are not independent, but are highly correlated (in fact, it seems > that m2 alone provides most of the join selectivity). Do you actually So, about the nature of the data in m1,m2,m3. m1 is something like subsystem ID to which the row belongs, m3 is supposed to be unique record id within that subsystem, but it starts with 0 when the system is restarted. m2 is a timestamp in java format, it is there to guarantee the uniqueness of (m1,m2,m3) triplet. So, about corelation, m2 and m3 may seem to be correlated: if a system inserts records at nearly constant rate, m2=a+b*m3.. > need to compare all three to make the join, or is one field perhaps > dependent on the other two? Can you refactor the data to make it so? yes, I absolutely need to compare all three. about refactoring, the only thing I can think out is to put m1,m2,m3 to a separate table, with bigint primary key, and then to tie t1 and t2 with that key. but that will require changes in the application and some amount of time to convert the data :( -- Regards, Dmitry ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend