On 06/25/2013 06:41 PM, Tom Lane wrote: >> Well, it was more in the form of: >> tab1.x = COALESCE(tab2.y,tab1.x) > > Hm. I'm not following how you get from there to complaining about not > being smart about X = X, because that surely ain't the same. Actually, it was dominated by defaultsel, since tab2.y had a nullfrac of 70%. It took us a couple days of reducing the bad query plan to figure out where the bad estimate was coming from. The real estimate should have been 0.7 + ( est. tab2.y = tab1.x ), but instead we were getting 0.005 + ( est. tab2.y = tab1.x ), which was throwing the whole query plan way off ... with an execution time difference of 900X. > It's a totally different case. In the join case you expect that each > element of one table will be compared with each element of the other. > In the single-table case, that's exactly *not* what will happen, and > I don't see how you get to anything very useful without knowing > something about the value pairs that actually occur. Sure you can. If you make the assumption that there is 0 correlation, then you can simply estimate the comparison as between two random columns. In the simplest approach, you would multiply the two ndistincts, so that a column with 3 values would match a column with 10 values 0.033 of the time. Now for a real estimator, we'd of course want to use the MCVs and the histogram to calculate a better estimation; obviously our 3X10 table is going to match 0% of the time if col1 is [1,2,3] and col2 contains values from 1000 to 1100. The MCVs would be particularly valuable here; if the same MCV appears in both columns, we can multiply the probabilities. To me, this seems just like estimating on a foreign table match, only simpler. Of course, a coefficient of corellation would make it even more accurate, but even without one we can arrive at much better estimates than defaultsel. > As a concrete > example, applying the join selectivity logic would certainly give a > completely wrong answer for X = X, unless there were only one value > occurring in the column. Yeah, I think we'll eventually need to special-case that one. In the particular case I ran across, though, using column match estimation would have still yielded a better result than defaultsel. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance