Tom, I'm interested in the problem of cross-column statistics from a theoretical perspective. It would be interesting to sit down and try to reason out a useful solution, or at very least to understand the problem better so I can anticipate when it might come and eat me. >From my understanding, the main problem is that if PG knows the selectivity of n conditions C1,C2,...,Cn then it doesn't know whether the combined selectivity will be C1*C2*...*Cn (conditions are independent) or max(C1,C2,...,Cn) (conditions are strictly dependent), or somewhere in the middle. Therefore, row estimates could be orders of magnitude off. I suppose a common example would be a table with a serial primary key column and a timestamp value which is always inserted as CURRENT_TIMESTAMP, so the two columns are strongly correlated. If the planner guesses that 1% of the rows of the table will match pk>1000000, and 1% of the rows of the table will match timestamp > X, then it would be nice for it to know that if you specify both "pk>1000000 AND timestamp>X" that the combined selectivity is still only 1% and not 1% * 1% = 0.01%. As long as I'm sitting down and reasoning about the problem anyway, are there any other types of cases you're aware of where some form of cross- column statistics would be useful? In the unlikely event that I actually come up with a brilliant and simple solution, I'd at least like to make sure that I'm solving the right problem :) Thanks, Mark Lewis On Tue, 2006-10-10 at 22:38 -0400, Tom Lane wrote: > Brian Herlihy <btherl@xxxxxxxxxxxx> writes: > > What would it take for hints to be added to postgres? > > A *whole lot* more thought and effort than has been expended on the > subject to date. > > Personally I have no use for the idea of "force the planner to do > exactly X given a query of exactly Y". You don't have exactly Y > today, tomorrow, and the day after (if you do, you don't need a > hint mechanism at all, you need a mysql-style query cache). > IMHO most of the planner mistakes we see that could be fixed via > hinting are really statistical estimation errors, and so the right > level to be fixing them at is hints about how to estimate the number > of rows produced for given conditions. Mind you that's still a plenty > hard problem, but you could at least hope that a hint of that form > would be useful for more than one query. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match