On Mon, Jan 09, 2006 at 09:04:48PM -0500, Tom Lane wrote: > Andrea Arcangeli <andrea@xxxxxxxxxxxx> writes: > > It just makes no sense to me that the planner takes a difference > > decision based on a "not". > > Why in the world would you think that? In general a NOT will change the > selectivity of the WHERE condition tremendously. If the planner weren't > sensitive to that, *that* would be a bug. The only case where it's > irrelevant is if the selectivity of the base condition is exactly 50%, > which is not a very reasonable default guess for LIKE. How do you know that "LIKE" will have a selectivity above 50% in the first place? I think 50% should be the default unless the selectively is measured at runtime against the data being queried. If you don't know the data, I think it's a bug that LIKE is assumed to have a selectivity above 50%. You can't know that, only the author of the code can know that and that's why I talked about hints. It'd be fine to give hints like: UNLIKELY string LIKE '%% PREEMPT %%' or: LIKELY string NOT LIKE '%% PREEMPT %%' Then you could assume that very little data will be returned or a lot of data will be returned. If you don't get hints NOT LIKE or LIKE should be assumed to have the same selectivity. > It sounds to me that the problem is misestimation of the selectivity > of the LIKE pattern --- the planner is going to think that > LIKE '%% PREEMPT %%' is fairly selective because of the rather long > match text, when in reality it's probably not so selective on your > data. But we don't keep any statistics that would allow the actual True, there's a lot of data that matches %% PREEMPT %% (even if less than the NOT case). > number of matching rows to be estimated well. You might want to think > about changing your data representation so that the pattern-match can be > replaced by a boolean column, or some such, so that the existing > statistics code can make a more reasonable estimate how many rows are > selected. I see. I can certainly fix it by stopping using LIKE. But IMHO this remains a bug, since until the statistics about the numberof matching rows isn't estimated well, you should not make assumptions on LIKE/NOT LIKE. I think you can change the code in a way that I won't have to touch anything, and this will lead to fewer surprises in the future IMHO. Thanks!