Simon Riggs <simon@xxxxxxxxxxxxxxx> writes:
> I think it's OK to use the MCV, but I have a problem with the current
> heuristics: they only work for randomly generated strings, since the
> selectivity goes down geometrically with length.

We could certainly use a less aggressive curve for that.  You got a
specific proposal?

>> After finishing that work it occurred to me that we could go a step
>> further: if the MCV list accounts for a substantial fraction of the
>> population, we could assume that the MCV list is representative of the
>> whole population, and extrapolate the pattern's selectivity over the MCV
>> list to the whole population instead of using the existing heuristics at
>> all.  In a situation like Andreas' example this would win big, although
>> you can certainly imagine cases where it would lose too.

> I don't think that can be inferred with any confidence, unless a large
> proportion of the MCV list were itself selected.  Otherwise it might
> match only a single MCV that just happens to have a high proportion,
> then we assume all others have the same proportion.

Well, of course it can't be inferred "with confidence".  Sometimes
you'll win and sometimes you'll lose.  The question is, is this a better
heuristic than what we use otherwise?  The current estimate for
non-anchored patterns is really pretty crummy, and even with a less
aggressive length-vs-selectivity curve it's not going to be great.
Another possibility is to merge the two estimates somehow.

> I would favour the idea of dynamic sampling using a block sampling
> approach; that was a natural extension of improving ANALYZE also.

One thing at a time, please.  Obtaining better statistics is one issue,
but the one at hand here is what to do given particular statistics.

			regards, tom lane
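
[Editorial note: the estimates being debated above can be made concrete
with a toy sketch.  This is NOT PostgreSQL's actual selfuncs.c code; the
function names are invented, and FIXED_CHAR_SEL here is just a placeholder
per-character selectivity.  "Merging the two estimates" is shown as one
possible weighting, by the population fraction the MCV list covers.]

```python
import re

# Placeholder per-character selectivity for the geometric heuristic.
# A "less aggressive curve" would move this closer to 1.0, so that
# selectivity falls off more slowly with pattern length.
FIXED_CHAR_SEL = 0.20

def length_heuristic(pattern: str) -> float:
    """Current-style estimate: selectivity drops geometrically with the
    number of fixed (non-wildcard) characters in the LIKE pattern."""
    fixed_chars = len(pattern.replace('%', '').replace('_', ''))
    return FIXED_CHAR_SEL ** fixed_chars

def mcv_extrapolation(pattern: str, mcvs: list[tuple[str, float]]) -> float:
    """Proposed estimate: apply the pattern to the MCV list (value,
    frequency pairs), then treat the matched fraction of the MCV list
    as representative of the whole population."""
    rx = re.compile(pattern.replace('%', '.*').replace('_', '.'))
    mcv_total = sum(freq for _, freq in mcvs)
    matched = sum(freq for val, freq in mcvs if rx.fullmatch(val))
    return matched / mcv_total if mcv_total > 0 else 0.0

def merged_estimate(pattern: str, mcvs: list[tuple[str, float]]) -> float:
    """One way to merge the two: weight the MCV extrapolation by how much
    of the population the MCV list actually accounts for, and fall back
    to the length heuristic for the remainder."""
    mcv_total = sum(freq for _, freq in mcvs)
    return (mcv_total * mcv_extrapolation(pattern, mcvs)
            + (1.0 - mcv_total) * length_heuristic(pattern))
```

With an MCV list covering half the population, e.g.
`[('foobar', 0.3), ('baz', 0.2)]` and pattern `'%foo%'`, the merged
estimate leans on the MCV match where the list covers a lot of the
population and degrades toward the geometric heuristic where it doesn't.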