On Tue, 2006-01-10 at 17:21 -0500, Tom Lane wrote:
> Simon Riggs <simon@xxxxxxxxxxxxxxx> writes:
> > I think it's OK to use the MCV, but I have a problem with the
> > current heuristics: they only work for randomly generated strings,
> > since the selectivity goes down geometrically with length.
>
> We could certainly use a less aggressive curve for that. You got a
> specific proposal?

I read some research not too long ago that showed a frequency curve of
words by syllable length. I'll dig that out tomorrow.

> > I would favour the idea of dynamic sampling using a block sampling
> > approach; that was a natural extension of improving ANALYZE also.
>
> One thing at a time, please. Obtaining better statistics is one
> issue, but the one at hand here is what to do given particular
> statistics.

I meant: use the same sampling approach as I was proposing for
ANALYZE, but do it at plan time for the query. That way we can apply
the function directly to the sampled rows and estimate selectivity.

I specifically didn't mention that in the Ndistinct discussion because
I didn't want to confuse the subject further, but the underlying block
sampling method would be identical, so the code is already almost
there... we just need to evaluate the RestrictInfo against the sampled
tuples.

Best Regards,
Simon Riggs
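
[Editor's sketch of the "geometric" behaviour under discussion.
The 0.20 per-character factor mirrors the FIXED_CHAR_SEL constant in
selfuncs.c; the damped alternative is purely hypothetical, just one
possible "less aggressive curve" in which each additional character
contributes less discrimination, as you would expect for
natural-language words rather than random strings.]

#include <math.h>

#define FIXED_CHAR_SEL 0.20

/* Current behaviour, roughly: each fixed pattern character multiplies
 * selectivity by a constant, so the estimate falls geometrically with
 * pattern length. */
static double
geometric_sel(int nchars)
{
    return pow(FIXED_CHAR_SEL, (double) nchars);
}

/* Hypothetical less aggressive curve: the per-character factor climbs
 * toward 1.0, so later characters add less discrimination and the
 * estimate levels off instead of collapsing toward zero. */
static double
damped_sel(int nchars)
{
    double sel = 1.0;
    int    i;

    for (i = 0; i < nchars; i++)
        sel *= FIXED_CHAR_SEL +
               (1.0 - FIXED_CHAR_SEL) * (1.0 - pow(0.5, (double) i));
    return sel;
}

[For an 8-character fixed prefix, geometric_sel gives 0.2^8, about
2.6e-6, while this damped curve gives roughly 0.079, which is the kind
of difference the "less aggressive curve" question is about.]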
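
[Editor's sketch to make "evaluate the RestrictInfo against the
sampled tuples" concrete. This is a toy, self-contained model, not
PostgreSQL internals: QualFunc, the flat int table, and
TUPLES_PER_BLOCK are all hypothetical stand-ins. In the real planner
this would mean running ExecQual over tuples fetched from the sampled
blocks.]

#include <stdio.h>
#include <stdlib.h>

#define TUPLES_PER_BLOCK 100

/* Stand-in for evaluating a RestrictInfo's clause against one tuple. */
typedef int (*QualFunc)(int tuple);

/* Read a random sample of blocks and evaluate the restriction
 * directly against the tuples found; the observed match fraction is
 * the selectivity estimate. */
static double
sample_selectivity(const int *table, size_t nblocks,
                   size_t sample_blocks, QualFunc qual)
{
    size_t seen = 0, matched = 0;
    size_t i, j;

    for (i = 0; i < sample_blocks; i++)
    {
        size_t blockno = (size_t) rand() % nblocks;
        const int *block = table + blockno * TUPLES_PER_BLOCK;

        for (j = 0; j < TUPLES_PER_BLOCK; j++)
        {
            seen++;
            if (qual(block[j]))
                matched++;
        }
    }
    return seen ? (double) matched / (double) seen : 0.0;
}

/* Example restriction: value < 250 (true selectivity 0.25 below). */
static int
qual_lt_250(int tuple)
{
    return tuple < 250;
}

int
main(void)
{
    enum { NBLOCKS = 1000 };
    static int table[NBLOCKS * TUPLES_PER_BLOCK];
    size_t i;

    for (i = 0; i < NBLOCKS * TUPLES_PER_BLOCK; i++)
        table[i] = rand() % 1000;

    printf("estimated selectivity: %.3f\n",
           sample_selectivity(table, NBLOCKS, 30, qual_lt_250));
    return 0;
}

[Sampling whole blocks rather than individual rows amortizes the I/O
cost over many tuples per page read, at the price of correlation
between tuples on the same page; that is the same trade-off the
block-sampling proposal for ANALYZE makes.]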