Kouber Saparev <kouber@xxxxxxxxxxx> writes: > Tom Lane wrote: >> Hmph, that's still not real good. Ideally it should be estimating >> *less* than the average frequency, because the estimate is made after >> excluding all the most-common-values, which evidently 'kouber' is not >> one of. > I altered the statistics for that column to 1000, so now the planner > assumes exactly 492 rows for the fore-mentioned query, which is indeed > the average. It never went *less* than that value, it was always higher, > i.e. for a statistics value of 600, it was 588, for 800, it became 540. I got some time to think more about this and experiment a bit further. As far as I can tell there is no fundamental bug here --- given reasonably accurate stats the rowcount estimate behaves as expected, ie, you get an estimate that's less than the actual average number of values if the target value is not one of the known MCVs. However, as the n_distinct estimate falls below the actual number of distinct values, that rowcount estimate necessarily rises. What had surprised me about this report is that the estimate matched the true average number of rows so closely; I wondered if there was some property of the way we estimate n_distinct that would make that happen. But I now think that that was just chance: there doesn't seem to be any underlying behavior that would cause it. I did some experiments with random data matching a Zipfian distribution (1/k law) and did not observe that the rowcount estimate converged to the true average when the n_distinct value was too low. So the bottom line here is just that the estimated n_distinct is too low. We've seen before that the equation we use tends to do that more often than not. I doubt that consistently erring on the high side would be better though :-(. Estimating n_distinct from a limited sample of the population is known to be a statistically hard problem, so we'll probably not ever have perfect answers, but doing better is on the to-do list. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance