> So the bottom line here is just that the estimated n_distinct is too > low. We've seen before that the equation we use tends to do that more > often than not. I doubt that consistently erring on the high side would > be better though :-(. Estimating n_distinct from a limited sample of > the population is known to be a statistically hard problem, so we'll > probably not ever have perfect answers, but doing better is on the > to-do list. > I hit an interestinhg paper on n_distinct calculation: http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf the PCSA algorithm described there requires O(1) calculation per value. Page 22 describes what to do with updates streams. This I think (disclaimer: I know little about PG internals) means that the n_distinct estimation can be done during vacuum time (it would play well with the visibility map addon). What do You think? Greetings Marcin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance