On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
> Yes, it's only intractable if you're wedded to the idea of a tiny,
> fixed-size sample. If we're allowed to sample, say, 1% of the table, we
> can get a MUCH more accurate n_distinct estimate using multiple
> algorithms, of which HLL is one. While n_distinct will still have some
> variance, it'll be over a much smaller range.

I've gone looking for papers on this topic, but from what I read this
isn't so. To get any noticeable improvement you need to read 10-50% of
the table, which is effectively the same as reading the entire table,
and even then the results were pretty poor. All the research I could
find went into how to analyze the whole table while using a reasonable
amount of scratch space, and how to do it incrementally.

--
greg
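
For anyone who wants to poke at the sampling trade-off discussed above, here is a minimal sketch rather than anything from the papers. It assumes the Haas-Stokes "Duj1" estimator, n*d / (n - f1 + f1*n/N), which I believe is the one ANALYZE uses. The skewed test column, the sample fractions, and the function names (duj1_estimate, simulate) are all made up for illustration.

```python
import random
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Haas-Stokes Duj1 estimate of the number of distinct values.

    n  = sample size
    d  = distinct values seen in the sample
    f1 = values seen exactly once in the sample
    N  = total rows in the table
    """
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    return n * d / (n - f1 + f1 * n / total_rows)


def simulate(total_rows=200_000, fractions=(0.01, 0.10, 0.50, 1.0), seed=0):
    """Compare the estimate at several sample fractions with the true count.

    The column is a hypothetical heavy-tailed distribution (assumption):
    a few very common values plus a long tail of rare ones.
    """
    rng = random.Random(seed)
    table = [int(rng.paretovariate(1.0)) for _ in range(total_rows)]
    true_distinct = len(set(table))
    results = {}
    for frac in fractions:
        n = int(total_rows * frac)
        sample = rng.sample(table, n)  # simple random sample without replacement
        results[frac] = duj1_estimate(sample, total_rows)
    return true_distinct, results


if __name__ == "__main__":
    true_distinct, results = simulate()
    print(f"true n_distinct: {true_distinct}")
    for frac, est in results.items():
        print(f"sample {frac:>5.0%}: estimate {est:10.1f}  "
              f"ratio {est / true_distinct:.2f}")
```

Two sanity checks on the formula: when the sample is the whole table (n = N) the estimate collapses to exactly d, and when every sampled value is unique (f1 = n) it scales up to N. How quickly the ratio approaches 1 between those extremes depends heavily on the data distribution, so try it with a column shaped like your own before drawing conclusions.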