Tom Lane <tgl@xxxxxxxxxxxxx> writes:
> "Ed L." <pgsql@xxxxxxxxxxxxx> writes:
> > So, does this sound like we just happened to get repeatedly
> > horribly unrepresentative random samples with stats target at
> > 10?  Are we at the mercy of randomness here?  Or is there a
> > better preventive procedure we can follow to systematically
> > identify this kind of situation?
>
> I think the real issue is that stats target 10 is too small for large
> tables: the samples are just not large enough to support a decent
> numdistinct estimate, which is the critical stat for cases such as this
> (ie, estimating the number of hits on a value that's not in the
> most-common-values list).

There's been some discussion on -hackers about this area. Sadly, the
idea of using samples to calculate numdistinct estimates is
fundamentally on pretty shaky ground. A fixed sample size works fine
for estimating the distribution of values, but to get consistent
precision for numdistinct estimates the sample has to be a constant
fraction of the table -- and unfortunately a pretty large fraction at
that.

So sadly I think "at the mercy of randomness" is pretty accurate. You'll
have to keep raising the statistics target as the table grows, and I
expect you'll eventually run into some of the downsides of large stats
targets. (There's an example of raising the target on a single column at
the end of this message.)

Some better algorithms have been posted, but they would require full
table scans during ANALYZE, not just samples.

-- 
greg
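
For concreteness, here's roughly what raising the per-column target and
re-gathering stats looks like. The table and column names (orders,
customer_id) and the target value of 200 are just placeholders; pick
whatever matches your schema and how large the table has grown:

    -- raise the statistics target for one column, then re-analyze
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 200;
    ANALYZE orders;

    -- see what the planner now believes; a negative n_distinct means
    -- it's stored as a fraction of the row count rather than an
    -- absolute number of distinct values
    SELECT attname, n_distinct
      FROM pg_stats
     WHERE tablename = 'orders' AND attname = 'customer_id';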