On 26.3.2015 08:48, Jeff Janes wrote: > > OK, this is starting to look like a long-standing bug to me. > > If it only sees 3 distinct values, and all three are present at least > twice, it throws all of them into the MCV list. But if one of those 3 > were present just once, then it tests them to see if they qualify. > The test for inclusion is that it has to be present more than once, > and that it must be "over-represented" by 25%. > > Lets say it sampled 30000 rows and found 29,900 of one value, 99 of > another, and 1 of a third. > > But that turns into the second one needing to be present 12,500 times. > The average value is present 10,000 times (30,000 samples with 3 > distinct values) and 25 more than that is 12,500. So it excluded. > > It seems to me that a more reasonable criteria is that it must be > over-represented 25% compared to the average of all the remaining values > not yet accepted into the MCV list. I.e. all the greater ones should be > subtracted out before computing the over-representation threshold. That might work IMO, but maybe we should increase the coefficient a bit (say, from 1.25 to 2), not to produce needlessly long MCV lists. > It is also grossly inconsistent with the other behavior. If they are > "29900; 98; 2" then all three go into the MCV. Isn't the mincount still 12500? How could all three get into the MCV? -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance