On 2020-06-24 16:27:35 -0600, Michael Lewis wrote: > On Wed, Jun 24, 2020, 2:35 PM Peter J. Holzer <hjp-pgsql@xxxxxx> wrote: > > Yes, estimating the number of distinct values from a relatively small > sample is hard when you don't know the underlying distribution. It might > be possible to analyze the sample to find the distribution and get a > better estimate. But I'm not sure how useful that would really be: If > a few values are very common and most very rare you are probably also > much more likely to use the common values in a query: And for those you > you would massively underestimate their frequency if you had an accurate > n_distinct value. That might be just as bad or even worse. > > > > This would only be true for values that are "common" but not in the MCVs list, > right? Yes, but if you have 33 million values there are likely to be a lot of them "common but not in the MCVs list", even for a very biased distribution. > If we could increase the sampling ratio beyond the hard coded 300x to get a > more representative sample I thought of that but abandoned it since I don't think a better estimate for n_distinct will help (see above for the reason). The problem is that the distribution is biased and the planner has no idea whether the value it is searching for is common or rare if it isn't in the MCV list. Unless ... As I understood Klaudie, the values are ids, and ids have no inherent meaning, the common values are probably scattered randomly. But it might be possible to change that. Group the ids by frequency. Ids < 1E12 occur at most 10 times, Ids >= 1E12 <2E12 occur at most 100 times and so on. This may mean that ids aren't long time stable - they may change as their frequency changes. But if an id always changes by a multiple of 1E12, the last 12 decimal digits are stable. The advantage is that then the planner can use the histogram to get a pretty good estimate of how frequent a value is. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@xxxxxx | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment:
signature.asc
Description: PGP signature