Maxim Boguk <maxim.boguk@xxxxxxxxx> writes: > [ planner changes behavior when a VALUES RTE reaches 200 elements ] The immediate cause of that is that, lacking any real statistics for the VALUES RTE, eqjoinsel_semi() will fall back to a rather dubious default estimate if it believes it's looking at a default estimate for the number of distinct entries on either side of the join clause: /* * Without MCV lists for both sides, we can only use the heuristic * about nd1 vs nd2. */ double nullfrac1 = stats1 ? stats1->stanullfrac : 0.0; if (!isdefault1 && !isdefault2) { if (nd1 <= nd2 || nd2 < 0) selec = 1.0 - nullfrac1; else selec = (nd2 / nd1) * (1.0 - nullfrac1); } else ===> selec = 0.5 * (1.0 - nullfrac1); And get_variable_numdistinct() changes its mind about whether it's issuing a default estimate when the VALUES size reaches DEFAULT_NUM_DISTINCT (200): /* * With no data, estimate ndistinct = ntuples if the table is small, else * use default. We use DEFAULT_NUM_DISTINCT as the cutoff for "small" so * that the behavior isn't discontinuous. */ if (ntuples < DEFAULT_NUM_DISTINCT) return clamp_row_est(ntuples); *isdefault = true; return DEFAULT_NUM_DISTINCT; So basically, although this is alleged to be continuous behavior, the changeover from isdefault = false to isdefault = true causes a huge change in the result from eqjoinsel_semi. There are a couple of things we might choose to do about this: 1. get_variable_numdistinct doesn't currently pay any attention to what kind of RTE it's considering, but it does have access to that. We could teach it to assume that if the Var comes from a VALUES RTE, the values are all distinct regardless of the length of the VALUES list. That's effectively what it assumes for all VALUES of < 200 elements today, and it's not apparent why we shouldn't make the same assumption for longer lists. Or we could adopt some sort of nonlinear behavior that gradually reduces the assumed stadistinct fraction, but I don't know any rule for that that wouldn't be pretty ad-hoc. 2. We could change the default don't-know-anything selectivity estimate in eqjoinsel_semi to be something less crude than 0.5. But again, it's hard to say what to use instead. The first of these ideas might be something that would be sane to back-patch, but I'd be pretty hesitant about back-patching anything along the lines of #2; the scope of the effects is hard to predict. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general