Tomas Vondra <tv@xxxxxxxx> writes: > Estimate for condition > ... WHERE (col_a BETWEEN 33 AND 33) > is about 10k rows, which is quite precise. On the other side estimate > for condition > ... WHERE (col_b BETWEEN 33 AND 33) > is 1 row, which is very imprecise (actual value is about 5000). That's an artifact of your test case. There are exactly 100 distinct values in col_a, which means that they all fit into the most_common_vals array (assuming you're using default_statistics_target = 100). So the planner actually has complete information about the contents of col_a, modulo some sampling inaccuracy about the precise frequency of each value. It should be expected to produce pretty good estimates for the sorts of expressions it can estimate, and it does. In col_b, there are 200 values, so they can't be represented by most_common_vals, and in fact ANALYZE notices that none of them are really much more common than any other. So it throws up its hands and doesn't generate an MCV list at all, just a histogram. That doesn't provide a lot of foothold for the range estimator to give an exact estimate for a very narrow range. > I've been playing with this a little bit, and I've noticed another > 'strange' thing. When I rewrite the condition like this > ... WHERE (col_b BETWEEN 32.9 AND 33.1) > so that the range is not of zero length, and everything works fine, the > estimate is exactly 5000. But when I increase the lower bound a bit > ... WHERE (col_b BETWEEN 33 AND 33.1) > the estimate suddenly jumps to 276667. If you look closely at what EXPLAIN is saying, that expression expands as Filter: ((col_b >= 33) AND ((col_b)::numeric <= 33.1)) So the things being compared to aren't the same, and it doesn't recognize this as a range constraint, and you get the dumb product-of-independent-inequality-probabilities estimate. The "exact" estimate you got in the first case is more of a lucky guess than anything else, too. It does realize that it's got a range constraint in that case, but it has no stats about the value of the expression col_b::numeric. So you're just getting a default estimate that by coincidence matches the correct answer. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general