On Mon, Feb 13, 2023 at 12:38:12PM -0500, Tom Lane wrote:
! Peter <pmc@xxxxxxxxxxxxxxxxxxxxxxx> writes:
! > "rows=1" in the "Hash Anti Join" line is WRONG. It should be
! > 3000000. Or at least some thousands.
!
! FWIW, this behaves better in v14 and up.  In older versions there's
! an ambiguity about what relpages=reltuples=0 means; it's the state
! when ANALYZE has never yet examined the table contents, but it's
! also the state when we have found the table to be empty.  Because
! of the former case, older planner versions don't like to believe
! that a table is empty even if reltuples says so.
!
!                        regards, tom lane

Okay, I got some way through it.

Given: the destination table is empty, and VACUUMed. Then we have
relpages = reltuples = 0.

We do some

    source LEFT OUTER JOIN destination ON (matchfield)
    WHERE matchfield IS NULL

The source is big, but carries only a few distinct matchfield values.

The so-called "hack" in heapam_handler.c:heapam_estimate_rel_size()
then makes the planner believe that there are actually 10 curpages.
This is not wrong, because the same might happen if one does an INSERT
amounting to 10 pages and has not yet run ANALYZE. We then get a
calculated row count of whatever; let's assume 2330 rows.

Then we go into eqjoinsel(): n_distinct for the left (source) table is
known; let's assume it is 4. n_distinct for the right (destination)
table is not known because it is not analyzed, but the column is
UNIQUE, so it becomes 2330.

We get into eqjoinsel_inner(): the MCV list for the right
(destination) table is not known because it is not analyzed.

    selec = selec / nd2          => 0.000429

We get into eqjoinsel_semi():

    if (nd1 <= nd2 || nd2 < 0)
        selec = 1.0 - nullfrac1;

Voila, selec = 1.0.

And so the ANTI JOIN will be estimated at 0 (clamped to 1) result
rows, instead of whatever row count the source brings along (usually
big).

----------------- cut -----------------------

Okay, so this should be fixed in v14.
But, as mentioned above, the same can also happen in normal operation.

Example: I have a logfile from my webserver and a table to collect the
(unique) IP addresses. I start with an empty table and feed in the log.

First round, the destination addresses. Let's assume there are 2330
different addresses.
Second round, the source addresses. There are only 4 different ones.

I don't do an ANALYZE between the two steps. And voila, I get the same
effect as above. (Been there, done it: duration: 30216273.583 ms)

Agreed, this is an inefficient way of doing it. It would be better to
do a UNION on the source data, filter it accordingly at that point,
then DISTINCT, and only then the ANTI JOIN. But I'm a lazy guy, and I
may not care if a query that could run in 0.5 seconds takes 60 seconds
instead. However, it's not so nice when it takes 10 hours.

Anyway, I don't get a real clue on what this stanza should actually
achieve (from selfuncs.c:eqjoinsel_semi(), around line 2773):

    if (nd1 <= nd2 || nd2 < 0)
        selec = 1.0 - nullfrac1;
    else
        selec = (nd2 / nd1) * (1.0 - nullfrac1);

This seems to assume a 100% match whenever the left (source) table
brings fewer distinct(!) values than the right (destination) table
already carries.

For now, I have just disabled this behaviour, in a rough & dirty way:

--- src/backend/optimizer/path/costsize.c.orig  2022-11-07 22:47:13.000000000 +0100
+++ src/backend/optimizer/path/costsize.c       2023-02-13 00:04:54.156489000 +0100
@@ -4685,8 +4685,11 @@
                        /* pselec not used */
                        break;
                case JOIN_ANTI:
+                       if (fkselec * jselec > 0.9975) {
+                               jselec = 0.9975;
+                       }
                        nrows = outer_rows * (1.0 - fkselec * jselec);
                        nrows *= pselec;
                        break;
                default:
                        /* other values not expected here */

cheerio,
PMc