On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wr
regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN
:values_clause;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ -----------
Aggregate (cost=245006.16..245006.17 rows=1 width=8) (actual time=3550.581..3550.581 rows=1 loops=1)
Execution time: 3550.700 ms
regression=# set enable_hashagg TO 0;
regression=# set enable_sort TO 0;
SET
regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN
:values_clause;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ -------
Aggregate (cost=320003.90..320003.91 rows=1 width=8) (actual time=3548.364..3548.364 rows=1 loops=1)
Execution time: 3548.463 ms
At least in this example, the actual runtimes are basically identical
regardless, so there is no great point in sweating over it.
Since The run times are equal, but one is estimated to be 30% more expensive, I think there is at least some little reason to sweat over it.
Incidentally, I accidentally ran this against a server running with your patch from https://www.postgresql.org/message-id/10078.1471955305@xxxxxxxxxxxxx. On that server, it did choose the semi-join. But I have no idea why, as it seems like the effect of that patch would have been to change the distinct estimate from the magic hard-coded 200, to the natural 200 coming from the query itself. Why would that affect the cost?
Cheers,
Jeff