On Fri, 4 Jun 2021 at 22:59, Pól Ua Laoínecháin <linehanp@xxxxxx> wrote:
> Now, we've gone from 1,160,000 to 39,830 rows (progress? :-) ) and a
> cost of ~ 1M (compared with 168k for the first query).

The estimates are not that meaningful due to a lack of table statistics on the "test" table. If you run ANALYZE on the table, you might get something closer to the truth.

If there are no stats on a table, then the planner has a pretty hard job guessing how many tuples there are. All it does is count the number of pages currently in the table, look at the columns in the table, and figure out how many tuples are likely to fit, assuming each of those pages is full of tuples. If you have just a couple of tuples and they only take up a tiny fraction of a page, then you're not going to get a very accurate number there.

Generally, since there's so little to go on here, the code is purposefully designed to be more likely to overestimate the number of tuples than underestimate. Underestimations tend to produce worse plans than overestimations. It's also common for people to create tables, then quickly load a bunch of records and start running queries. We want to do something sane there if that all happens before auto-analyze gets a chance to gather stats for the table.

As for the call to generate_series, you're not likely to ever get any great estimation from that. The estimated number of rows returned by a call to that particular function is just whatever is set in pg_proc.prorows, in this case, 1000. The other generate_series functions, which take INT and BIGINT inputs, do have a prosupport function. Generally, those will do a better job, since the support functions look at the input arguments. However, that still might not go well here, since your inputs are columns in a table.
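To see the first point for yourself, you could try something like the following. It's just a throwaway table; I've made up the table and column names since I don't have your schema, so adjust to taste:

-- throwaway demo table (names made up)
CREATE TABLE estimate_demo (a int, b timestamp);
INSERT INTO estimate_demo VALUES (1, now()), (2, now());

-- before any stats exist, the row estimate is only a guess based on
-- the table's current size on disk
EXPLAIN SELECT * FROM estimate_demo;

ANALYZE estimate_demo;

-- with stats gathered, the estimate should be close to the real row count
EXPLAIN SELECT * FROM estimate_demo;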
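And if you want to see where the generate_series estimate comes from, the prorows and prosupport columns are visible in pg_proc (prosupport is there in v12 and above):

-- list the generate_series variants with their default row estimate
-- and their planner support function, if any
SELECT oid::regprocedure AS proc, prorows, prosupport
FROM pg_proc
WHERE proname = 'generate_series';

The variants that have a support function get their row estimates worked out from the input arguments; the rest just report prorows, i.e. 1000.

David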