On Fri, 4 Jun 2021 at 22:59, Pól Ua Laoínecháin <linehanp@xxxxxx> wrote:
> Now, we've gone from 1,160,000 to 39,830 rows (progress? :-) ) and a
> cost of ~ 1M (compared with 168k for the first query).

The estimates are not that meaningful due to a lack of table statistics on the "test" table. If you run ANALYZE on the table, you might get something closer to the truth.

If there are no stats on a table, then the planner has a pretty hard job guessing how many tuples there are. All it does is count the number of pages currently in the table, look at the columns in the table, and figure out how many tuples are likely to fit, assuming each of those pages is full of tuples. If you have just a couple of tuples and they only take up a tiny fraction of a page, then you're not going to get a very accurate number there.

Generally, since there's so little to go on here, the code is purposefully designed to be more likely to overestimate the number of tuples than underestimate. Underestimations tend to produce worse plans than overestimations. It's also common for people to create tables, then quickly load a bunch of records and start running queries. We want to do something sane there if that all happens before auto-analyze gets a chance to gather stats for the table.

As for the call to generate_series, you're not likely to ever get any great estimation from that. The estimated number of rows returned by a call to that particular function is just whatever is set in pg_proc.prorows, in this case, 1000. The other generate_series functions, which take INT and BIGINT inputs, do have a prosupport function. Generally, those will do a better job, since the support functions look at the input arguments. However, that still might not go well here, since your inputs are columns in a table.
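To see the first point for yourself, you could try something like the following. It's just a throwaway table; I've made up the table and column names since I don't have your schema, so adjust to taste:

-- throwaway demo table (names made up)
CREATE TABLE estimate_demo (a int, b timestamp);
INSERT INTO estimate_demo VALUES (1, now()), (2, now());

-- before any stats exist, the row estimate is only a guess based on
-- the table's current size on disk
EXPLAIN SELECT * FROM estimate_demo;

ANALYZE estimate_demo;

-- with stats gathered, the estimate should be close to the real row count
EXPLAIN SELECT * FROM estimate_demo;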
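And if you want to see where the generate_series estimate comes from, the prorows and prosupport columns are visible in pg_proc (prosupport is there in v12 and above):

-- list the generate_series variants with their default row estimate
-- and their planner support function, if any
SELECT oid::regprocedure AS proc, prorows, prosupport
FROM pg_proc
WHERE proname = 'generate_series';

The variants that have a support function get their row estimates worked out from the input arguments; the rest just report prorows, i.e. 1000.

David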