On 2020-04-04 10:07:51 +0300, Andrus wrote:
> vacuumdb: vacuuming database "mydb"
> INFO:  analyzing "public.mytable"
> INFO:  "mytable": scanned 2709 of 2709 pages, containing 10834 live rows and
> 0 dead rows; 10834 rows in sample, 10834 estimated total rows
>
> For tables with more than 30000 rows, it shows that there are 30000 rows
> in sample.
>
> postgresql.conf does not set default_statistics_target value.
> It contains
>
> #default_statistics_target = 100    # range 1-10000
>
> So I expect that there should be 100 rows in sample.

The statistics target determines the size of the gathered statistics: a
value of 100 means that Postgres should store (at most) the 100 most
frequent values and that the histogram should have 100 buckets.
Obviously you can't determine the 100 most frequent values if you only
sample 100 rows, and a histogram with 100 buckets based on only 100 rows
would be quite empty. So you need to sample more rows to compute those
statistics. How much more? As it turns out (see the link Julien posted),
it depends a bit on the size of the table, but not that much, so a
factor of 300 is good enough for a wide range of sizes.

> Why Postgres uses 30000 or number of rows in table for smaller tables?
>
> Is 30000 some magical value, how to control it?

The factor of 300 is a bit magical but well founded. You can't control
it (except by changing the source code and recompiling, of course - but
why would you want to?). The value 100 can be controlled either by
changing default_statistics_target or by changing the statistics target
of a specific column of a specific table
(alter table ... alter column ... set statistics ...).

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         | -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
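For concreteness, here is a small sketch of the two knobs mentioned
above. The column name "mycolumn" and the target value 500 are just
examples, not anything from the original report:

    -- Raise the default target for all columns (can also be set in
    -- postgresql.conf); this affects the next ANALYZE in this session:
    SET default_statistics_target = 500;

    -- Or raise the target for only one column of one table:
    ALTER TABLE mytable ALTER COLUMN mycolumn SET STATISTICS 500;

    -- Re-analyze so the new target takes effect. The sample size is
    -- 300 * target rows (here 300 * 500 = 150000), capped at the
    -- number of rows in the table for small tables.
    ANALYZE VERBOSE mytable;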