On 2020-04-04 10:07:51 +0300, Andrus wrote:
> vacuumdb: vacuuming database "mydb"
> INFO:  analyzing "public.mytable"
> INFO:  "mytable": scanned 2709 of 2709 pages, containing 10834 live rows and
> 0 dead rows; 10834 rows in sample, 10834 estimated total rows
>
> For tables with more than 30000 rows, it shows that there are 30000 rows
> in sample.
>
> postgresql.conf does not set default_statistics_target value.
> It contains
>
> #default_statistics_target = 100    # range 1-10000
>
> So I expect that there should be 100 rows in sample.

The statistics target determines the size of the gathered statistics: a
value of 100 means that Postgres should store (at most) the 100 most
frequent values and that the histogram should have 100 buckets.
Obviously you can't determine the 100 most frequent values if you only
sample 100 rows, and a histogram with 100 buckets based on only 100 rows
would be quite empty. So you need to sample more rows to compute those
statistics. How much more? As it turns out (see the link Julien posted),
it depends a bit on the size of the table, but not that much, so a
factor of 300 is good enough for a wide range of sizes.

> Why Postgres uses 30000 or number of rows in table for smaller tables?
>
> Is 30000 some magical value, how to control it?

The factor of 300 is a bit magical but well founded. You can't control
it (except by changing the source code and recompiling, of course - but
why would you want to?). The value 100 can be controlled either by
changing default_statistics_target or by changing the statistics target
of a specific column of a specific table
(alter table ... alter column ... set statistics ...).

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         | -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
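For concreteness, here is a small sketch of the two knobs mentioned
above. The column name "mycolumn" and the target value 500 are just
examples, not anything from the original report:

    -- Raise the default target for all columns (can also be set in
    -- postgresql.conf); this affects the next ANALYZE in this session:
    SET default_statistics_target = 500;

    -- Or raise the target for only one column of one table:
    ALTER TABLE mytable ALTER COLUMN mycolumn SET STATISTICS 500;

    -- Re-analyze so the new target takes effect. The sample size is
    -- 300 * target rows (here 300 * 500 = 150000), capped at the
    -- number of rows in the table for small tables.
    ANALYZE VERBOSE mytable;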