david@xxxxxxx wrote:
> I am making the assumption that an ANALYZE run only has to go over the
> data once (a sequential scan of the table if it's much larger than RAM,
> for example) and gathers stats as it goes.
And that's the part there's some confusion about here. ANALYZE grabs a
random sample of rows from the table, the size of which is determined by
the setting for default_statistics_target. The amount of time it takes
is not proportional to the table size; it's only proportional to the
sample size. Adding a process whose overhead is proportional to the
table size, such as the continuous analyze idea you're proposing, is
quite likely to be a big step backwards relative to just running a
single ANALYZE after the loading is finished.
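
To put rough numbers on that: the sampler reads something like 300 rows
per point of statistics target, so at the default target of 100 it looks
at around 30,000 rows whether the table holds a million rows or a
billion. A quick sketch of how you'd inspect or tune that (the table t
and column i match the example further down):

SHOW default_statistics_target;                   -- 100 by default on recent versions
ALTER TABLE t ALTER COLUMN i SET STATISTICS 500;  -- larger sample for one column
ANALYZE t;                                        -- runtime tracks the sample, not the table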
What people should be doing if they're concerned about multiple ANALYZE
passes happening during a load is something like this:
CREATE TABLE t (s serial, i integer) WITH (autovacuum_enabled=off);
[populate table]
ANALYZE t;
ALTER TABLE t SET (autovacuum_enabled=on);
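
If you want to double-check the result afterwards, something along these
lines works (pg_class and pg_stats are the standard system views; 't' is
the example table above):

SELECT reloptions FROM pg_class WHERE relname = 't';             -- autovacuum_enabled=on again
SELECT attname, n_distinct FROM pg_stats WHERE tablename = 't';  -- stats are in place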
I'm not optimistic the database will ever get smart enough to recognize
bulk loading and do this sort of thing automatically, but since the
workaround is so simple it's hard to get motivated to work on that.
--
Greg Smith 2ndQuadrant US greg@xxxxxxxxxxxxxxx Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books