Re: [HACKERS] Slow count(*) again...

Greg Smith <greg@xxxxxxxxxxxxxxx> · Thu, 03 Feb 2011 17:35:12 -0500

david@xxxxxxx wrote:
I am making the assumption that an Analyze run only has to go over the 
data once (a seqential scan of the table if it's >> ram for example) 
and gathers stats as it goes.

And that's the part there's some confusion about here.  ANALYZE grabs a 
random set of samples from the table, the number of which is guided by 
the setting for default_statistics_target.  The amount of time it takes 
is not proportional to the table size; it's only proportional to the 
sampling size.  Adding a process whose overhead is proportional to the 
table size, such as the continuous analyze idea you're proposing, is 
quite likely to be a big step backwards relative to just running a 
single ANALYZE after the loading is finished.

What people should be doing if concerned about multiple passes happening 
is something like this:

CREATE TABLE t (s serial, i integer) WITH (autovacuum_enabled=off);
[populate table]
ANALYZE t;
ALTER TABLE t SET (autovacuum_enabled=on);

I'm not optimistic the database will ever get smart enough to recognize 
bulk loading and do this sort of thing automatically, but as the 
workaround is so simple it's hard to get motivated to work on trying.

--
Greg Smith   2ndQuadrant US    greg@xxxxxxxxxxxxxxx   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance