On Thu, Feb 03, 2011 at 04:39:12PM -0800, david@xxxxxxx wrote: > On Thu, 3 Feb 2011, Robert Haas wrote: > >> On Thu, Feb 3, 2011 at 3:54 PM, <david@xxxxxxx> wrote: >>> with the current code, this is a completely separate process that knows >>> nothing about the load, so if you kick it off when you start the load, it >>> makes a pass over the table (competing for I/O), finishes, you continue >>> to >>> update the table, so it makes another pass, etc. As you say, this is a >>> bad >>> thing to do. I am saying to have an option that ties the two togeather, >>> essentially making the data feed into the Analyze run be a fork of the >>> data >>> comeing out of the insert run going to disk. So the Analyze run doesn't >>> do >>> any I/O and isn't going to complete until the insert is complete. At >>> which >>> time it will have seen one copy of the entire table. >> >> Yeah, but you'll be passing the entire table through this separate >> process that may only need to see 1% of it or less on a large table. >> If you want to write the code and prove it's better than what we have >> now, or some other approach that someone else may implement in the >> meantime, hey, this is an open source project, and I like improvements >> as much as the next guy. But my prediction for what it's worth is >> that the results will suck. :-) > > I will point out that 1% of a very large table can still be a lot of disk > I/O that is avoided (especially if it's random I/O that's avoided) > > David Lang > In addition, the streaming ANALYZE can provide better statistics at any time during the load and it will be complete immediately. As far as passing the entire table through the ANALYZE process, a simple counter can be used to only send the required samples based on the statistics target. Where this would seem to help the most is in temporary tables which currently do not work with autovacuum but it would streamline their use for more complicated queries that need an analyze to perform well. Regards, Ken -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance