On Thu, Feb 3, 2011 at 7:39 PM,  <david@xxxxxxx> wrote:
>> Yeah, but you'll be passing the entire table through this separate
>> process that may only need to see 1% of it or less on a large table.
>> If you want to write the code and prove it's better than what we have
>> now, or some other approach that someone else may implement in the
>> meantime, hey, this is an open source project, and I like improvements
>> as much as the next guy.  But my prediction, for what it's worth, is
>> that the results will suck.  :-)
>
> I will point out that 1% of a very large table can still be a lot of disk
> I/O that is avoided (especially if it's random I/O that's avoided).

Sure, but I think that trying to avoid it will be costly in other ways -
you'll be streaming a huge volume of data through some auxiliary
process, which will have to apply some algorithm that's very different
from the one we use today.  The reality is that I see little evidence
that the way we do ANALYZE now is too expensive.  It's typically very
cheap and works very well.  It's a bit annoying when it fires off in the
middle of a giant data load, so we might need to change the timing of it
a little, but if there's a problem with the operation itself being too
costly, this is the first I'm hearing of it.  We've actually worked
*really* hard to make it cheap.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
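
For anyone bitten by the mid-load ANALYZE annoyance today, one possible
workaround is to disable autovacuum/autoanalyze on just the target table
for the duration of the bulk load and collect statistics by hand once the
load finishes.  This is only a sketch; the table name and file path below
are made up for illustration, not taken from the thread:

  -- turn off background vacuum/analyze for this one table before loading
  ALTER TABLE big_table SET (autovacuum_enabled = false);

  -- bulk load (hypothetical file path)
  COPY big_table FROM '/path/to/data.csv' WITH (FORMAT csv);

  -- restore the default behaviour and gather statistics once, after the load
  ALTER TABLE big_table RESET (autovacuum_enabled);
  ANALYZE big_table;

This leaves the sampling machinery itself untouched and only shifts when it
runs, which is closer to "change the timing" than to replacing the algorithm.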