Re: [HACKERS] Slow count(*) again...

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 03, 2011 at 04:39:12PM -0800, david@xxxxxxx wrote:
> On Thu, 3 Feb 2011, Robert Haas wrote:
>
>> On Thu, Feb 3, 2011 at 3:54 PM,  <david@xxxxxxx> wrote:
>>> with the current code, this is a completely separate process that knows
>>> nothing about the load, so if you kick it off when you start the load, it
>>> makes a pass over the table (competing for I/O), finishes, you continue 
>>> to
>>> update the table, so it makes another pass, etc. As you say, this is a 
>>> bad
>>> thing to do. I am saying to have an option that ties the two togeather,
>>> essentially making the data feed into the Analyze run be a fork of the 
>>> data
>>> comeing out of the insert run going to disk. So the Analyze run doesn't 
>>> do
>>> any I/O and isn't going to complete until the insert is complete. At 
>>> which
>>> time it will have seen one copy of the entire table.
>>
>> Yeah, but you'll be passing the entire table through this separate
>> process that may only need to see 1% of it or less on a large table.
>> If you want to write the code and prove it's better than what we have
>> now, or some other approach that someone else may implement in the
>> meantime, hey, this is an open source project, and I like improvements
>> as much as the next guy.  But my prediction for what it's worth is
>> that the results will suck.  :-)
>
> I will point out that 1% of a very large table can still be a lot of disk 
> I/O that is avoided (especially if it's random I/O that's avoided)
>
> David Lang
>

In addition, the streaming ANALYZE can provide better statistics at
any time during the load and it will be complete immediately. As far
as passing the entire table through the ANALYZE process, a simple
counter can be used to only send the required samples based on the
statistics target. Where this would seem to help the most is in
temporary tables which currently do not work with autovacuum but it
would streamline their use for more complicated queries that need
an analyze to perform well.

Regards,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux