Re: [HACKERS] Autovacuum Improvements

Bruce Momjian <bruce@xxxxxxxxxx> · Fri, 19 Jan 2007 16:36:50 -0500 (EST)

Added to TODO:

>       o Allow multiple vacuums so large tables do not starve small
>         tables
>
>         http://archives.postgresql.org/pgsql-general/2007-01/msg00031.php
>
>       o Improve control of auto-vacuum
>
>         http://archives.postgresql.org/pgsql-hackers/2006-12/msg00876.php

---------------------------------------------------------------------------

Darcy Buskermolen wrote:
> On Friday 19 January 2007 01:47, Simon Riggs wrote:
> > On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> > > On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > > > elein wrote:
> > > > > Have you made any consideration of providing feedback on autovacuum
> > > > > to users? Right now we don't even know what tables were vacuumed when
> > > > > and what was reaped.  This might actually be another topic.
> > > >
> > > > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > > > to have a log table, on which we'd register what did we run, at what
> > > > time, how long did it last, how many tuples did it clean, etc.  I feel
> > > > having it on the regular text log is useful but it's not good enough.
> > > > Keep in mind that in the future we may want to peek at that collected
> > > > information to be able to take better scheduling decisions (or at least
> > > > inform the DBA that he sucks).
> > > >
> > > > Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> > > > that manually-run vacuums would be logged as well.
> > >
> > > Yes I did intend this thought for vacuum, not strictly autovacuum.
> >
> > I agree, for all VACUUMs: we need a log table.
> >
> > The only way we can get a feedback loop on what has come before is by
> > remembering what happened. Simply logging it is interesting, but not
> > enough.
> 
> Correct, I think we are all saying the same thing that is this log table is 
> purely inserts so that we can see trends over time.
> 
> >
> > There is some complexity there, because with many applications a small
> > table gets VACUUMed every few minutes, so the log table would become a
> > frequently updated table itself. I'd also suggest that we might want to
> > take account of the number of tuples removed by btree pre-split VACUUMs
> > also.
> 
> Thinking on this a bit more, I suppose that this table really should allow for 
> user defined triggers on it, so that a DBA can create partioning for it, not 
> to mention being able to move it off into it's own tablespace. 
> 
> 
> >
> > I also like the idea of a single scheduler and multiple child workers.
> >
> > The basic architecture is clear and obviously beneficial. What worries
> > me is how the scheduler will work; there seems to be as many ideas as we
> > have hackers. I'm wondering if we should provide the facility of a
> > pluggable scheduler? That way you'd be able to fine tune the schedule to
> > both the application and to the business requirements. That would allow
> > integration with external workflow engines and job schedulers, for when
> > VACUUMs need to not-conflict with external events.
> >
> > If no scheduler has been defined, just use a fairly simple default.
> >
> > The three main questions are
> > - what is the maximum size of VACUUM that can start *now*
> 
> How can we determine this given we have no real knowledge of the upcoming  
> adverse IO conditions ?
> 
> > - can *this* VACUUM start now?
> > - which is the next VACUUM to run?
> >
> > If we have an API that allows those 3 questions to be asked, then a
> > scheduler plug-in could supply the answers. That way any complex
> > application rules (table A is available for VACUUM now for next 60 mins,
> > table B is in constant use so we must use vacuum_delay), external events
> > (long running reports have now finished, OK to VACUUM), time-based rules
> > (e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
> > first 3 days of the each month is financial accounting close) can be
> > specified.
> 
> Another thought, is it at all possible to do a partial vacuum?  ie spend the 
> next 30 minutes vacuuming foo table, and update the fsm with what hew have 
> learned over the 30 mins, even if we have not done a full table scan ?
> 
> 
> -- 
> 
> 
> Darcy Buskermolen
> The PostgreSQL company, Command Prompt Inc.
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

-- 
  Bruce Momjian   bruce@xxxxxxxxxx
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +