Re: [HACKERS] Autovacuum Improvements

Darcy Buskermolen <darcyb@xxxxxxxxxxxxxxxxx> · Fri, 19 Jan 2007 06:48:26 -0800

On Friday 19 January 2007 01:47, Simon Riggs wrote:
> On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> > On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > > elein wrote:
> > > > Have you made any consideration of providing feedback on autovacuum
> > > > to users? Right now we don't even know what tables were vacuumed when
> > > > and what was reaped.  This might actually be another topic.
> > >
> > > I'd like to hear other people's opinions on Darcy Buskermolen's proposal
> > > to have a log table, on which we'd register what we ran, at what
> > > time, how long it lasted, how many tuples it cleaned, etc.  I feel
> > > having it on the regular text log is useful but it's not good enough.
> > > Keep in mind that in the future we may want to peek at that collected
> > > information to be able to take better scheduling decisions (or at least
> > > inform the DBA that he sucks).
> > >
> > > Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> > > that manually-run vacuums would be logged as well.
> >
> > Yes I did intend this thought for vacuum, not strictly autovacuum.
>
> I agree, for all VACUUMs: we need a log table.
>
> The only way we can get a feedback loop on what has come before is by
> remembering what happened. Simply logging it is interesting, but not
> enough.

Correct. I think we are all saying the same thing: this log table is
insert-only, so that we can see trends over time.
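
To make the insert-only idea concrete, here is a minimal sketch in Python
rather than SQL. The names VacuumLogEntry, VacuumLog and removed_per_day are
made up for illustration and are not an existing PostgreSQL catalog or API;
the fields follow the ones Alvaro listed (what ran, when, how long, how many
tuples it cleaned).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)  # frozen: an entry is never modified once written
class VacuumLogEntry:
    table: str
    started_at: datetime
    duration: timedelta
    tuples_removed: int


class VacuumLog:
    """Hypothetical append-only log: rows are added, never updated."""

    def __init__(self) -> None:
        self._rows: List[VacuumLogEntry] = []

    def record(self, entry: VacuumLogEntry) -> None:
        self._rows.append(entry)

    def removed_per_day(self, table: str) -> Dict[date, int]:
        """Sum the tuples removed per calendar day for one table."""
        totals: Dict[date, int] = defaultdict(int)
        for row in self._rows:
            if row.table == table:
                totals[row.started_at.date()] += row.tuples_removed
        return dict(totals)
```

Because nothing is ever updated, a trend such as "tuples removed per day for
table foo" is just an aggregate over the rows.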

>
> There is some complexity there, because with many applications a small
> table gets VACUUMed every few minutes, so the log table would become a
> frequently updated table itself. I'd also suggest that we might want to
> take account of the number of tuples removed by btree pre-split VACUUMs
> also.

Thinking on this a bit more, I suppose that this table really should allow
user-defined triggers on it, so that a DBA can set up partitioning for it,
not to mention being able to move it off into its own tablespace.

>
> I also like the idea of a single scheduler and multiple child workers.
>
> The basic architecture is clear and obviously beneficial. What worries
> me is how the scheduler will work; there seem to be as many ideas as we
> have hackers. I'm wondering if we should provide the facility of a
> pluggable scheduler? That way you'd be able to fine tune the schedule to
> both the application and to the business requirements. That would allow
> integration with external workflow engines and job schedulers, for when
> VACUUMs need to not-conflict with external events.
>
> If no scheduler has been defined, just use a fairly simple default.
>
> The three main questions are
> - what is the maximum size of VACUUM that can start *now*

How can we determine this, given that we have no real knowledge of upcoming
adverse I/O conditions?

> - can *this* VACUUM start now?
> - which is the next VACUUM to run?
>
> If we have an API that allows those 3 questions to be asked, then a
> scheduler plug-in could supply the answers. That way any complex
> application rules (table A is available for VACUUM now for next 60 mins,
> table B is in constant use so we must use vacuum_delay), external events
> (long running reports have now finished, OK to VACUUM), time-based rules
> (e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
> first 3 days of each month is financial accounting close) can be
> specified.
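
The three questions could map onto an interface roughly like the following.
This is only a sketch in Python of the shape of such an API, not a proposal
for the actual C hooks; VacuumScheduler, VacuumCandidate, DefaultScheduler,
the page cap and the 00:00 - 04:00 window are all assumptions picked for
illustration.

```python
import sys
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Optional, Protocol, Tuple


@dataclass
class VacuumCandidate:
    table: str
    size_pages: int   # estimated pages the VACUUM would have to scan
    dead_tuples: int  # estimated reclaimable tuples


class VacuumScheduler(Protocol):
    """The three questions a scheduler plug-in would answer (hypothetical)."""

    def max_vacuum_size(self, now: datetime) -> int: ...
    def can_start(self, cand: VacuumCandidate, now: datetime) -> bool: ...
    def next_vacuum(self, cands: List[VacuumCandidate],
                    now: datetime) -> Optional[VacuumCandidate]: ...


class DefaultScheduler:
    """A fairly simple default: cap VACUUM size outside a quiet window."""

    def __init__(self, busy_cap_pages: int = 10_000,
                 quiet_window: Tuple[time, time] = (time(0, 0), time(4, 0))):
        self.busy_cap_pages = busy_cap_pages
        self.quiet_window = quiet_window  # assumed not to wrap past midnight

    def max_vacuum_size(self, now: datetime) -> int:
        start, end = self.quiet_window
        if start <= now.time() < end:
            return sys.maxsize  # no cap during the quiet window
        return self.busy_cap_pages

    def can_start(self, cand: VacuumCandidate, now: datetime) -> bool:
        return cand.size_pages <= self.max_vacuum_size(now)

    def next_vacuum(self, cands: List[VacuumCandidate],
                    now: datetime) -> Optional[VacuumCandidate]:
        # Among the candidates allowed to start, pick the most dead tuples.
        runnable = [c for c in cands if self.can_start(c, now)]
        return max(runnable, key=lambda c: c.dead_tuples, default=None)
```

A site-specific plug-in would implement the same three methods but consult
its own rules (report schedules, month-end close, and so on) instead of a
fixed window.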

Another thought: is it at all possible to do a partial vacuum?  I.e. spend the
next 30 minutes vacuuming table foo, and update the FSM with what we have
learned over those 30 minutes, even if we have not done a full table scan?
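
A toy model of what I mean by a time-boxed pass, in Python: scan pages until
the budget runs out, record the free space seen so far in the FSM, and return
the page to resume from. partial_vacuum, its parameters and the dict standing
in for the FSM are hypothetical, and the sketch ignores the index cleanup a
real VACUUM has to do before heap space can be reused.

```python
import time
from typing import Callable, Dict, Sequence


def partial_vacuum(page_free_bytes: Sequence[int],
                   start_page: int,
                   budget_seconds: float,
                   fsm: Dict[int, int],
                   clock: Callable[[], float] = time.monotonic) -> int:
    """Scan pages from start_page until the time budget is spent.

    page_free_bytes[i] stands in for the free space found on page i.
    Returns the first page NOT visited, so a later call can resume there.
    """
    deadline = clock() + budget_seconds
    page = start_page
    while page < len(page_free_bytes) and clock() < deadline:
        fsm[page] = page_free_bytes[page]  # publish what we learned so far
        page += 1
    return page
```

The returned page number is what would need to be remembered between runs so
that the next 30-minute slot picks up where this one stopped.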

-- 

Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.