On Friday 19 January 2007 01:47, Simon Riggs wrote: > On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote: > > On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote: > > > elein wrote: > > > > Have you made any consideration of providing feedback on autovacuum > > > > to users? Right now we don't even know what tables were vacuumed when > > > > and what was reaped. This might actually be another topic. > > > > > > I'd like to hear other people's opinions on Darcy Buskermolen proposal > > > to have a log table, on which we'd register what did we run, at what > > > time, how long did it last, how many tuples did it clean, etc. I feel > > > having it on the regular text log is useful but it's not good enough. > > > Keep in mind that in the future we may want to peek at that collected > > > information to be able to take better scheduling decisions (or at least > > > inform the DBA that he sucks). > > > > > > Now, I'd like this to be a VACUUM thing, not autovacuum. That means > > > that manually-run vacuums would be logged as well. > > > > Yes I did intend this thought for vacuum, not strictly autovacuum. > > I agree, for all VACUUMs: we need a log table. > > The only way we can get a feedback loop on what has come before is by > remembering what happened. Simply logging it is interesting, but not > enough. Correct, I think we are all saying the same thing that is this log table is purely inserts so that we can see trends over time. > > There is some complexity there, because with many applications a small > table gets VACUUMed every few minutes, so the log table would become a > frequently updated table itself. I'd also suggest that we might want to > take account of the number of tuples removed by btree pre-split VACUUMs > also. Thinking on this a bit more, I suppose that this table really should allow for user defined triggers on it, so that a DBA can create partioning for it, not to mention being able to move it off into it's own tablespace. > > I also like the idea of a single scheduler and multiple child workers. > > The basic architecture is clear and obviously beneficial. What worries > me is how the scheduler will work; there seems to be as many ideas as we > have hackers. I'm wondering if we should provide the facility of a > pluggable scheduler? That way you'd be able to fine tune the schedule to > both the application and to the business requirements. That would allow > integration with external workflow engines and job schedulers, for when > VACUUMs need to not-conflict with external events. > > If no scheduler has been defined, just use a fairly simple default. > > The three main questions are > - what is the maximum size of VACUUM that can start *now* How can we determine this given we have no real knowledge of the upcoming adverse IO conditions ? > - can *this* VACUUM start now? > - which is the next VACUUM to run? > > If we have an API that allows those 3 questions to be asked, then a > scheduler plug-in could supply the answers. That way any complex > application rules (table A is available for VACUUM now for next 60 mins, > table B is in constant use so we must use vacuum_delay), external events > (long running reports have now finished, OK to VACUUM), time-based rules > (e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime, > first 3 days of the each month is financial accounting close) can be > specified. Another thought, is it at all possible to do a partial vacuum? ie spend the next 30 minutes vacuuming foo table, and update the fsm with what hew have learned over the 30 mins, even if we have not done a full table scan ? -- Darcy Buskermolen The PostgreSQL company, Command Prompt Inc.