On Sun, 2006-12-24 at 03:03, Christopher Browne wrote:
[snip]
> Seems to me that you could get ~80% of the way by having the simplest
> "2 queue" implementation, where tables with size < some threshold get
> thrown at the "little table" queue, and tables above that size go to
> the "big table" queue.

That would most definitely not cut it for me; I have more than 2
categories of tables:

 - a few small but very frequently updated/inserted/deleted tables:
   these must be vacuumed continuously, and your "little queue" is not
   good enough for that, as even the round trip between the small
   tables could lead to bloat on them;
 - a few small and moderately updated tables, which could live with
   the "little queue";
 - a few big and frequently updated tables in which only a small
   percentage of rows is actively updated at any time: those could
   live with the big queue;
 - the rest, which are rarely updated: I would put those in a separate
   queue so they won't affect the others, since vacuuming them is
   mostly not critical.

The point is that I'm not sure there couldn't be even more reasons to
split the tables into even more queues, based on the importance of
vacuuming them combined with their update rate and size. If I can set
up my own queues, I can experiment with what works best for me... for
the base setup you could provide some default queues.

I wonder, though, how you would handle the dynamics of tables: when
will a small table which grows start to be considered a big table for
the purpose of putting it in one queue or the other? I guess that
would be decided when the table is analyzed, which is also handled by
autovacuum, so tables with no vacuum queue settings could go to one of
the 2 default queues you mention.

> That should keep any small tables from getting "vacuum-starved."
>
> I'd think the next step would be to increase the number of queues,
> perhaps in a time-based fashion.
> There might be times when it's
> acceptable to vacuum 5 tables at once, so you burn thru little tables
> "like the blazes," and handle larger ones fairly promptly. And other
> times when you don't want to do *any* big tables, and limit a single
> queue to just the itty bitty ones.

This is all nice, and it would be cool if you could set it up per
vacuum queue. I mean, how much more effort would it be to allow vacuum
queues with generic settings, like time windows with a maximum number
of threads for each window, and let the user explicitly assign tables
to those queues, instead of hard-coding the queues and their settings
and assigning tables to them based on size or some other heuristic?
For the average application, which needs simple settings, there could
be a default setup with the 2 queues you mention.

If it were possible to set up rules which assign tables to queues
based on their properties at analyze time, instead of explicitly
assigning each table to one queue or another, that would be nice too;
then you could completely cover the default setup with those rules,
and still allow more complex setups for those who need them.

> This approach allows you to stay mostly heuristic-based, as opposed to
> having to describe policies in gratuitous detail.

I agree that for simple setups that would be OK, but like I said, if
it were easy enough to code those heuristics, provide some sane setup
as the default, and then let the user optimize it, that would be a
cool solution. Now, it's true I don't really know how you would code
'assign all tables which are smaller than x rows to vacuum queue
"little-tables"'... maybe by providing the queue with a function which
"matches" on the table? And you can change that function? No idea, but
it probably can be done...

> Having a mechanism that requires enormous DBA effort and where there
> is considerable risk of simple configuration errors that will be hard
> to notice may not be the best kind of "feature" :-).
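To make the "function which matches on the table" idea concrete, here is a minimal sketch in Python of rule-based queue assignment. Everything in it is invented for illustration (the queue names, thresholds, and the `Table` statistics); it does not reflect any actual autovacuum internals, just the first-match-wins rule list being discussed:

```python
# Hypothetical sketch of rule-based vacuum-queue assignment.
# Queue names, thresholds, and the Table fields are all invented;
# nothing here reflects real autovacuum code.
from dataclasses import dataclass

@dataclass
class Table:
    name: str
    row_count: int       # e.g. from the latest ANALYZE
    update_rate: float   # updates+deletes per second (assumed statistic)

# Ordered rules: the first predicate that matches wins.
QUEUE_RULES = [
    ("hot-small",     lambda t: t.row_count < 10_000 and t.update_rate > 100),
    ("little-tables", lambda t: t.row_count < 10_000),
    ("big-tables",    lambda t: t.update_rate > 1),
    ("cold-tables",   lambda t: True),  # fallback: rarely updated tables
]

def assign_queue(table: Table) -> str:
    for queue, matches in QUEUE_RULES:
        if matches(table):
            return queue
    return "big-tables"  # unreachable while a catch-all rule exists

print(assign_queue(Table("session_queue", 5_000, 500.0)))  # hot-small
print(assign_queue(Table("audit_log", 20_000_000, 0.1)))   # cold-tables
```

Re-running the rules whenever a table is analyzed would also cover the "small table grows into a big table" case: the table simply starts matching a different rule.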
I think most people will not want to touch the default settings unless
those don't work well enough for them. I definitely don't like that I
had to set up cron jobs beside autovacuum; they are most definitely
not doing an optimal job, but autovacuum was not doing that either,
and I'm afraid a 2-queue system would not do it either, at least for
the queue-like tables I have, which must be vacuumed continuously, but
only if they need it. That's what I expect from autovacuum: to vacuum
all tables with the proper periodicity/time window for each of them,
but only if they need it... and I can imagine way more such
periodicity/time-window settings than 2.

Now, if autovacuum could figure out all those settings on its own,
that would be even cooler, but if I can set them up myself that would
be good enough. Actually, I think all vacuum patterns could be figured
out automatically by looking at the statistics AND the dynamics of
those statistics (i.e. whether they change in bursts, increase
steadily over time, etc.), possibly also the read access statistics
(there's no big reward in too frequently vacuuming a table which is
only inserted into and deleted from, and rarely read), and perhaps
some hints from the user about speed requirements for specific tables.
The problem with all this is that I doubt there is enough experience
to write heuristics which optimally cover all situations, and even if
there were, it could result in really complex code; that's why I think
it is more reasonable to let people set up their own vacuum queues...

Another point: it would be nice if autovacuum could also decide to do
a full vacuum, or even better a CLUSTER, under certain circumstances
for tables which are badly bloated (you could argue that should never
happen if autovacuum is set up properly, but think about a queue-like
table heavily updated while a backup is running). Of course this can
also backfire if enabled by default, so I guess it would have to be
set up explicitly,
possibly with rules like table size, maximum bloat allowed, time
window, etc.

One way to avoid badly locking up the application is to acquire an
exclusive lock with NOWAIT, and only do the full vacuum if the lock
succeeds. This avoids situations like: a backup is running, and will
be for the next 2 hours; we ask for an exclusive lock, which stays on
hold, and in the meantime we block all new read requests for the next
2 hours until the backup is done... while the operation we wanted to
do is guaranteed to finish in 10 seconds, as the table is heavily
bloated but still small.

Another thing autovacuum could figure out is not to vacuum at all if
there is a long-running transaction which would defeat the work the
vacuum would do anyway (although I'm not sure it doesn't do this one
already, does it?).

Well, maybe not all of what I rambled on about makes sense, but I've
dumped my brain now anyway... hope I didn't bore you too much :-)

Cheers,
Csaba.
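The "try the lock, skip if busy" pattern above can be sketched in Python, using a plain `threading.Lock` as a stand-in for a table-level exclusive lock (the `do_full_vacuum` callback is a placeholder, not a real API). In actual PostgreSQL the same idea is expressed as `LOCK TABLE t IN ACCESS EXCLUSIVE MODE NOWAIT` inside a transaction, which raises an error immediately instead of queueing behind other lock holders:

```python
# Sketch of the "try-lock, else skip" pattern, using threading.Lock as
# a stand-in for a table-level exclusive lock. do_full_vacuum is a
# placeholder callback, not a real autovacuum API.
import threading

table_lock = threading.Lock()

def maybe_full_vacuum(do_full_vacuum) -> bool:
    """Run the maintenance action only if the lock is free right now."""
    # acquire(blocking=False) returns immediately instead of waiting,
    # mirroring NOWAIT semantics: if anyone holds the lock (e.g. a
    # long-running backup), we skip rather than park a pending
    # exclusive-lock request that would block all new readers.
    if not table_lock.acquire(blocking=False):
        return False  # lock busy: retry at the next autovacuum round
    try:
        do_full_vacuum()
        return True
    finally:
        table_lock.release()
```

The key property is that a failed attempt costs nothing: the worker moves on and retries later, instead of wedging the whole lock queue for the duration of the conflicting transaction.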