>-----Original Message-----
>From: Alvaro Herrera [mailto:alvherre@xxxxxxxxxxxxxxxxx]
>Sent: Tuesday 9 January 2007 22:18
>To: Joris Dobbelsteen
>Cc: Chris Browne; pgsql-general@xxxxxxxxxxxxxx
>Subject: Re: [GENERAL] Autovacuum Improvements
>
>Joris Dobbelsteen wrote:
>
>> Now that we have at least one model, let's mix in other captures and
>> situations. So it cannot be done with only YOUR data, I fully agree.
>> But with sufficient data you can find a generalization of the model
>> that makes it work (reasonably) in sufficiently many situations.
>> Of course models need time to evolve, but so does the implementation,
>> currently at a slow rate: from do-it-yourself, to scripts, to the
>> current autovacuum integration (which is good); from doing all tables
>> sequentially, to having some intelligence via update thresholds, to
>> whatever comes next.
>>
>> I think you can better solve the problem this way, as models are
>> relatively easy to compare, compared to arguments without
>> analyzable/simulatable data.
>
>To be frank, I'm not sure I understand what you're saying here. I'm
>sure more analysis is good; that's easy to agree with.

I hope to make it clearer. By a model I mean something I can use to:
* estimate the performance effects of 'defragmentation', excessive pages
  or a low page fill factor;
* estimate how much vacuum affects performance while it is running;
* and, as a result, estimate how much performance improved because of
  vacuuming.

With this it is possible to estimate how much vacuum costs and whether
you will benefit from vacuuming. Hopefully it also gives more detail on
how performance is affected.

Examples (with the disclaimer that these are observations from my
low-end machine):
* How does fragmentation affect inserts? I've observed that they take
  constant time, regardless of table size, transaction size or number
  of columns. On my low-end system they were limited by the CPU, so
  inserts by themselves do not require vacuuming. Tests elsewhere will
  probably give different results.
* How does it affect index scans? (I expect this is mainly related to
  index size and cache limits.)
* How does it affect table scans? (I expect this scales linearly with
  the table size.)
* How does it affect updates?
* How does it affect deletes?
* (What did I forget?)
* Vacuum seems to scale linearly with the table size. Further, it is
  extremely I/O intensive and not much else.

Do these operations depend on table size (like vacuum and table scans),
on index size (index scans?), or are they O(1) (inserts?)? I've not yet
found anything concrete, except several comments that it will depend on
your situation. However, the information should be obtainable, or at
least it is what DBAs currently estimate from experience.

>However, I don't want to be trapped in a design that's too hard to
>implement, or too hard for DBAs to manage.

Exactly... If you can model it, you can implement it. But you can also
analyze it ahead of time:
* You can see what needs to be configured, because it is in the model.
  Perhaps much of it can even be tuned automatically.
* You can predict its impact (expected performance) well ahead of
  implementation and deployment.
* You can test your ideas on the model and see whether they would work
  out, though you still have to check how accurate they are in
  practical situations...

That is why I proposed it.

>There have been proposals to add these knobs:
>- maximum number of simultaneous processes (and make it more than 1)

Indeed, vacuum takes too long on some tables. My finding is that the
vacuum time scales linearly with the number of pages.
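As a very rough illustration of the kind of cost/benefit calculation such
a model would do (a sketch only, not a proposal for the actual code: the
function names, the 500 pages/second throughput and the example table
sizes are placeholders I made up, which would have to be measured per
machine):

    # Sketch of a linear cost/benefit model for vacuum (illustrative only;
    # all constants are placeholders, not measured values).

    def vacuum_seconds(relpages, pages_per_second=500.0):
        # Vacuum appears to scale linearly with the number of pages,
        # so a first approximation is simply pages / throughput.
        return relpages / pages_per_second

    def seqscan_penalty_seconds(dead_pages, pages_per_second=500.0):
        # Extra time a sequential scan spends reading dead/empty pages;
        # this is (part of) the benefit side of running vacuum.
        return dead_pages / pages_per_second

    # Example: a table of 100,000 pages, of which 20,000 are dead space.
    # vacuum cost  ~ vacuum_seconds(100000)         = 200 s
    # scan penalty ~ seqscan_penalty_seconds(20000) =  40 s per scan
    # So vacuum pays for itself after about five full table scans
    # (ignoring index effects and the I/O vacuum steals while running).

The point is not these particular numbers, but that once the constants
are measured, the trade-off becomes something autovacuum could evaluate
per table.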
Perhaps vacuum could also be made more efficient, or do partial cleanups
instead of full page scans?

>- times of day on which to change parameters (i.e. disable vacuum
>  altogether, or make it more aggressive or less)

- Something DBAs call "HOT" tables.
- There is something about maintenance windows; or would you rather let
  the administrator specify when the database has its business hours?
  There has been discussion about the heuristics and metrics required
  for that. In some parts of the thread this is called "dirtyness",
  which has a rather ambiguous meaning.

- Joris