In response to Joao Ferreira gmail <joao.miguel.c.ferreira@xxxxxxxxx>: > > When autovacuum was introduced, I kept the weekly VACUUM FULL because it > effectively brings disk occupation down, despite it grows back after a few > hours. It's just re-assuring to me to make sure that at least one of the > vacuums it's working when I see the weekly drop of disk occupation. Note the "it grows back after a few hours" is unneeded load on your system. You just spent a lot of IO shrinking a file that PG is now going to spend a lot of IO re-enlarging. Thus you made PG work harder for a while after the VACUUM FULL in order to get the table back to a stable size. > quoting: > > http://www.postgresql.org/docs/8.3/static/routine-reindex.html > > --------------------------------------------- > The potential for bloat in non-B-tree indexes has not been well > characterized. > ---------------------------------------------------- > > ------------------------------------------------- > It is a good idea to keep an eye on the index's physical size when using > any non-B-tree index type. > ---------------------------------------------------- Both special cases. B-tree's are default. > ---------------------------------------------------- > Also, for B-tree indexes a freshly-constructed index is somewhat faster > to access than one that has been updated many times, because logically > adjacent pages are usually also physically adjacent in a newly built > index. (This consideration does not currently apply to non-B-tree > indexes.) It might be worthwhile to reindex periodically just to improve > access speed. > ---------------------------------------------------------- > > I did some tests (lets say about 50 INSERT/UPDATES per second and somw > thousande DELETEs once in a while). > > Query execution time DROPs 10 times after REINDEXING and VACUUM This would be interesting to this conversation if you _only_ did the REINDEX. The addition of the VACUUM makes it ambiguous as to which command actually made the improvement. > I've seen 600Megas DROP in disk size ocupation just by reindexing; Is that consistent, or was it an extraordinary case? It's pretty amazing to imagine 600M of index bloat on a 100M database ... how many indexes do you have? Are you sure you don't have duplicate indexes or some other issue? > additionally about 500 Megas drop by VACCUMING; this was on a 100Megas > of usefull data. It's probably because you're only vacuuming once a week. > > That's a pretty lousy maintenance plan with lots of unneeded overhead, > > unless you're database has a very edge-case access pattern. > > !? Many people mention these 'edeg-case access pattern', and 'corner > cases'. I don't now if mine is such. My appliucation executes patterns > of varying INSERTING and UPDATING but it must overcome stress tests with > about 100 UPDATES mixed with INSERTS per second. Additionaly it DELETEs > up to 10000 records once every night. > > It's a 'near-realtime' log store which stores log information for 'you > name it' logs in Linux based systems. It also removes old data once a > night. That is a fairly edge-case use that's actually documented in the docs you reference to be likely to cause abnormal amounts of bloat. However, it's just one index on one table (unless you have a bunch of indexes on that table that are all continually increasing?) Unless there's something I'm missing, it hardly justifies reindexing the entire database. > > VACUUM FULL really only needs done in extreme cases where massive data > > bloat is experienced, and not expected to happen again. If massive data > > bloat is routine, you're probably better off letting ordinary VACUUM > > maintain a consistent level of free space on tables. > > well. I've seen massive data and index bloating in my application. In > some cases REINDEXING and VACUUM FULL ing was the only way to clear up > things. It's possible that the stable size for you tables has more free space than actual data. While that seems crazy, it's really no different than having a file server with 4x as much disk space as people normally are using. > > REINDEXING seems nice at times, but I've yet to see any evidence that > > it's necessary. There are probably some corner cases, but I've not seen > > them documented. > > neither did I. REINDEXING helps me keep query execution time low. > Othewise it will increase to levels I cannot have. We use 'modest' > processors and 'modest' storage, in highly dependable embedded systems. Oh ... well this might change a lot. Depending on what you mean by "embedded", you may be using experience that's _very_ specialized to make generalizations from. The RAM/CPU/storage/etc constraints on anything I would call "embedded" are different than just about anyone else would be dealing with for tuning a DB server. If you're looking to tune PG to run well on an embedded platform, my advice would be to throw out everything you've heard and test extensively for yourself -- which sounds like what you've done. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@xxxxxxxxxxxxxxxxxxxxxxx Phone: 412-422-3463x4023