In response to Guillaume Cottenceau <gc@xxxxxx>: > I'm trying to come up with a way to estimate the need for a > VACUUM FULL and/or a REINDEX on some tables. You shouldn't vacuum full unless you have a good reason. Vacuum full causes index bloat. > According to documentation[1], VACUUM FULL's only benefit is > returning unused disk space to the operating system; am I correct > in assuming there's also the benefit of optimizing the > performance of scans, because rows are physically compacted on > the disk? In my experience, the smaller the overall database size, the less shared memory it requires. Keeping it vacuumed will reduce the amount of space taken up in memory, which means it's more likely that the data you need at any particular time is in memory. Look up a thread with my name on it a lot related to reindexing. I did some experiments with indexes and reindexing and the only advantage I found was that the space requirement for the indexes is reduced by reindexing. I was not able to find any performance difference in newly created indexes vs. indexes that were starting to bloat. > With that in mind, I've tried to estimate how much benefit would > be brought by running VACUUM FULL, with the output of VACUUM > VERBOSE. However, it seems that for example the "removable rows" > reported by each VACUUM VERBOSE run is actually reused by VACUUM, > so is not what I'm looking for. I'm not sure what you mean by that last sentence. There are only two circumstances (I can think of) for running vacuum full: 1) You've just made some major change to the database (such as adding an obscene # of records, making massive changes to a large percentage of the existing data, or issuing a lot of "alter table") and want to get the FSM back down to a manageable size. 2) You are desperately hurting for disk space, and need a holdover until you can get bigger drives. Reindexing pretty much falls into the same 2 scenerios. I do recommend that you reindex after any vacuum full. However, a much better approach is to either schedule frequent vacuums (without the full) or configure/enable autovacuum appropriately for your setup. > Then according to documentation[2], REINDEX has some benefit when > all but a few index keys on a page have been deleted, because the > page remains allocated (thus, I assume it improves index scan > performance, am I correct?). However, again I'm unable to > estimate the expected benefit. With a slightly modified version > of a query found in documentation[3] to see the pages used by a > relation[4], I'm able to see that the index data from a given > table... > > relname | relpages | reltuples > ------------------------+----------+----------- > idx_sessions_owner_key | 38 | 2166 > pk_sessions | 25 | 2166 > > ...is duly optimized after a REINDEX: > > relname | relpages | reltuples > ------------------------+----------+----------- > idx_sessions_owner_key | 13 | 2166 > pk_sessions | 7 | 2166 > > but what I'd need is really these 38-13 and 25-7 figures (or > estimates) prior to running REINDEX. Again, my experience shows that reindexing is only worthwhile if you're really hurting for disk space/memory. I don't know of any way to tell what size an index would be if it were completely packed, but it doesn't seem as if this is the best approach anyway. Newer versions of PG have the option to create indexes with empty space already there at creation time (I believe this is called "fill factor") to allow for future growth. The only other reason I can see for vacuum full/reindex is if you _can_. For example, if there is a period that you know the database will be unused that it sufficiently long that you know these operations can complete. Keep in mind that both reindex and vacuum full create performance problems while they are running. If you knew, however, that the system was _never_ being used between 6:00 PM and 8:00 AM, you could run them over night. In that case, I would recommend replacing vacuum full with cluster. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@xxxxxxxxxxxxxxxxxxxxxxx Phone: 412-422-3463x4023