In response to Mark Drago <markdrago@xxxxxxxx>: > > I'm using PostgreSQL to log web traffic leaving our network. This > results in the database growing to a fairly large size. This machine > will be left unattended and basically unmaintained for long stretches of > time so I need a way to limit the disk space that Postgres uses. So far > I have come up with two basic ideas: > > 1. Routinely run 'du' in the directory containing the PostgreSQL data, > which in my case is /var/postgresql/data and when it gets to a certain > size remove a whole bunch of the old data from the database, and run > 'vacuum full; reindex database db_name; analyze;'. > > The problem with this is that the vacuum could take nearly an hour to > run in some cases and there will be data that needs to get logged during > this hour. Also, the vacuum process could use disk space above what the > database is currently using and that disk space may not be available. > > 2. Use pgstattuple() to determine how much space is being used at any > given time and delete a bunch of old rows from the database when it is > approaching a limit. > > The nice thing about this is that 'vacuum full;' does not have to be > executed in order to see the space get reclaimed. The downside is that > running pgstattuple() is much more expensive than running 'du', so the > disk space checks can't happen as often, and they can not be run at all > during the day. > > I am curious to know if anyone has any other ideas as to how I can limit > the disk space that PostgreSQL uses to say 5GB. I have not looked in to > pg_autovacuum yet, but from what I have read about it it does not seem > to be the answer to this problem. Has anyone else had to do such a > thing before? Does anyone have any ideas on how to do this better? I don't think either of those are good ideas, because they both rely on disk limits to trigger drastic changes in database size, which will then require drastic maintenance operations (vacuum full, reindex) to clean up. Personally, I think you'd be a lot better off estimating how much new data comes in each day, and scheduling a daily delete of old data combined with regular vacuum (either via cron or using autovacuum) and an occasional reindex. Add to that some system monitoring via snmp traps -- maybe even graphing with mrtg -- to keep an eye on things in case you need to adjust the frequency or amount of stuff that's done, and you should see the database stabilize at a manageable size. -- Bill Moran Collaborative Fusion Inc.