In response to Michiel Holtkamp <michiel.holtkamp@xxxxxxxxxxxxxx>:

> Hello list,
>
> Just joined, read the archives, but couldn't find a solution to my
> problem. My question is: 'How can I best determine when to delete
> data?'. Hopefully this is the right place to ask; otherwise kindly
> redirect me to the proper list.
>
> The situation is as follows: we use PostgreSQL 8.1 to store large
> amounts of data (we are talking GBs). This data is stored as large
> objects, and when we delete data we don't forget to delete the
> corresponding large objects as well. The stored data is deleted after a
> while (usually a couple of weeks), so far so good.
>
> Due to the nature of the information (sound data, with recording triggered
> by certain technical conditions), the amount of information is not very
> predictable. Sometimes a lot of data is stored over a period of a few
> days and the disk runs out of free space (this is not theoretical; in
> one case it has already happened). For this situation we decided that we
> don't mind deleting some data earlier than normal, to ensure that we can
> store newly generated data (newer data is more important than older data).
>
> The problem:
> Somehow we have to decide when to delete data earlier than normal. We
> can't do this by checking disk space, because Postgres reserves
> disk space. Freeing disk space can be done by doing a full vacuum, but
> this locks tables and could cause data to be lost. Besides, I don't mind
> that Postgres reserves disk space; it's more efficient anyway.
>
> If anyone has ideas about this problem, it would be greatly appreciated;
> I'm sure this is a problem encountered by others. I've already
> looked at certain system tables (specifically pg_stat_user_tables) and
> at docs like http://www.postgresql.org/docs/8.1/static/diskusage.html,
> but so far no satisfying solution has emerged.
Since you don't give any idea how much data is involved, let me iterate
through your choices, given the unknowns:

1) Buy more disk space. It's cheap. Get an external SCSI unit and a
   SCSI PCI card.

2) Work on your vacuum schedule. You don't need to vacuum full all the
   time, and regular vacuum doesn't lock tables. If you run regular
   vacuum often enough, you won't see significant bloat.

3) Reduce the data size. You say this is audio data; can you reduce the
   bitrate or other storage factors so the data streams aren't so huge?

In general, if you're hitting the limits of your physical storage on a
regular basis, you either underestimated your storage requirements
(which means you should get more storage) or you're storing too much.

The _business_need_ should dictate how often you purge old data. If
you don't have enough hardware to meet the business need, you need to
add hardware. If the business need is to store X gigabytes with no
regard for how old the data is, then you need to adjust your data
storage methods to work with that: create a table to store the size of
each large object, and run a regular maintenance job that purges old
data when the used size gets too big.

-- 
Bill Moran
http://www.potentialtech.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
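[Editor's note: the purge logic of the maintenance job suggested above could be sketched roughly as below. The function name, the tuple layout of the bookkeeping table, and the size cap are all illustrative assumptions, not from the thread.]

```python
from datetime import datetime
from typing import List, Tuple

def select_lo_purge_victims(
    entries: List[Tuple[int, datetime, int]],
    cap_bytes: int,
) -> List[int]:
    """Given bookkeeping rows of (lo_id, created_at, size_bytes), return
    the large-object ids to delete, oldest first, until the total tracked
    size fits under cap_bytes. Newer data is kept in preference to older
    data, matching the poster's stated priority."""
    total = sum(size for _, _, size in entries)
    victims = []
    # Purge in chronological order: oldest recordings go first.
    for lo_id, _, size in sorted(entries, key=lambda e: e[1]):
        if total <= cap_bytes:
            break
        victims.append(lo_id)
        total -= size
    return victims
```

In the real job, each returned id would be passed to lo_unlink() and its bookkeeping row deleted, followed by a plain (non-full) VACUUM so the freed space is reused without locking the tables.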