On Sat, 2006-12-23 at 13:13 -0500, Bruce Momjian wrote:
> The bottom line is that we know of no cases where a long-running
> transaction would delay recycling of the WAL files, so there is
> certainly something not understood here.

We can see from all of this that a checkpoint definitely didn't occur.
Tom's causal chain was just one way that could have happened; there
could well be others.

I've noticed previously that a checkpoint can be starved out when trying
to acquire the CheckpointStartLock. I've witnessed a delay of more than
two minutes in obtaining the lock under a heavy transaction load.

If wal_buffers is small enough, and the WAL write rate and the
transaction rate are both high enough, a long queue can form for the
WALWriteLock, which means the request for the CheckpointStartLock can
stay queued indefinitely.

I've tried implementing a queueable shared lock for the
CheckpointStartLock. That helps the checkpoint, but it harms the
performance of other transactions waiting to commit, so I let that idea
go.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com