On Wed, 2006-05-17 at 00:36 -0400, Tom Lane wrote:
> Jeff Frost <jeff@xxxxxxxxxxxxxxxxxxxxxx> writes:
> > On Tue, 16 May 2006, Simon Riggs wrote:
> >> Whatever happened between 02:08 and 02:14 seems important.
>
> > I have the logs and after reviewing /var/log/messages for that time period,
> > there is no other activity besides postgres.
>
> I have a lurking feeling that the still-hypothetical connection between
> archiver and foreground operations might come into operation at pg_clog
> page boundaries (which require emitting XLOG events) --- that is, every
> 32K transactions something special happens. The time delay between
> archiver wedging and foreground wedging would then correspond to how
> long it took the XID counter to reach the next 32K multiple. (Jeff,
> what transaction rate do you see on that server --- is that a plausible
> delay for some thousands of transactions to pass?)
>
> This is just a guess, but if you check the archives for Chris K-L's
> out-of-disk-space server meltdown a year or three ago, you'll see
> something similar.

You'll have to explain a little more. I checked the archives...

The archiver looks for archive_status files that end with .ready, and
that has nothing at all to do with transactions, LWLocks etc. If there's
a file ready, it will archive it; if there's not, it won't. There is
very deliberately a very low amount of synchronization there: the
archiver holds no locks, LWLocks or spinlocks at any time.

The "lurking feeling" scenario above might or might not be an issue
here, but I can't see how the archiver could be involved at all.

I see no evidence that the archiver is the source of a problem here. The
only reason we're checking it is Jeff's original conjecture that there
was a connection. There *was* a problem, yes, but I think we're looking
in the wrong place for the murder weapon.

pg_clog page extension does look like it can cause problems, generally.
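
For anyone wanting to check the "every 32K transactions" timing against a real transaction rate, here is a rough sketch of the arithmetic. It assumes the default 8192-byte block size and 2 commit-status bits per transaction in pg_clog, neither of which is stated in this thread. The function names are made up for illustration, and XID wraparound is ignored.

```python
# Assumptions (PostgreSQL defaults, not taken from this thread):
# 8192-byte pages and 2 commit-status bits per transaction.
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_PAGE = BLCKSZ * 8 // CLOG_BITS_PER_XACT  # 32768, i.e. "32K"


def xacts_until_next_clog_page(xid: int) -> int:
    """XIDs still to be assigned before the counter reaches the next
    multiple of CLOG_XACTS_PER_PAGE (hypothetical helper)."""
    return CLOG_XACTS_PER_PAGE - (xid % CLOG_XACTS_PER_PAGE)


def seconds_until_next_clog_page(xid: int, xacts_per_second: float) -> float:
    """Estimated delay until the next pg_clog page boundary at a
    constant transaction rate (hypothetical helper)."""
    if xacts_per_second <= 0:
        raise ValueError("transaction rate must be positive")
    return xacts_until_next_clog_page(xid) / xacts_per_second
```

By the same arithmetic, the six minutes between 02:08 and 02:14 would cover about 7,200 transactions at a purely hypothetical 20 transactions per second, so the actual rate on Jeff's server decides whether the page-boundary theory fits the observed delay.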
-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com