On Tue, May 05, 2009 at 07:08:41PM -0400, Zing Zing Shishak wrote:
> Bron Gondwana wrote:
> > On Tue, May 05, 2009 at 01:18:46PM -0400, Zing wrote:
> >> I'm also seeing a segfault (i've seen bus error also) in unexpunge -l when
> >> I set an expire annotation on a mailbox and run cyr_expire. I'm running
> >> cyrus 2.3.14 + the ipurge patch from Bron on f10 (x86_64), but that
> >> doesn't help (i didn't think it would):
> >
> > You want the "disable EXPUNGE_FORCE" patch I committed to CVS yesterday :)
>
> oh, perfect timing. :) That seems to do the trick as a workaround. thanks.

Cool :) It seems the most sensible approach: NEVER delete the files on disk
completely unless doing a cyr_expire run.

> > (I've also got patches that turn that crash into a syslogged error instead,
> > but they don't actually solve the corruption)
>
> good to know. i can test out any patches if people want to try to solve
> the corruption...

The really interesting ones aren't production tested yet. A complete rewrite
of all cache accesses to go through the one codepath is in production at
FastMail, but the delayed cache loading isn't.

Delayed cache loading is nice, because if you select a mailbox and never make
a query that actually _needs_ the cache, it doesn't get opened or statted or
anything. Reduces IO.

So anyway, I should get back to work on that soon. First I need to figure out
what missing sync_log commands are needed to make CONDSTORE replication
reliable. I've just enabled CONDSTORE for a sacrificial few thousand users to
see what happens :) Including me of course! The new Thunderbird beta supports
using it, so I want it on!

> As I was searching the dev archives, a post by James E. Blair last year
> seemed to have an analysis:
>
> http://lists.andrew.cmu.edu/pipermail/cyrus-devel/2008-September/000935.html

Yes, now that is interesting. I think I skimmed over it at the time, but it
raises some good points.
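To make the delayed cache loading idea from earlier concrete, here is a
minimal sketch in C. It is NOT the actual Cyrus code: struct mbox_handle,
mbox_select(), mbox_cache_fd() and mbox_close() are hypothetical names made
up for illustration. The point is only that SELECT records where the cache
lives, and the file is opened the first time something really asks for it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical per-mailbox handle (not Cyrus's real struct mailbox). */
struct mbox_handle {
    const char *cache_path;
    int cache_fd;   /* -1 until the cache is actually needed */
};

/* On select: remember the path only. No open(), no stat(), no cache IO. */
static void mbox_select(struct mbox_handle *mb, const char *cache_path)
{
    mb->cache_path = cache_path;
    mb->cache_fd = -1;
}

/* Called only from code paths that really need cached data.
 * Opens the cache on first use; returns -1 if the open fails. */
static int mbox_cache_fd(struct mbox_handle *mb)
{
    if (mb->cache_fd < 0)
        mb->cache_fd = open(mb->cache_path, O_RDONLY);
    return mb->cache_fd;
}

/* Close the cache only if it was ever opened. */
static void mbox_close(struct mbox_handle *mb)
{
    if (mb->cache_fd >= 0)
        close(mb->cache_fd);
    mb->cache_fd = -1;
}
```

The trade-off is one extra check on every cache access, in exchange for
sessions that only do things like FLAGS or NOOP never touching the cache file.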
The 200GB virtual file size sounds like the "exists" field in the
cyrus.expunge file got some totally bogus value. Index records get written at
an offset calculated from exists rather than from the actual file size, so
that a failed append doesn't break anything. With a bogus exists, that offset
lands far past the real end of the file.

I haven't done any work on solving that issue. The whole expunge codepath,
despite having been cleaned up a couple of times over the years, could still
do with some more TLC!

Bron.
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html