Bron Gondwana wrote: > On Sun, Feb 08, 2009 at 08:18:44AM -0400, Patrick Boutilier wrote: >> Bron Gondwana wrote: >>> Also note: in the event of cache corruption, mailbox_cache_size_detail returns >>> zero bytes, which correctly makes append_copy re-parse the message file. It's >>> all shiny :) >>> >>> Wes/Ken - please apply to CVS :) >>> >>> Thanks, >>> >>> Bron. >> >> I was wondering if this bug would account for the behaviour that I am >> seeing. I am getting corrupt cyrus.expunge files that are causing >> unexepunge to segfault. >> >> Looking at the metadata I have noticed that I am getting NEW files for >> expunge,index, and cache on affected mail boxes. > > Unlikely. This bug caused extra junk (sometimes quite a lot of it!) to > be padded into the cyrus.cache file, but the offsets were still correct > in the index file, so it shouldn't have caused any user-visible > breakage. > >> [root@student2 boutilpj]# find -name \*.NEW\* >> ./RSS Feeds/KDE-Apps-org Content/cyrus.expunge.NEW >> ./RSS Feeds/KDE-Apps-org Content/cyrus.index.NEW >> ./RSS Feeds/KDE-Apps-org Content/cyrus.cache.NEW >> ./RSS Feeds/Secunia - Latest Secunia Security Advisories/cyrus.expunge.NEW >> ./RSS Feeds/Secunia - Latest Secunia Security Advisories/cyrus.index.NEW >> ./RSS Feeds/Secunia - Latest Secunia Security Advisories/cyrus.cache.NEW >> ./RSS Feeds/Linux Today/cyrus.expunge.NEW >> ./RSS Feeds/Linux Today/cyrus.index.NEW >> ./RSS Feeds/Linux Today/cyrus.cache.NEW >> ./RSS Feeds/Slashdot/cyrus.expunge.NEW >> ./RSS Feeds/Slashdot/cyrus.index.NEW >> ./RSS Feeds/Slashdot/cyrus.cache.NEW >> ./RSS Feeds/Freshmeat/cyrus.expunge.NEW >> ./RSS Feeds/Freshmeat/cyrus.index.NEW >> ./RSS Feeds/Freshmeat/cyrus.cache.NEW >> ./cyrus.expunge.NEW >> ./Mailing Lists/IMP/cyrus.expunge.NEW >> ./Mailing Lists/IMP/cyrus.index.NEW >> ./Mailing Lists/IMP/cyrus.cache.NEW >> ./cyrus.cache.NEW > > Yeah, so something is crashing. There are a huge variety of things > that could have caused problems (including kernels 2.6.22 -> 2.6.25.7 or > so on 64 bit processors, where there was an MMAP page fault bug. But you > should have seen a LOT more corruption in that case. Maybe you just > haven't noticed it yet!) Cyrus 2.3.13 compiled from source Kernel 2.6.27.7 , x86_64 CentOS 5.2 > > What are you seeing in your syslog? You do read your syslog, right? > (actually, I don't - it's bloody huge. We have auditlog as well, which > more that doubles its size. On the other hand, we have a perl program > which reads it in real time and emails us if it sees anything > unexpected) Lots of these: Feb 8 13:33:59 student2 cyr_expire[29242]: IOERROR: reading cache record for user.wooddm: got bogus offset 0 for 68/3; try reconstruct Feb 8 13:33:59 student2 cyr_expire[29242]: failure expiring user.wooddm: System I/O error > > Also, if it's segfaulting, you should be able to load the core file up > in gdb and get a backtrace - any chance of emailing that to the list > along with your imapd.conf and the exact version of cyrus that you're > running. I managed to "fix" things by lowering -X from 7 days to 1 day . For some reason cyr_expire ran without errors with -X1 /usr/local/cyrus/bin/cyr_expire -E 77 -X 3 -v -a -p 'user.wooddm' Which brings me to another question. Why does cyr_expire have to be run with -E ? I put it up to 77 when running cyr_expire manually just so it wouldn't spend 30 minutes cleaning up the duplicate delivery database. :-) When the corruption comes back I will attempt to get a backtrace from unexpunge, as it does segfault. Feb 8 09:56:31 student2 kernel: unexpunge[21258]: segfault at 7fa403d862dc ip 0000000000405d64 sp 00007fff9b1abd70 error 4 in unexpunge[400000+cb000] > > Thanks, > > Bron. ---- Cyrus Home Page: http://cyrusimap.web.cmu.edu/ Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html