SUMMARY: Attempting to upgrade Ubuntu results in tight loops at 100% CPU, not inside a system call, possibly somewhere in Berkeley DB. Blowing away the db directory fixed it, and I don't think there was anything important there, but what happened?

DETAILS: I've just upgraded from Ubuntu Dapper Drake 32-bit to Ubuntu Gutsy Gibbon 64-bit. I *think* I was running a source-built Cyrus IMAPD 2.2.13. I had repointed my /var/lib/imap to /mail/imap, and was using BerkeleyDB 4.3 for anything that still used BDB.

I do know that checkpointing hasn't run for a very, very long time, because when I switched from the packaged version to the source-built one, I forgot to update the path for ctl_cyrusdb in cyrus.conf. Oops.

So, back on Gutsy: I installed Cyrus from the Ubuntu packages (it reports version 2.2.13-11ubuntu1) and also installed libdb4.3. When I started Cyrus, "ctl_cyrusdb -r" looped at 100% CPU and had to be killed with kill -9. db_recover, db_verify, etc. all showed the same symptoms (but see below).

According to strace, ctl_cyrusdb was not opening *any* of the .db files; it looked at:

  1. itself
  2. some libraries
  3. imapd.conf
  4. /mail/imap/DB_CONFIG (doesn't exist)
  5. /var/tmp
  6. /mail/imap/db/__db.001

When I look in /mail/imap/db, I see:

  2006-09-25 13:14  log.0000000050.old
  2006-09-26 18:01  skipstamp
  --
  2006-09-26 18:01  __db.001 through
  2006-09-26 18:01  __db.005
  --
  2007-06-17 12:16  log.0000000060 through
  2008-03-05 13:49  log.0000000076

So as I understand it, this isn't "the" database; it holds the BDB transaction and/or log files for ALL of the various databases:

  /mail/imap# file *.db
  annotations.db: Cyrus skiplist DB
  deliver.db: Berkeley DB (Btree, version 8, native byte-order)
  mailboxes.db: Cyrus skiplist DB
  tls_sessions.db: Berkeley DB (Btree, version 8, native byte-order)

Before I understood that, I had tried running db4.3_recover on the db/ directory itself, and saw the same symptoms: db_recover would open and mmap the __db.001 file, then hang completely and need a kill -9 to go away.
The same happened with db_stat, db_verify, db_dump, and db_printlog. (Hey, maybe that's a clue: is it trying to open a 32-bit shared-memory region on a 64-bit OS, or something like that?)

Figuring that the db/ directory would contain nothing but deliver.db and tls_sessions.db data, and that (from what I read) neither holds important state for a one-user mail system, I just blew away the db/ directory and deliver.db. (I didn't think to blow away tls_sessions.db, but it seems happy enough now.)

So everything *seems* to be working OK now, but I don't quite understand what happened, or what I was supposed to do to "fix" it properly (other than having run checkpointing in the first place).

If I go back to a freshly restored backup, now understanding what the different DBs are, I still see weird behavior:

  /tmp/berkeley-recover# ls
  annotations.db  db/  deliver.db  log.0000000001  mailboxes.db  tls_sessions.db
  /tmp/berkeley-recover# db4.3_recover
  [returns instantly]
  /tmp/berkeley-recover# db4.3_verify deliver.db
  db_verify: Page 239: incorrect next_pgno 244 found in leaf chain (should be 60)
  db_verify: Page 60: incorrect prev_pgno 44 found in leaf chain (should be 239)
  ... [many more linking errors] ...
  db_verify: Page 0: page 250 encountered a second time on free list
  db_verify: deliver.db: DB_VERIFY_BAD: Database verification failed
  /tmp/berkeley-recover# db4.3_recover -c
  [returns instantly]
  /tmp/berkeley-recover# db4.3_verify -N deliver.db
  [same results as before]

Can anyone give me more insight into what I'm seeing, so I know better next time?

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
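For anyone retracing the steps in the post (the `file *.db` check, the listing of /mail/imap/db, and the forgotten checkpoint path), they can be scripted. What follows is a minimal POSIX sh sketch, not Cyrus or Berkeley DB tooling, and every function name in it is hypothetical. It assumes standard Berkeley DB naming: `__db.NNN` files are the environment's shared-memory region files and `log.NNNNNNNNNN` files are its transaction logs. It also assumes that only files reported by `file` as Berkeley DB use that environment, and that the checkpoint event in cyrus.conf uses the usual one-line form `checkpoint cmd="ctl_cyrusdb -c" period=30`. It does not show that the 32-bit region files caused the hang. It only makes it easy to set them aside and retry.

```sh
#!/bin/sh
# Hypothetical helpers for inspecting a Cyrus configdirectory's Berkeley DB
# environment. None of these names come from Cyrus or Berkeley DB.

# Read `file *.db` output on stdin and print the names of the files that
# `file` identifies as Berkeley DB; skiplist and other formats are skipped.
bdb_files_from_file_output() {
    awk -F: '/Berkeley DB/ { print $1 }'
}

# Classify one file name from the environment directory (e.g. /mail/imap/db)
# by standard Berkeley DB naming: __db.NNN region files and log.NNNNNNNNNN
# transaction logs. Anything else (e.g. a renamed *.old log) prints "other".
classify_env_file() {
    case $1 in
        __db.[0-9][0-9][0-9]) echo region ;;
        log.[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) echo log ;;
        *) echo other ;;
    esac
}

# Move (not delete) the __db.NNN region files from ENV_DIR into BACKUP_DIR,
# printing each file moved. Only run this while no Cyrus or db_* process has
# the environment open. Returns non-zero if mkdir or a move fails.
set_aside_regions() {
    env_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in "$env_dir"/__db.[0-9][0-9][0-9]; do
        [ -e "$f" ] || continue    # the glob matched nothing
        mv "$f" "$backup_dir"/ || return 1
        echo "moved $f"
    done
}

# Print the command of the "checkpoint" event in a cyrus.conf-style file,
# assuming the one-line form:  checkpoint cmd="ctl_cyrusdb -c" period=30
# Commented-out lines (starting with #) do not match.
checkpoint_cmd() {
    sed -n 's/^[[:space:]]*checkpoint[[:space:]].*cmd="\([^"]*\)".*/\1/p' "$1"
}

# Succeed if a checkpoint event exists and, when its command starts with an
# absolute path, that path is executable. Relative names are resolved by the
# Cyrus master process and are not checked here.
checkpoint_binary_ok() {
    ck_cmd=$(checkpoint_cmd "$1")
    [ -n "$ck_cmd" ] || return 1        # no checkpoint event at all
    ck_bin=${ck_cmd%% *}                # first word of the command
    case $ck_bin in
        /*) [ -x "$ck_bin" ] ;;
        *) return 0 ;;
    esac
}
```

With Cyrus stopped, running `set_aside_regions /mail/imap/db /root/bdb-regions.saved` (the backup path is only an example) and then letting Cyrus's own `ctl_cyrusdb -r` run should show whether fresh region files built by the 64-bit libraries avoid the hang, while keeping the old ones around. Assuming the config lives at /etc/cyrus.conf, `checkpoint_binary_ok /etc/cyrus.conf || echo "checkpoint event missing or broken"` would have flagged a stale absolute path to ctl_cyrusdb. Moving the files rather than deleting them is deliberate, so the step can be undone.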