Hi,

Paul Dekkers wrote:
> Simon Matter wrote:
>>> Paul Dekkers wrote:
>>>
>>>> I finally found a moment for upgrading my 2.3.9 install (using Simon's
>>>> RPMs on Red Hat 4.6, 64-bit) to 2.3.11-3 (leaving the config files
>>>> untouched), after which it seems that replication isn't working properly
>>>> anymore.
>>> While it seems to be only replication for now that fails me; is it
>>> possible to revert to the previous version? (While that implies for me
>>> that I'll have to rpm -e and install the previous rpm, I suppose.)
>>> Not sure if I'd like that, but I really really liked my replication
>>> running.
>> You could do that with "rpm -Uvh --oldpackage ...".
>
> Ah, thanks for that. I might do that if I can't get it to work soon,
>
> (Judging from the changes/upgrade notes I guess nothing dramatically or
> irreversibly changed in any of the databases/formats, I didn't touch the
> GUID bits yet - so I guess I should be fine there downgrading.)

I have now reverted to the old packages. Regular operation is fine
again, but I still have problems with replication and/or my replica,
although at first the downgrade seemed to solve everything.

After the downgrade I was pleasantly surprised that I could log in on
the replica (I just started imapd on the replica to test this; this was
my "downgrade test") before a manual replication of user.paul from the
master. After this sync, I was unable to SELECT my INBOX on the replica
(everything was OK on the master). A reconstruct of user.paul on the
replica solved this.

While strace-ing, it appeared that imapd hung on the seen file, either
during the mmap of it or the fcntl, but I'm not convinced that this file
was faulty: I could easily convert it to flat and back to skiplist
without errors, but that did not solve the issue. (This could be
completely unrelated; I've seen this before, where imapd hung and
consumed 100% CPU until the folder was reconstructed.)

But it now seems that every folder that was successfully synced under
2.3.11 needs a reconstruct on the replica. The replica logs:

syncserver[19616]: cmd_status_work_sub(): UIDs out of order!
last message repeated 304 times
master[19519]: process 19616 exited, signaled to death by 7
master[19519]: service syncserver pid 19616 in BUSY state: terminated abnormally

Fortunately, I see no such errors on my master. But, well, with 2.3.11
on both systems it seemed that I had to reconstruct all folders on my
master, and now that I have downgraded to 2.3.9 it seems that I need to
reconstruct all (touched) folders on my replica. (At least that does not
consume user CPU, of course.)

I'm not 100% sure I'm better off than before the downgrade. I'll find
out after reconstructing some more users, I suppose (which takes ages).

Any clues/suggestions are welcome :-)

Paul

>>>> If I run the sync_client, just a simple -u paul, I see in my logs:
>>>>
>>>> sync_client[18493]: SETMODSEQ received BAD response: Syntax error in
>>>> Setflags: Invalid modseq
>>>> sync_client[18493]: Error in do_user(paul): bailing out!
>>>>
>>>> Before the upgrade, I'm sure replication was working properly. I
>>>> checked, both servers are really running the same versions of
>>>> everything.
>>>>
>>>> I noticed that if I strace the sync_client, the folder on which it bails
>>>> out is always the same. If I reconstruct that folder, and re-run (or
>>>> just the mailbox), the process continues (up to the next folder that
>>>> causes the thing to bail out - although it doesn't bail out on every
>>>> folder).
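
For what it's worth, the workaround quoted just above (sync, reconstruct the folder it bails out on, re-run) could be scripted roughly as follows. This is only a sketch under assumptions: that sync_client exits with a non-zero status when it bails out, that "sync_client -m <mailbox>" and "reconstruct <mailbox>" are on the PATH and run as the cyrus user, and that the reconstruct has to happen on the machine where the script runs (under 2.3.11 that seemed to be the master, under 2.3.9 the replica). The function name and the injectable "run" parameter are made up for illustration.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a real command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def sync_with_reconstruct(mailboxes: Iterable[str],
                          run: Runner = run_command) -> List[str]:
    """Sync each mailbox; if the sync fails, reconstruct it once and retry.

    Returns the mailboxes that still fail after the reconstruct.
    ASSUMPTION: a failed sync is visible as a non-zero exit status.
    """
    still_failing = []
    for mailbox in mailboxes:
        synced = run(["sync_client", "-m", mailbox]) == 0
        if not synced:
            # Rebuild the mailbox on this host, then try exactly once more.
            run(["reconstruct", mailbox])
            synced = run(["sync_client", "-m", mailbox]) == 0
        if not synced:
            still_failing.append(mailbox)
    return still_failing
```

Calling sync_with_reconstruct(["user.paul", "user.paul.Junk"]) would then return whatever is left for manual inspection. It does not make the reconstructs any faster, of course; it only saves the re-running by hand.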
>>>>
>>>> There were more strange log-items related to the sync_client;
>>>>
>>>> sync_client[18232]: USER: Invalid type 1 response from server
>>>> sync_client[18232]: Discarding: 0000000000000000000000000000000000000000
>>>> ()
>>>> sync_client[18232]: Discarding: 2 0
>>>> 0000000000000000000000000000000000000000 ()
>>>> sync_client[18232]: Discarding: 3 0
>>>> 0000000000000000000000000000000000000000 ()
>>>> sync_client[18232]: Discarding: 4 0
>>>> 0000000000000000000000000000000000000000 (\answered)
>>>> sync_client[18232]: Discarding: 5 0
>>>> 0000000000000000000000000000000000000000 ()
>>>>
>>>> and a bunch more, like:
>>>>
>>>> sync_client[18232]: Discarding: archief.thuispc
>>>> ...
>>>> sync_client[18232]: sync_eatlines_unsolicited(): resynchronised okay
>>>> ...
>>>> sync_client[18232]: Processing sync log file
>>>> /data/config/imap/sync/log-18231 failed: Bad protocol
>>>> sync_client[18231]: process 18232 exited, status 1
>>>>
>>>> Any clue why replication stopped working properly for me after the
>>>> upgrade?
>>> There is more sync-related uglyness in my logging; while I suppose this
>>> is the most harmless one:
>>>
>>> sync_client[19532]: Hit upload limit 0 at UID 180958 for user.paul.Junk,
>>> sending
>>>
>>> ... I don't recall seing it before. (And a limit of 0?!)
>>>
>>> What is worse, is that sync_client now also segfaults on the
>>> rolling-log, as soon as I start a sync_client -v -r -f log,
>>>
>>> MAILBOXES user.henny user.henny.Email lists.IETF-announce
>>> user.paul.Drafts archief.netmaster.spam user.elise
>>> Segmentation fault
>>>
>>> And my kernel logs that as:
>>>
>>> sync_client[18881]: segfault at 0000000000000000 rip 0000002a96054a30
>>> rsp 0000007fbfffda08 error 4
>>>
>>> ... unfortunately, the sync-log is only getting bigger, and I didn't
>>> realize that running a sync_client -r -f log would take that much IO and
>>> CPU (or that is something that changed in this version too).
>>>
>>> Somehow I'm not sure if running a reconstruct on all mailboxes is an
>>> option, it would also take a huge amount of time. But somehow I don't
>>> think it makes sense.
>>>
>>> I'll include my imapd.conf below, in case that is useful.
>>>
>>> Paul
>>>
>>> P.S. Hmm, and I intentionally skipped 2.3.10 as I believe that people
>>> were having problems with that, and waited a bit with 2.3.11 :-S
>> I'm not using replication but IIRC there were some changes between 2.3.9
>> and 2.3.11 which have to be addressed when using replication. Did you
>> carefully check the upgrade instructions? Maybe there is something you
>> have to do.
>
> I did have a look at that; but I'm afraid there's nothing in there that
> I missed; didn't touch the GUIDs (and your RPM leaves guid_mode default,
> which is "off"), there are a couple of changes in replication that might
> just be the cause of my problems, but it's not clearly related I'm
> afraid. (Or at least there's nothing I didn't do that I should have done.)
>
> I actually run replication with 2.3.11 on a different machine without
> problems, but that's a small setup and on FreeBSD instead of Red Hat.
> But I know what differences there are with the RPM, the manual is very
> helpful with that, so I don't expect anything RPM-specific. (And there
> was actually a fix for delayed delete in 2.3.11 in combination with
> replication, so even if the invoca RPM has delayed delete by default
> enabled I think it should work.)
>
>> Another note: Be aware that the invoca rpm has some changed defaults for
>> imapd.conf (which is stated in the manpage). Now, if one feature doesn't
>> play nice with replication, this won't disturb other people who don't have
>> those options enabled.
>> Options that come to mind are:
>>
>> delete_mode: delayed
>> expunge_mode: delayed
>> flushseenstate: 1
>
> The delete_mode is indeed new; I didn't change the toggle there while I
> did for expunge_mode, but now that I put it back to "immediate" it
> doesn't help me either. (And I actually found that now I have a folder
> "DELETED" that also got replicated (before it crashed again) ;-) but to
> a different partition actually then on my master, surprisingly. Oh well.)
>
>> Sorry if it doesn't really help.
>
> Well, thanks for replying!
> (The suggestion how to revert to the previous Cyrus is useful, and I'm
> afraid I'll need it.)
>
> Paul
>
>
>> Simon
>>
>>> My imapd.conf on the master:
>>>
>>> configdirectory: /data/config/imap
>>> defaultpartition: imap4
>>> partition-imap1: /data/imap1
>>> partition-imap2: /data/imap2
>>> partition-imap3: /data/imap3
>>> partition-imap4: /data/imap4
>>> sievedir: /data/config/sieve
>>> hashimapspool: false
>>>
>>> md5_dir: /data/config/md5
>>>
>>> allowanonymouslogin: no
>>> allowplaintext: yes
>>> plaintextloginpause: 0
>>> admins: cyrus
>>> sasl_pwcheck_method: saslauthd
>>> sasl_mech_list: PLAIN LOGIN
>>> #sasl_pwcheck_method: auxprop
>>>
>>> duplicatesuppression: 1
>>> quotawarn: 90
>>> postuser: shared
>>> lmtp_downcase_rcpt: yes
>>> username_tolower: yes
>>>
>>> sieveuserhomedir: false
>>> unix_group_enable: 1
>>>
>>> sync_host: ...
>>> sync_authname: cyrus
>>> sync_password: ...
>>>
>>> sync_machineid: 2
>>> sync_log: true
>>>
>>> # default invoca-rpm db definitions on this machine!
>>> ## explicit database definitions (from the past)
>>> ##duplicate_db: skiplist
>>> ## deliver.db: Berkeley DB (Btree, version 8, native byte-order)
>>> #duplicate_db: berkeley
>>> #mboxlist_db: skiplist
>>> ## mailbox keys?
>>> #mboxkey_db: skiplist
>>> #seenstate_db: skiplist
>>> #subscription_db: flat
>>> ##tlscache_db: skiplist
>>> ## tls_sessions.db: Berkeley DB (Btree, version 8, native byte-order)
>>> #tlscache_db: berkeley
>>> #annotation_db: skiplist
>>> ##ptscache_db: skiplist
>>> #ptscache_db: berkeley
>>> #quota_db: quotalegacy
>>>
>>> # without this, I got errors in my test-setup using the dovecot imaptest
>>> expunge_mode: immediate
>>>
>>> ----
>>> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
>>> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
>>> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html