Hi,

3 weeks ago we changed our Cyrus IMAP servers from standalone systems to a Cyrus murder cluster. We have ~44000 accounts, ~457000 mailboxes, and 2x6.5 TB of mail.

In our previous setup we had 6 Cyrus IMAP 2.4.17 servers running as KVM VMs with 8 GB memory and 4 cores each, on an HP blade center (G7 blades). Each server was running 2 Cyrus instances: one master and one replica of one of the other servers. We used DNS CNAMEs to distribute our users across the servers. The filesystems are stored on two Infortrend iSCSI RAIDs, so that a replica is not on the same iSCSI system as its master.

In our new setup each server runs 3 - 4 Cyrus instances: one frontend, one backend, one replica and, on one of the servers, the Cyrus mupdate master. ClusterIP is used to distribute access to our frontend instances. The backends and replicas listen only on private IPs.

If one server goes down, we will switch that ClusterIP bucket to one of the other servers, and we will restart the replica as a backend by changing its config and switching the IP of the replica with the IP of the backend. This is much faster than updating the mailbox location of all the affected mailboxes.

If the mupdate master is down, we start it on one of the other servers, using the mailboxdb of the frontend and running "ctl_mboxlist -m -a" on all backend instances.

Since the migration we have discovered some small issues and some bugs.

1. Usually Cyrus is not CPU bound. One exception is the mupdate master: keeping encrypted connections to all frontends and establishing a new encrypted connection from the backend for every mailbox creation, rename and removal was too much for the 4 cores, so we added 4 additional cores to the VMs.

2. Our frontend instances use IMAPS and POP3S and don't allow STARTTLS. But we had to use IMAP and POP3 with STARTTLS on our backends, as the frontends always use STARTTLS over IMAP and POP3 to proxy the connection.

3. We see more IOERRORs in our Cyrus logs.
In standalone Cyrus IMAP an IOERROR indicated corruption in one of the Cyrus files, but that is not the case for the new errors we have found:

a) "reading message: unexpected end of file"
As far as I can tell, this is triggered by the IMAP APPEND command. I suspect it happens when the connection between frontend and backend is lost, or the frontend dies, during upload of the message.

b) "opening index %s: Invalid mailbox name"
The mailbox name seems to be fine in most cases. I have only figured out why the mailbox name was considered invalid in one case (the string "Posteingang" was translated by the client, and the name "INBOX" is reserved).

It would help if the string IOERROR were not used in these cases, and if the mailbox name were always logged consistently with the unixhierarchysep option.

4. Deleting a mailbox with delete_mode: delayed can create a corrupt mailbox in the DELETED tree. In the logs we found the following:

be/beimap[62020]: Rename: user.LoginID.Mail.drafts -> DELETED.user.LoginID.Mail.drafts.5416CD11
be/beimap[62020]: MUPDATE: can't commit mailbox entry for 'DELETED.user.LoginID.Mail.drafts.5416CD11'
be/beimap[62020]: Deleted mailbox DELETED.user.LoginID.Mail.drafts.5416CD11

and on the next cyr_expire run

be/cyr_expire[144388]: IOERROR: opening index DELETED.user.LoginID.Mail.drafts.5416CD11: System I/O error
In the filesystem, DELETED/user/LoginID/Mail/drafts was an empty directory. I couldn't find any hints as to why the mupdate master couldn't commit the mailbox entry, but as "5416CD11" is the timestamp of the action, I am certain that the mailbox did not exist in the mailboxdb before. And as this only happens in rare cases, I suspect a race condition.

5. Some frontend imapd processes receive a SIGSEGV. As this seems to happen in the OpenSSL library, I asked on their mailing list, but haven't received an answer yet. At the end you will find a backtrace of the core dump.

I would be glad if changes regarding the logging of IOERRORs and mailbox names were included in Cyrus 2.5.

Regarding 4. and 5.: are these known bugs? I could not find any matching entries in the bug tracker. If they are not known, I will add them to the bug tracker.
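For anyone who wants to check their own logs for issue 4, here is a minimal Python sketch. It assumes the suffix of a DELETED.* mailbox name (such as "5416CD11") is the deletion time in hexadecimal seconds since the Unix epoch, and that the failure is logged with exactly the "can't commit mailbox entry" wording from the excerpt above. The function names are hypothetical helpers, not part of Cyrus.

```python
import re
from datetime import datetime, timezone

# Assumption: the failure is logged with the same wording as in the excerpt.
COMMIT_FAIL = re.compile(
    r"MUPDATE: can't commit mailbox entry for '(DELETED\.[^']+)'"
)


def deletion_time(mailbox: str) -> datetime:
    """Decode the trailing hex suffix of a DELETED.* name as a UTC time.

    Assumption: the suffix is hexadecimal seconds since the Unix epoch.
    """
    suffix = mailbox.rsplit(".", 1)[1]
    return datetime.fromtimestamp(int(suffix, 16), tz=timezone.utc)


def failed_delayed_deletes(lines):
    """Yield (mailbox, deletion time) for each failed MUPDATE commit."""
    for line in lines:
        match = COMMIT_FAIL.search(line)
        if match:
            mailbox = match.group(1)
            yield mailbox, deletion_time(mailbox)


# The three backend log lines quoted in the excerpt.
sample = [
    "be/beimap[62020]: Rename: user.LoginID.Mail.drafts -> "
    "DELETED.user.LoginID.Mail.drafts.5416CD11",
    "be/beimap[62020]: MUPDATE: can't commit mailbox entry for "
    "'DELETED.user.LoginID.Mail.drafts.5416CD11'",
    "be/beimap[62020]: Deleted mailbox "
    "DELETED.user.LoginID.Mail.drafts.5416CD11",
]

for mailbox, when in failed_delayed_deletes(sample):
    print(mailbox, when.isoformat())
```

Feeding the backend log file to failed_delayed_deletes() instead of the sample list gives the candidate mailboxes to compare against later cyr_expire IOERRORs.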
Regards

Michael Menge

----- ldd imapd ----
linux-vdso.so.1 => (0x00007fff3ffed000)
libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007f40e62a8000)
libssl.so.0.9.8 => /usr/lib64/libssl.so.0.9.8 (0x00007f40e6052000)
libcrypto.so.0.9.8 => /usr/lib64/libcrypto.so.0.9.8 (0x00007f40e5cb2000)
libz.so.1 => /lib64/libz.so.1 (0x00007f40e5a9c000)
libwrap.so.0 => /lib64/libwrap.so.0 (0x00007f40e5891000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f40e5678000)
libc.so.6 => /lib64/libc.so.6 (0x00007f40e52ff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f40e50fb000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f40e4ee3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f40e64f7000)

--- bt on imapd core dump ----
#0 0x000000000080e130 in ?? ()
#1 0x00007fe5a839334f in ssl3_get_message (s=0x80e430, st1=8347825, stn=-1470427072, mt=<optimized out>, max=102400, ok=0x7fffcc974d08) at s3_both.c:522
#2 0x00007fe5a838ba0d in ssl3_get_key_exchange (s=0x0) at s3_clnt.c:1103
#3 0x00007fe5a838dff8 in ssl3_connect (s=0x80e430) at s3_clnt.c:316
#4 0x000000000046a177 in tls_start_clienttls (readfd=16, writefd=16, layerbits=0x7fffcc975104, authid=0x7fffcc975108, ret=0x7e1fa0, sess=0x7e1fa8) at tls.c:1311
#5 0x00000000004669f4 in do_starttls (s=0x7e16a0, tls_cmd=0x78a4d0 <imap_protocol+208>) at backend.c:201
#6 0x0000000000467217 in backend_authenticate (s=0x7e16a0, prot=0x78a400 <imap_protocol>, mechlist=0x7fffcc976468, userid=0x7f5c90 "REPLACED_LOGINID", cb=0x80de30, status=0x7fffcc976460) at backend.c:378
#7 0x0000000000467a1a in backend_connect (ret_backend=0x7e16a0, server=0x7a8960 <partition.17660> "ma03.mail.localhost", prot=0x78a400 <imap_protocol>, userid=0x7f5c90 "REPLACED_LOGINID", cb=0x0, auth_status=0x0) at backend.c:552
#8 0x0000000000426603 in proxy_findserver (server=0x7a8960 <partition.17660> "ma03.mail.localhost", prot=0x78a400 <imap_protocol>, userid=0x7f5c90 "REPLACED_LOGINID", cache=0x7a3010 <backend_cached>, current=0x7a3008 <backend_current>, inbox=0x7a3000 <backend_inbox>, clientin=0x7be450) at proxy.c:179
#9 0x0000000000426beb in proxy_findinboxserver (userid=0x7f5b20 "REPLACED_LOGINID") at imap_proxy.c:145
#10 0x00000000004197c8 in cmd_list (tag=0x7f3720 "42.117", listargs=0x7fffcc977510) at imapd.c:6036
#11 0x000000000040c9ee in cmdloop () at imapd.c:1574
#12 0x000000000040aea5 in service_main (argc=2, argv=0x7b9010, envp=0x7fffcc97b650) at imapd.c:946
#13 0x0000000000409ba4 in main (argc=6, argv=0x7fffcc97b618, envp=0x7fffcc97b650) at service.c:582
-----------------------------

--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung          mail: michael.menge@xxxxxxxxxxxxxxxxxxxx
Wächterstraße 76
72074 Tübingen