Michael,
I'd like to thank you for having written up such a succinct and
reasonable description of a well thought out murder installation.
Lots of good information here, especially for people who may be
considering a move like yours. This could be the bones of a good
Wiki article.
Cheers,
-nic
On 09/22/2014 06:20 AM, Michael Menge wrote:
Hi,
Three weeks ago we changed our Cyrus IMAP servers from standalone
systems to a Cyrus murder cluster. We have ~44000 accounts,
~457000 mailboxes, and 2 x 6.5 TB of mail.
In our previous setup we had 6 Cyrus IMAP 2.4.17 servers running as KVM
VMs with 8 GB memory and 4 cores each, on an HP blade center (G7 blades).
Each server was running 2 Cyrus instances: one master system and one
replica of one of the other servers. We used DNS CNAMEs to distribute our
users to our servers. The filesystems are stored on two Infortrend iSCSI
RAIDs, so that the replica is not on the same iSCSI system as the master.
In our new setup each server is running 3 to 4 Cyrus instances:
one frontend, one backend, one replica and, on one of the servers,
the Cyrus mupdate master. ClusterIP is used to distribute access
to our frontend instances. The backends and replicas listen only
on private IPs.
If one server goes down, we switch its ClusterIP bucket to one
of the other servers, and we restart the replica as a backend by changing
its config and swapping the IP of the replica with the IP of the backend.
This is much faster than updating the mailbox location of all the affected
mailboxes.
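To illustrate the ClusterIP part of the failover step above, here is a minimal sketch that models which node answers for which bucket and moves the buckets of a failed node to the survivors. It is only a model, not our actual tooling: the node names and the `reassign_buckets` helper are hypothetical, and it does not touch iptables or any Cyrus configuration.

```python
def reassign_buckets(assignment, failed_node):
    """Return a new bucket -> node mapping in which the buckets of
    failed_node are handed to the surviving nodes, least loaded first.

    `assignment` is a dict such as {1: "ma01", 2: "ma02"}; the node
    names are hypothetical placeholders.
    """
    survivors = sorted({n for n in assignment.values() if n != failed_node})
    if not survivors:
        raise ValueError("no surviving node to take over the buckets")

    # Count how many buckets each surviving node already answers for.
    load = {node: 0 for node in survivors}
    for node in assignment.values():
        if node != failed_node:
            load[node] += 1

    new_assignment = dict(assignment)
    orphaned = sorted(b for b, n in assignment.items() if n == failed_node)
    for bucket in orphaned:
        # Pick the least loaded survivor; ties are broken by name.
        target = min(survivors, key=lambda n: (load[n], n))
        new_assignment[bucket] = target
        load[target] += 1
    return new_assignment


if __name__ == "__main__":
    buckets = {1: "ma01", 2: "ma02", 3: "ma03", 4: "ma04", 5: "ma05", 6: "ma06"}
    print(reassign_buckets(buckets, "ma03"))
```

The function returns a new mapping and leaves the input untouched, so the old state stays available for switching back once the failed server returns.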
If the mupdate master is down, we start it on one of the other servers,
using the mailboxdb of the frontend and running "ctl_mboxlist -m -a"
on all backend instances.
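As a rough sketch of the last step, the following builds one "ctl_mboxlist -m -a" command line per backend instance and prints it without executing anything. The config file paths are hypothetical, and the use of "-C" to select an instance's config file is an assumption that should be checked against the ctl_mboxlist man page of the installed version.

```python
import shlex


def mupdate_resync_commands(backend_configs):
    """Build one ctl_mboxlist command per backend instance.

    `backend_configs` is a list of imapd.conf paths (hypothetical here).
    "-m -a" is taken from the procedure above; "-C" is assumed to select
    the config file of the instance.
    """
    return [
        ["ctl_mboxlist", "-C", config_path, "-m", "-a"]
        for config_path in backend_configs
    ]


if __name__ == "__main__":
    # Hypothetical config locations, one per backend instance.
    configs = ["/etc/cyrus/backend1/imapd.conf", "/etc/cyrus/backend2/imapd.conf"]
    for argv in mupdate_resync_commands(configs):
        # Print only, so the commands can be reviewed before being run.
        print(shlex.join(argv))
```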
Since the migration we discovered some small issues and some bugs.
1. Usually Cyrus is not CPU bound. One exception is the mupdate master:
keeping encrypted connections to all frontends, and establishing
a new encrypted connection from the backend for every mailbox creation,
rename and removal, was too much for the 4 cores, so we added 4
additional cores to the VMs.
2. Our frontend instances use IMAPS and POP3S and don't allow STARTTLS.
But we had to enable IMAP and POP3 with STARTTLS on our backends, as
the frontends always use STARTTLS over IMAP and POP3 to proxy
the connection.
3. We see more IOERRORs in our Cyrus logs. On the standalone
Cyrus IMAP servers an IOERROR indicated corruption in one of the Cyrus files,
but that is not the case for the new errors we have found:
a) "reading message: unexpected end of file": as far as I can tell,
this is triggered by the IMAP APPEND command. I suspect it happens when the
connection between frontend and backend is lost, or the frontend
dies, during upload of the message.
b) "opening index %s: Invalid mailbox name": the mailbox name seems to
be fine in most cases. I have only figured out why the mailbox
name was considered invalid in one case (the string "Posteingang"
was translated by the client, and the name "INBOX" is reserved).
It would help if the string IOERROR were not used in these cases,
and if the mailbox name were always logged consistently with the
unixhierarchysep option.
4. Deleting a mailbox with delete_mode: delayed can create a corrupt
mailbox in the DELETED tree. In the logs we found the following:
be/beimap[62020]: Rename: user.LoginID.Mail.drafts -> DELETED.user.LoginID.Mail.drafts.5416CD11
be/beimap[62020]: MUPDATE: can't commit mailbox entry for 'DELETED.user.LoginID.Mail.drafts.5416CD11'
be/beimap[62020]: Deleted mailbox DELETED.user.LoginID.Mail.drafts.5416CD11
and on the next cyr_expire run:
be/cyr_expire[144388]: IOERROR: opening index DELETED.user.LoginID.Mail.drafts.5416CD11: System I/O error
In the filesystem, DELETED/user/LoginID/Mail/drafts was an empty directory.
I couldn't find any hints why the mupdate master couldn't commit the
mailbox entry, but as "5416CD11" is the timestamp of the action, I am
certain that the mailbox did not exist in the mailboxdb before. And as
this only happens in some rare cases, I suspect a race condition.
5. Some frontend imapd processes receive a SIGSEGV.
As this seems to happen in the OpenSSL library, I asked on their mailing list,
but haven't received an answer yet. At the end you will find a
backtrace of the core dump.
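Regarding the timestamp suffix mentioned in item 4, here is a minimal sketch for decoding it when correlating DELETED mailboxes with log entries. It assumes the suffix is a hexadecimal Unix timestamp in seconds; the helper name is hypothetical and not part of Cyrus.

```python
from datetime import datetime, timezone


def deleted_mailbox_timestamp(mailbox_name):
    """Decode the trailing hex component of a DELETED mailbox name.

    Assumes the last dot-separated component is a hexadecimal Unix
    timestamp in seconds, and returns it as an aware UTC datetime.
    """
    suffix = mailbox_name.rsplit(".", 1)[-1]
    seconds = int(suffix, 16)  # raises ValueError if the suffix is not hex
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    name = "DELETED.user.LoginID.Mail.drafts.5416CD11"
    print(deleted_mailbox_timestamp(name).isoformat())
```

Under that assumption, the suffix "5416CD11" decodes to 2014-09-15 11:27:13 UTC, which can be compared with the time of the Rename line in the backend log.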
I would be glad if changes regarding the logging of IOERRORs
and mailbox names could be included in Cyrus 2.5.
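To show what consistent logging of mailbox names could look like, here is a minimal sketch that converts an internal mailbox name into the form a client sees with unixhierarchysep enabled. It assumes that internal names use "." as the hierarchy separator and "^" for a literal dot inside a name component; the helper name is hypothetical and this is not the Cyrus implementation.

```python
# One translation table so both substitutions happen in a single pass
# and a freshly produced "." is never converted again.
_INTERNAL_TO_UNIX = str.maketrans({".": "/", "^": "."})


def to_unix_hierarchy(internal_name):
    """Convert e.g. "user.LoginID.Mail.drafts" to "user/LoginID/Mail/drafts".

    Assumption: "^" in the internal name stands for a literal dot.
    """
    return internal_name.translate(_INTERNAL_TO_UNIX)


if __name__ == "__main__":
    print(to_unix_hierarchy("user.LoginID.Mail.drafts"))
    print(to_unix_hierarchy("DELETED.user.LoginID.Mail.drafts.5416CD11"))
```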
Regarding 4. and 5.: are these known bugs? I could not find any matching
entries in the bug tracker. If they are not known, I will add them
to the bug tracker.
Regards
Michael Menge
----- ldd imapd ----
linux-vdso.so.1 => (0x00007fff3ffed000)
libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007f40e62a8000)
libssl.so.0.9.8 => /usr/lib64/libssl.so.0.9.8 (0x00007f40e6052000)
libcrypto.so.0.9.8 => /usr/lib64/libcrypto.so.0.9.8 (0x00007f40e5cb2000)
libz.so.1 => /lib64/libz.so.1 (0x00007f40e5a9c000)
libwrap.so.0 => /lib64/libwrap.so.0 (0x00007f40e5891000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f40e5678000)
libc.so.6 => /lib64/libc.so.6 (0x00007f40e52ff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f40e50fb000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f40e4ee3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f40e64f7000)
--- bt on imapd core dump ----
#0 0x000000000080e130 in ?? ()
#1 0x00007fe5a839334f in ssl3_get_message (s=0x80e430, st1=8347825, stn=-1470427072, mt=<optimized out>, max=102400, ok=0x7fffcc974d08) at s3_both.c:522
#2 0x00007fe5a838ba0d in ssl3_get_key_exchange (s=0x0) at s3_clnt.c:1103
#3 0x00007fe5a838dff8 in ssl3_connect (s=0x80e430) at s3_clnt.c:316
#4 0x000000000046a177 in tls_start_clienttls (readfd=16, writefd=16, layerbits=0x7fffcc975104, authid=0x7fffcc975108, ret=0x7e1fa0, sess=0x7e1fa8) at tls.c:1311
#5 0x00000000004669f4 in do_starttls (s=0x7e16a0, tls_cmd=0x78a4d0 <imap_protocol+208>) at backend.c:201
#6 0x0000000000467217 in backend_authenticate (s=0x7e16a0, prot=0x78a400 <imap_protocol>, mechlist=0x7fffcc976468, userid=0x7f5c90 "REPLACED_LOGINID", cb=0x80de30, status=0x7fffcc976460) at backend.c:378
#7 0x0000000000467a1a in backend_connect (ret_backend=0x7e16a0, server=0x7a8960 <partition.17660> "ma03.mail.localhost", prot=0x78a400 <imap_protocol>, userid=0x7f5c90 "REPLACED_LOGINID", cb=0x0, auth_status=0x0) at backend.c:552
#8 0x0000000000426603 in proxy_findserver (server=0x7a8960 <partition.17660> "ma03.mail.localhost", prot=0x78a400 <imap_protocol>, userid=0x7f5c90 "REPLACED_LOGINID", cache=0x7a3010 <backend_cached>, current=0x7a3008 <backend_current>, inbox=0x7a3000 <backend_inbox>, clientin=0x7be450) at proxy.c:179
#9 0x0000000000426beb in proxy_findinboxserver (userid=0x7f5b20 "REPLACED_LOGINID") at imap_proxy.c:145
#10 0x00000000004197c8 in cmd_list (tag=0x7f3720 "42.117", listargs=0x7fffcc977510) at imapd.c:6036
#11 0x000000000040c9ee in cmdloop () at imapd.c:1574
#12 0x000000000040aea5 in service_main (argc=2, argv=0x7b9010, envp=0x7fffcc97b650) at imapd.c:946
#13 0x0000000000409ba4 in main (argc=6, argv=0x7fffcc97b618, envp=0x7fffcc97b650) at service.c:582
-----------------------------
--------------------------------------------------------------------------------
M.Menge Tel.: (49) 7071/29-70316
Universität Tübingen Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung mail: michael.menge@xxxxxxxxxxxxxxxxxxxx
Wächterstraße 76
72074 Tübingen
----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
--
Nic Bernstein nic@xxxxxxxxxxx
Onlight, Inc. www.onlight.com
219 N. Milwaukee St., Suite 2a v. 414.272.4477
Milwaukee, Wisconsin 53202