Hello, this is Cyrus 2.3.11 on CentOS 5.
About 5000 users for 10 TB.
Mail storage has been moved from NetApp NFS to FluidFS
(aka Dell Compellent NFS).
Since an update of FluidFS, the IMAP spool undergoes
daily NFS timeouts, which lead to corrupt mailboxes.
Typically, this begins with lines like this in
/var/log/messages:
Dec 5 09:54:43 mailhost kernel: lockd: server 192.xxx.xx.xx not responding, timed out
This is followed by IOERROR lines for the mailboxes
accessed during the NFS timeout:
Dec 5 09:54:47 mailhost lmtpunix[14542]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[21999]: IOERROR: locking header for user.xxxx.Sent: Input/output error
Dec 5 09:54:47 mailhost imaps[26935]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[24013]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[15672]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[3999]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[30671]: IOERROR: locking index for user.xxxx: Input/output error
...................
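To list the mailboxes touched during a timeout, the IOERROR lines can be
parsed. A minimal Python sketch, assuming the log format is exactly as
quoted above (the names IOERROR_RE and corrupted_candidates are made up
for illustration, this is not our actual script):

```python
import re

# Matches the syslog lines quoted above, e.g.
# "... imaps[21999]: IOERROR: locking header for user.xxxx.Sent: Input/output error"
IOERROR_RE = re.compile(
    r"IOERROR: locking (?:index|header) for (?P<mailbox>.+?): Input/output error"
)

def corrupted_candidates(log_lines):
    """Return the distinct mailbox names from IOERROR locking lines, in first-seen order."""
    seen = []
    for line in log_lines:
        match = IOERROR_RE.search(line)
        if match and match.group("mailbox") not in seen:
            seen.append(match.group("mailbox"))
    return seen
```

A mailbox showing up here is only a candidate: not every mailbox that logs
an IOERROR necessarily ends up corrupted.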
Around 15 mailboxes are corrupted at each timeout.
Manually, we can repair such a mailbox:
- first, we have to delete all cyrus files in the mailbox,
otherwise the following reconstruct can be blocked
- then, we reconstruct the mailbox (reconstruct -s
user.<NAME>.<FOLDER>)
The downside of this method is that all messages in the
reconstructed folder are marked 'Not seen'.
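For reference, the two manual steps above amount to something like the
following Python sketch. This is not our actual script: the reconstruct
path, the function name and the `run` parameter are assumptions for
illustration, and it has to run as the cyrus user.

```python
import os
import subprocess

# Assumption: location of the reconstruct binary on this install.
RECONSTRUCT = "/usr/lib/cyrus-imapd/reconstruct"

def repair_mailbox(mailbox, mailbox_dir, run=subprocess.check_call):
    """Delete the cyrus.* files in mailbox_dir, then run 'reconstruct -s <mailbox>'.

    Returns the list of files that were removed.
    """
    removed = []
    # Step 1: remove the Cyrus metadata files, leaving the message files alone.
    for name in sorted(os.listdir(mailbox_dir)):
        if name.startswith("cyrus."):
            path = os.path.join(mailbox_dir, name)
            os.remove(path)
            removed.append(path)
    # Step 2: rebuild the mailbox; check_call raises if reconstruct exits non-zero.
    run([RECONSTRUCT, "-s", mailbox])
    return removed
```

Usage would be along the lines of
repair_mailbox("user.<NAME>.<FOLDER>", "<path to that folder in the spool>").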
To automate this, a Python script has been written, but
sometimes not all cyrus files (cyrus.index) are
recreated:
Dec 5 01:03:53 mailhost lmtpunix[497]: IOERROR: opening /var/spool/imap/x/user/xxxxxx/cyrus.index: No such file or directory
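To catch this case before lmtpunix hits it, the script could check the
mailbox directory right after each reconstruct. A minimal sketch: only
cyrus.index is confirmed by the log line above, cyrus.header and
cyrus.cache are my assumption of what else should be present, and the
function name is made up:

```python
import os

# Assumption: files expected in a mailbox directory after a successful reconstruct.
EXPECTED_FILES = ("cyrus.header", "cyrus.index", "cyrus.cache")

def missing_cyrus_files(mailbox_dir, expected=EXPECTED_FILES):
    """Return the expected cyrus files that are absent from mailbox_dir."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(mailbox_dir, name))
    ]
```

A non-empty result means the reconstruct has to be retried or looked at by
hand before deliveries to that mailbox resume.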
Timeouts happen about 3 times per day, and the cyrus
deliver process is blocked when delivering to a
corrupted mailbox.
So my first question is: how can we reconstruct a
mailbox without marking mails as not seen?
And my second question is: why are the cyrus files not
recreated every time? Is this due to the -s parameter
with reconstruct?
Any help will be appreciated.
Thanks
------------------
Ismael TANGUY