Hello, we are running Cyrus 2.3.11 on CentOS 5,
with about 5000 users and 10 TB of mail.
Mail storage has been moved from NetApp NFS to FluidFS (aka Dell
Compellent NFS).
Since an update of FluidFS, the IMAP spool has been hitting daily NFS
timeouts, which lead to corrupted mailboxes.
Typically, this begins with lines like this in /var/log/messages:
Dec 5 09:54:43 mailhost kernel: lockd: server 192.xxx.xx.xx not responding, timed out
These are followed by IOERROR messages for the mailboxes accessed
during the NFS timeout:
Dec 5 09:54:47 mailhost lmtpunix[14542]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[21999]: IOERROR: locking header for user.xxxx.Sent: Input/output error
Dec 5 09:54:47 mailhost imaps[26935]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[24013]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[15672]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[3999]: IOERROR: locking index for user.xxxx: Input/output error
Dec 5 09:54:47 mailhost imaps[30671]: IOERROR: locking index for user.xxxx: Input/output error
...................
Around 15 mailboxes are corrupted at each timeout.
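A minimal sketch, in Python, of how the affected mailboxes could be collected from IOERROR lines like the ones above. The function name affected_mailboxes and the regular expression are illustrative assumptions based only on the log format shown here; other IOERROR variants would need their own patterns.

```python
import re

# Matches the "IOERROR: locking index/header for <mailbox>: ..." lines
# quoted above. The pattern is an assumption derived from those samples.
IOERROR_RE = re.compile(r"IOERROR: locking (?:index|header) for (.+?): ")


def affected_mailboxes(log_lines):
    """Return the sorted, de-duplicated mailbox names found in log_lines."""
    found = set()
    for line in log_lines:
        match = IOERROR_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


if __name__ == "__main__":
    # Reads the syslog file named in this post and prints one mailbox per line.
    with open("/var/log/messages", errors="replace") as log:
        for name in affected_mailboxes(log):
            print(name)
```

Several processes usually report the same mailbox during one timeout, so collecting the names into a set keeps each mailbox from being repaired more than once.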
We can repair such a mailbox manually:
- first, we have to delete all cyrus files in the mailbox; if not,
the following reconstruct can block
- then, we reconstruct the mailbox (reconstruct -s
user.<NAME>.<FOLDER>)
The downside of this method is that all messages in the
reconstructed folder are marked 'Not seen'.
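The two manual steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the script mentioned in this post: the reconstruct path, the "cyrus.*" glob, and the spool layout (hashed on the first letter of the user name, as in /var/spool/imap/x/user/xxxxxx) are assumptions, and the helper names mailbox_dir and repair are hypothetical. It defaults to a dry run that changes nothing.

```python
import glob
import os
import subprocess

SPOOL = "/var/spool/imap"  # spool root, as seen in the log lines of this post
RECONSTRUCT = "/usr/lib/cyrus-imapd/reconstruct"  # assumed install path


def mailbox_dir(mailbox, spool=SPOOL):
    """Map 'user.NAME[.FOLDER...]' to its spool directory.

    Assumes a spool hashed on the first character of NAME; real Cyrus
    hashing also normalizes case and non-letters, which is ignored here.
    """
    parts = mailbox.split(".")
    if len(parts) < 2 or parts[0] != "user" or not parts[1]:
        raise ValueError("expected user.<NAME>[.<FOLDER>]: %r" % mailbox)
    return os.path.join(spool, parts[1][0], *parts)


def repair(mailbox, spool=SPOOL, dry_run=True):
    """Delete the cyrus.* files of one mailbox, then run reconstruct -s.

    Returns (files_to_remove, command). With dry_run=True nothing is
    deleted and reconstruct is not executed.
    """
    directory = mailbox_dir(mailbox, spool)
    # Message files are named like "1.", so this glob leaves them alone.
    stale = sorted(glob.glob(os.path.join(directory, "cyrus.*")))
    command = [RECONSTRUCT, "-s", mailbox]
    if not dry_run:
        for path in stale:
            os.remove(path)
        subprocess.check_call(command)  # raises if reconstruct fails
        # Catch the case where cyrus.index was not recreated.
        if not os.path.exists(os.path.join(directory, "cyrus.index")):
            raise RuntimeError("cyrus.index missing after reconstruct: " + mailbox)
    return stale, command
```

Calling repair("user.<NAME>.<FOLDER>") first as a dry run shows which files would be removed and which command would run; the final check makes a missing cyrus.index fail loudly instead of surfacing later in the delivery logs.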
To automate this, we wrote a Python script, but sometimes
some cyrus files (cyrus.index) are not recreated:
Dec 5 01:03:53 mailhost lmtpunix[497]: IOERROR: opening /var/spool/imap/x/user/xxxxxx/cyrus.index: No such file or directory
Timeouts happen about 3 times per day, and the Cyrus deliver process
blocks when delivering to a corrupted mailbox.
So my first question is: how can we reconstruct a mailbox without
marking its messages as not seen?
And my second question is: why are the cyrus files not recreated
every time? Is this due to the -s parameter of reconstruct?
Any help will be appreciated.
Thanks
------------------
Ismael TANGUY