Re: intermittent problems with kick_mupdate

Gavin Gray <gavin.gray@xxxxxxxx> · Fri, 04 Jun 2010 09:14:51 +0100

Yes, everything you're saying makes sense in terms of what we're experiencing.

Quoting Wesley Craig <wes@xxxxxxxxx>:

> On 03 Jun 2010, at 09:24, Gavin Gray wrote:
>> imap[29289]: [ID 886451 local6.error] Could not trigger remote push to
>> mupdate server during move of user...
>
> During the xfer, the local backend sets some information in the mupdate
> master WRT the new mailbox location.  However, this information may be
> a bit of a guess.  The MUPDATEPUSH instructs the remote backend to set
> whatever the correct information is in the mupdate master.  Stupidly,
> the above log doesn't report what the problem was, and neither does the
> remote backend.  That should be fixed (can you open a bug report?).
> However, the error is not fatal.

This error does not interrupt xfers or cause them to fail.

>> imap[22505]: [ID 772019 local6.error] Could not set remote acl on user....
>
> This error is fatal.  In fact, you ought to not execute the following
> MUPDATEPUSH, because not being able to set the ACL is not permissible.
> Perhaps you're seeing this problem:
>
> 	https://bugzilla.andrew.cmu.edu/show_bug.cgi?id=3218
>
> Of course, the logging again fails to tell us *why* we aren't able to
> set the remote ACL (another good opportunity to report a bug).

This error does cause xfers to halt and fail, although we can easily
restart them; they pick up from where they left off and complete
successfully.

>> The error on the new backend receiving the error is:
>> kick_mupdate: can't connect to target: No such file or directory
>
>
> Is this a unified murder?  Skimming imapd.c, I see a mix of calls to
> kick_mupdate(), some protected by checks for the type of murder, some
> not.  Perhaps that's the problem.  In any case, kick_mupdate() is void,
> so errors relating to it are probably cascades from some other failed
> step in the process.

It's not a unified murder; we have a number of frontends and a
dedicated mupdate master. As for kick_mupdate, we see lots of those
errors over the course of the night while the xfers are taking
place, but perhaps only one instance of an xfer failing, which is
matched by exactly one of these entries in our logs:

Jun  3 01:00:23 backend machine imap[2554]: [ID 130975 local6.error]  
connect(mupdate master machine) failed: Connection refused
Jun  3 01:00:23 backend machine imap[2554]: [ID 320383 local6.error]  
mupdate_connect failed: unknown error

The logs then go on to say that some operation couldn't happen because
the backend could not connect to the mupdate server; however, the error
reported by the failed xfer process is always the "Could not set
remote acl on user" one.
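
To check that correlation more systematically, something like the
following could tally the messages per kind and note when each one
occurred. It is only a rough sketch: the substrings are copied from the
messages quoted in this thread, the function and dictionary names are
made up for illustration, and it assumes each syslog entry sits on a
single line (unlike the wrapped excerpts above).

```python
import re
from collections import Counter

# Substrings copied from the log messages quoted in this thread.
# The dictionary keys are illustrative labels, not Cyrus names.
PATTERNS = {
    "connect_refused": "failed: Connection refused",
    "mupdate_connect_failed": "mupdate_connect failed",
    "kick_mupdate": "kick_mupdate: can't connect to target",
    "remote_acl": "Could not set remote acl",
    "remote_push": "Could not trigger remote push",
}

# Classic syslog timestamp at the start of a line, e.g. "Jun  3 01:00:23".
TIMESTAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})")


def tally(lines):
    """Count each known message kind and record the timestamps it appeared at."""
    counts = Counter()
    seen_at = {}
    for line in lines:
        match = TIMESTAMP.match(line)
        stamp = match.group(1) if match else None
        for kind, needle in PATTERNS.items():
            if needle in line:
                counts[kind] += 1
                seen_at.setdefault(kind, []).append(stamp)
    return counts, seen_at
```

Feeding it an open log file, e.g. `tally(open(path))` with `path`
pointing at wherever local6 is logged, and comparing the timestamps for
"remote_acl" against those for "connect_refused" would show whether the
two really do line up one for one.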

I don't really understand what kick_mupdate does or how it does it
from having a quick look at the code, so I feel a bit at a loss there.

Is there any possibility that, with lots of concurrent xfers going on,
we could be hitting some limit on how many connections the mupdate
master process will accept?
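
One crude way to test that idea would be to open a batch of TCP
connections to the mupdate master, hold them all open, and count how
many are refused. The sketch below does only that: it does not speak
the mupdate protocol, so it would catch a refusal at connect() time but
not a server that accepts the connection and then drops it. The
function name and its parameters are made up for illustration.

```python
import socket


def probe(host, port, n, timeout=5.0):
    """Open n TCP connections, holding each one open, and
    return (connected, failed) before closing them all."""
    held = []
    failed = 0
    try:
        for _ in range(n):
            try:
                held.append(socket.create_connection((host, port), timeout=timeout))
            except OSError:
                # Covers ECONNREFUSED as well as timeouts and other socket errors.
                failed += 1
        return len(held), failed
    finally:
        for sock in held:
            sock.close()
```

Run from a backend against the master's mupdate port (3905 by default)
with n set to roughly the number of concurrent xfers, a non-zero failed
count would at least suggest a connection limit is worth looking into.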

I'll look at the bug report you mentioned and think about submitting
one regarding the error logging.

many thanks,

Gavin Gray

-- 
Gavin Gray
Edinburgh University Information Services
Rm 2013 JCMB
Kings Buildings
Edinburgh
EH9 3JZ
UK
tel +44 (0)131 650 5987
email gavin.gray@xxxxxxxx

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html