On Thursday 30 November 2006 15:35, Jorey Bump wrote:
> Timo Veith wrote:
> > I am still having the problem, isn't there anyone who has a hint for
> > me? I changed the io scheduler from cfq to deadline, raised file
> > descriptor limit to 300000 and still have no betterment. :(
>
> Just a thought, but can you try switching to a 2.4.x kernel? The 2.6
> series seems to suffer from gremlins like this once in a while.

I am pretty sure that the master daemon is now running with a file
descriptor limit of 350000. At least, that is what the log file says:

Dec 1 09:46:18 post master[3078]: setrlimit: Unable to set file descriptors limit to -1: Operation not permitted
Dec 1 09:46:18 post master[3078]: retrying with 350000 (current max)
Dec 1 09:46:18 post master[3078]: process started

Because the timeouts still occur, I don't think this is a file
descriptor limit problem. 350000 should be more than enough, shouldn't
it?

> > I installed the nagios check on the mail server itself to exclude
> > network problems and checked the imap service on both interfaces
> > (localhost and on the external ip). I also did this in parallel and
> > noticed that when a timeout happens on one interface it is not
> > constraining a timeout on the other interface, too.
> >
> > How can I tell why it sometimes takes so long until a imap process
> > responds?
>
> I had a similar problem that seemed to disappear when I disabled IDLE
> on the client, but does nagios use IDLE? When you look at the users
> that are affected, does any particular client or setting stand out?

Hmm, any clients that stand out ... most of the time it is squirrelmail
that takes so long to log in. Sometimes squirrelmail even says "no
connection to imap server". Maybe there is a timeout somewhere in there
too. I haven't looked at the code of the nagios check, but I don't think
it uses IDLE. It just connects to the imap port, waits for the server
banner, disconnects, and reports how long that took.
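
A check of this kind can be sketched in a few lines of Python. This is
not the code of the nagios plugin, only an illustration under
assumptions: the function name measure_banner, the 2-second "slow"
threshold, and the 5-second polling interval are made up for the
example, and 127.0.0.1 port 10143 is simply where the imap service
listens in this setup. Printing a timestamp for every slow or missing
banner may make it easier to line the delays up with the syslog
entries.

```python
import socket
import time


def measure_banner(host, port, timeout=30.0):
    """Connect, read the server's first line, return (seconds, banner).

    Returns (elapsed, None) if the connection fails or no banner
    arrives. Note that `timeout` applies to each socket operation,
    not to the check as a whole.
    """
    start = time.monotonic()
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Read until the greeting line is complete or the peer closes.
            while not data.endswith(b"\n"):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
    except OSError:
        # Covers refused connections as well as socket timeouts.
        return time.monotonic() - start, None
    elapsed = time.monotonic() - start
    return elapsed, data.decode("ascii", "replace").strip() or None


if __name__ == "__main__":
    # Host, port, threshold, and interval are assumptions; adjust as needed.
    HOST, PORT, SLOW_SECONDS, INTERVAL = "127.0.0.1", 10143, 2.0, 5
    while True:
        elapsed, banner = measure_banner(HOST, PORT)
        if banner is None or elapsed > SLOW_SECONDS:
            stamp = time.strftime("%b %d %H:%M:%S")
            print(stamp, f"{elapsed:.2f}s", banner or "NO BANNER")
        time.sleep(INTERVAL)
```

Running two copies in parallel, one against localhost and one against
the external ip, would mirror the comparison between the two interfaces
described above.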
I think it is pretty much the same as doing

telnet 127.0.0.1 10143

Sometimes when I issue that command I get the service banner
immediately, and sometimes only this:

Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.

And then I can wait and wait ... This is the point where I start
wondering what the hell cyrus is doing that makes it take so long to
answer.

I started the master daemon with -D and export CYRUS_VERBOSE=1, but I
saw no log messages that helped me. At least none of them sound
critical to me. Is there anything I should be looking for?

Oh, and I tried it with the idle service disabled in cyrus.conf, but it
didn't make a difference. Is it enough to disable it there, or must I
recompile without the idled option? I would really like to keep idled
enabled, though.

Could the compile-time optimizations be to blame? This is what I used
for gcc (3.3.6):

CFLAGS="-march=nocona -O3 -pipe -fomit-frame-pointer -mmmx -msse -msse2 -mfpmath=sse"

Desperately,
Timo

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html