I have looked closely at reap_child() and process_msg(), and both seem fine, so my suspicion was probably just noise. Sorry about that.
I'll keep trying to reproduce it, or I'll set debug to 1 in imapd.conf and send a HUP signal the next time the problem appears. If my suspicion that ready_workers is not decremented correctly has any substance (I should look more deeply), enabling debug at that moment should let me see the ready_workers value, because I see it is logged at debug level in some places in master.c.
Good morning,
I wrote to the list some time ago about this topic. I have noticed that when a user asks me to disconnect his/her sessions (normally IMAP sessions), new connections start getting slow responses some time later, typically half an hour to an hour. I have monitored this and at first it is intermittent: the check that the server responds in less than 3 seconds flips between KO and OK. This intermittency lasts about 10 minutes, and after that the response delays increase. To avoid a bigger service outage I stop and start Cyrus, and everything works fine afterwards.
I have been trying to figure out why this happens, because I have only seen it after I send TERM to several imap processes (those of the user who requested the disconnection). It doesn't happen immediately. Yesterday, for instance, it happened about an hour after the disconnection.
I don't have the prefork parameter set in cyrus.conf, so Cyrus spawns on demand. After some examination, my theory is that Cyrus is perhaps not calling spawn_service() every time it is needed. I would say that should happen here in master.c:
    if (!in_shutdown && Services[i].exec &&
        Services[i].nactive < Services[i].max_workers &&
        Services[i].ready_workers == 0 &&
        y >= 0 && FD_ISSET(y, &rfds))
    {
        /* huh, someone wants to talk to us */
        spawn_service(i);
    }
I think that perhaps ready_workers is not properly decremented when I send this TERM, and that this stops Cyrus from spawning any more processes for the service whose processes were terminated. Yesterday it happened at a more or less peak access hour, so more processes needed to be spawned, but since ready_workers did not appear to be 0 for that service, none were. As a consequence, connections started queuing up waiting to be accepted, and they were only accepted when an existing process became idle because its client disconnected or an idle or imap timeout expired.
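To make the suspected failure mode concrete, here is a minimal toy model in C. It is not Cyrus code: the toy_service struct and should_spawn() are hypothetical names, and the function mirrors only the nactive, max_workers and ready_workers tests from the condition quoted above (the in_shutdown, exec and socket checks are reduced to a single connection_pending flag). Under those assumptions it shows that one stale count left in ready_workers would be enough to stop on-demand spawning.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a service entry; not the real struct. */
struct toy_service {
    int nactive;        /* children currently alive */
    int max_workers;    /* upper limit on children */
    int ready_workers;  /* children believed to be waiting for a connection */
};

/* Mirrors only the counter part of the spawn condition quoted above.
 * Returns 1 when a new child would be spawned, 0 otherwise. */
int should_spawn(const struct toy_service *s, int connection_pending)
{
    assert(s->nactive >= 0 && s->ready_workers >= 0);
    return s->nactive < s->max_workers &&
           s->ready_workers == 0 &&
           connection_pending != 0;
}
```

With nactive = 3, max_workers = 100 and a pending connection, should_spawn() returns 1 when ready_workers is 0 and returns 0 when ready_workers is stuck at 1, even if no process is really waiting. That would match the symptom of connections sitting in the queue until an existing process frees up.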
So I was wondering whether reap_child() should decrement ready_workers in the service struct for the states SERVICE_STATE_UNKNOWN and SERVICE_STATE_BUSY. Concretely, in master.c:
    case SERVICE_STATE_BUSY:
        s->nactive--;
        if (!in_shutdown && failed) {
            syslog(LOG_DEBUG,
                   "service %s/%s pid %d in BUSY state: "
                   "terminated abnormally",
                   SERVICEPARAM(s->name),
                   SERVICEPARAM(s->familyname), pid);
        }
        break;

    case SERVICE_STATE_UNKNOWN:
        s->nactive--;
        syslog(LOG_WARNING,
               "service %s/%s pid %d in UNKNOWN state: exited",
               SERVICEPARAM(s->name),
               SERVICEPARAM(s->familyname), pid);
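Whether an extra decrement belongs in those cases depends on when a process leaves the ready pool. The sketch below is a toy model in C of one possible accounting scheme, not the master.c implementation: all names (toy_state, toy_counters, toy_spawn, toy_mark_busy, toy_reap) are hypothetical, and it assumes that a process is removed from ready_workers at the moment it becomes busy. Under that assumption, reaping a BUSY process must only touch nactive, because decrementing ready_workers again would undercount the processes that really are ready.

```c
#include <assert.h>

/* Hypothetical states for the toy model; not the SERVICE_STATE_* values. */
enum toy_state { TOY_READY, TOY_BUSY, TOY_DEAD };

struct toy_counters {
    int nactive;        /* children currently alive */
    int ready_workers;  /* children currently in TOY_READY */
};

/* Assumption of this model: a freshly spawned child is alive and ready. */
void toy_spawn(struct toy_counters *c, enum toy_state *child)
{
    *child = TOY_READY;
    c->nactive++;
    c->ready_workers++;
}

/* The child took a connection: it leaves the ready pool here. */
void toy_mark_busy(struct toy_counters *c, enum toy_state *child)
{
    assert(*child == TOY_READY);
    *child = TOY_BUSY;
    c->ready_workers--;
}

/* The child exited, for example after receiving TERM. */
void toy_reap(struct toy_counters *c, enum toy_state *child)
{
    assert(*child != TOY_DEAD);
    if (*child == TOY_READY)
        c->ready_workers--;  /* still counted as ready, so remove it */
    /* TOY_BUSY: already removed from ready_workers in toy_mark_busy() */
    c->nactive--;
    *child = TOY_DEAD;
}
```

In this model, spawning two children, marking one busy and then reaping it leaves nactive at 1 and ready_workers at 1, which is the correct count for the one child still waiting. If the real transitions follow the same pattern, the BUSY case quoted above would be right to leave ready_workers alone.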
Obviously, another possible solution is to set prefork in cyrus.conf, but I'd rather avoid wasting memory when it's not needed.
What's your opinion, mates? Or what do you think, Ellie, Bron? :) Have you ever seen something like this?
Cheers!