OK, so this isn't a memory leak as such, but...

When sync_client has a large folder to send (after far too many hours of me trying to make this work, let's just say it's 180,000 messages), it sends a single "UPLOAD [lastuid] [lastappenddate]" followed by every single message, one after the other.

There's logic on the server end to send a [RESTART] back after 1000 new files arrive, but it doesn't get called until all 180,000 messages have arrived... or at least that's what would happen if the sync_server process didn't receive a SIGABRT somewhere around 102,000 messages in.

I tried all sorts of things to find the underlying cause, then finally just watched 'top' on the sync_server machine as it ran. This machine has 8GB of memory, and over 30% of it was being used by this one sync_server before it died!

Well, the attached isn't the most elegant patch in the world, and may not be the best way to solve the problem, but at least it got that user replicated and happy. The first time we had to deal with it was while moving the user off a corrupted filesystem that I could only mount read-only, and each run took about 3 hours to fail thanks to the insanely high IO load on that drive unit, so debugging was more of a pain than you'd hope.

I hope something inspired by this can be merged upstream to solve the "spam sync_server until it falls over" failure mode.

Bron.

-- 
Bron Gondwana
brong@xxxxxxxxxxx
diff -ur --new-file cyrus-imapd-cvs/imap/sync_client.c cyrus-imapd-cvs.new/imap/sync_client.c
--- cyrus-imapd-cvs/imap/sync_client.c	2006-08-26 10:48:27.000000000 -0400
+++ cyrus-imapd-cvs.new/imap/sync_client.c	2006-09-10 10:51:06.000000000 -0400
@@ -1198,7 +1198,7 @@
 static int upload_messages_list(struct mailbox *mailbox,
 				struct sync_msg_list *list)
 {
-    unsigned long msgno;
+    unsigned long msgno = 1;
     int r = 0;
     struct index_record record;
     struct sync_msg *msg;
@@ -1212,8 +1212,11 @@
 	return(IMAP_IOERROR);
     }
 
+repeatupload:
+
     msg = list->head;
-    for (msgno = 1 ; msgno <= mailbox->exists ; msgno++) {
+    count = 0;
+    for (; count < 1000 && msgno <= mailbox->exists ; msgno++) {
 	r = mailbox_read_index_record(mailbox, msgno, &record);
 
 	if (r) {
@@ -1272,6 +1275,12 @@
 	syslog(LOG_INFO, "UPLOAD: received RESTART");
     }
 
+    /* don't overload the server with too many uploads at once! */
+    if (count >= 1000) {
+	syslog(LOG_INFO, "UPLOAD: hit %d uploads at msgno %d", count, msgno);
+	goto repeatupload;
+    }
+
     return(0);
 }
 
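
For anyone who wants the idea without the goto, here is a minimal standalone sketch of the same batching approach. It is not Cyrus code: upload_in_batches(), send_batch_fn, struct batch_stats and record_batch() are hypothetical names made up for illustration. It also batches by msgno range for simplicity, whereas the patch above counts the messages actually uploaded.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical callback standing in for "issue one UPLOAD command that
 * covers messages first..last". Not a Cyrus API. Returns 0 on success. */
typedef int (*send_batch_fn)(unsigned long first, unsigned long last,
                             void *ctx);

/* Walk msgno 1..exists in batches of at most batch_size messages, so the
 * receiver never has more than one batch outstanding at a time.
 * Returns the number of batches sent, or -1 if a batch fails. */
static long upload_in_batches(unsigned long exists, unsigned long batch_size,
                              send_batch_fn send, void *ctx)
{
    unsigned long msgno = 1;
    long batches = 0;

    assert(batch_size > 0);

    while (msgno <= exists) {
        unsigned long remaining = exists - msgno + 1;
        unsigned long n = remaining < batch_size ? remaining : batch_size;
        unsigned long last = msgno + n - 1;

        if (send(msgno, last, ctx) != 0)
            return -1;

        batches++;
        msgno = last + 1;   /* resume where the previous batch stopped */
    }

    return batches;
}

/* Example callback (also hypothetical): records batch sizes instead of
 * talking to a server. */
struct batch_stats {
    unsigned long largest;  /* biggest single batch seen */
    unsigned long total;    /* total messages across all batches */
};

static int record_batch(unsigned long first, unsigned long last, void *ctx)
{
    struct batch_stats *s = ctx;
    unsigned long n = last - first + 1;

    if (n > s->largest)
        s->largest = n;
    s->total += n;
    return 0;
}
```

Calling upload_in_batches(180000, 1000, record_batch, &stats) with a zeroed struct batch_stats splits the folder from this report into 180 batches, none larger than 1000 messages. A real client would presumably wait for the server's response after each batch before sending the next.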