Re: sync_server "memory leak" with giant new mailbox first sync

I saw this problem the first time I enabled replication on a machine hosting all the IT support staff at the University of Michigan. Plenty of large mailboxes there!

My solution (such as it is) was to reduce the wasteful amount of space sync_server was allocating per message:

--- cyrus-imapd-2.3.7/imap/sync_support.c	2006-06-14 14:03:24.000000000 -0400
+++ cyrus-imapd-2.3.7p3/imap/sync_support.c	2006-07-29 12:34:59.000000000 -0400
@@ -912,9 +912,9 @@
     result = xzmalloc(sizeof(struct sync_message));
     message_uuid_set_null(&result->uuid);

-    result->msg_path = xzmalloc(5 * (MAX_MAILBOX_PATH+1) * sizeof (char));
+    result->msg_path = xzmalloc((MAX_MAILBOX_PATH+1) * sizeof(char));
     result->msg_path_end = result->msg_path +
-	5 * (MAX_MAILBOX_PATH+1) * sizeof(char);
+	(MAX_MAILBOX_PATH+1) * sizeof(char);

     snprintf(result->stagename, sizeof(result->stagename), "%lu.", l->count);
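
For a rough sense of scale, here is a minimal sketch of the arithmetic behind the patch above. It assumes MAX_MAILBOX_PATH is 4096, which is an assumption for illustration; check the value your tree actually defines. The helper names are hypothetical and are not part of Cyrus.

```c
#include <assert.h>
#include <limits.h>

/* ASSUMPTION: 4096 is used for illustration only; verify against your tree. */
#define MAX_MAILBOX_PATH 4096

/* Bytes pre-allocated for msg_path per message.
 * multiplier is 5 before the patch and 1 after it. */
static unsigned long long path_bytes_per_message(unsigned multiplier)
{
    return (unsigned long long)multiplier * (MAX_MAILBOX_PATH + 1) * sizeof(char);
}

/* Total bytes pre-allocated for msg_path across n queued messages. */
static unsigned long long path_bytes_total(unsigned multiplier,
                                           unsigned long long n)
{
    unsigned long long per = path_bytes_per_message(multiplier);

    /* Guard against overflow in the multiplication below. */
    assert(n == 0 || per <= ULLONG_MAX / n);
    return per * n;
}
```

Under that assumption, path_bytes_total(5, 102000) comes to 2,089,470,000 bytes, which is the same order of magnitude as the "over 30%" of 8GB reported in the quoted message below. With the multiplier reduced to 1 the same 102,000 messages need 417,894,000 bytes for paths.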

The times-5 is completely gratuitous. In fact the pre-allocation of any memory for paths is wasteful, but I was not up for reengineering the memory scheme in sync_server at the time.

To solve the 1000-messages->RESTART transition, I wonder if the client couldn't just initiate the transition. After all, how smart is it to transmit 1000 messages before deciding that a more efficient approach might be needed? Especially since sync_client has an idea of how many messages it's going to attempt to send.
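
To make the client-initiated idea concrete, here is a minimal sketch of splitting a known message count into batches of at most 1000 on the sending side. Everything in it is hypothetical: BATCH_LIMIT, send_batch_fn, upload_in_batches and count_batch are illustration names, not sync_client functions, and the real protocol exchange is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit, mirroring the 1000-message threshold discussed above. */
#define BATCH_LIMIT 1000

/* Hypothetical callback: send messages [first, first + count) as one upload. */
typedef int (*send_batch_fn)(size_t first, size_t count, void *rock);

/* Split total messages into batches of at most BATCH_LIMIT and pass each
 * batch to send. Returns 0 on success, or the first nonzero value
 * returned by send. */
static int upload_in_batches(size_t total, send_batch_fn send, void *rock)
{
    size_t first = 0;

    assert(send != NULL);
    while (first < total) {
        size_t remaining = total - first;
        size_t count = remaining < BATCH_LIMIT ? remaining : BATCH_LIMIT;
        int r = send(first, count, rock);

        if (r) return r;
        first += count;
    }
    return 0;
}

/* Bookkeeping for the stand-in callback below. */
struct batch_stats {
    size_t batches;   /* number of batches sent */
    size_t messages;  /* total messages across all batches */
    size_t largest;   /* size of the largest single batch */
};

/* Stand-in callback that only counts; a real one would talk to the server. */
static int count_batch(size_t first, size_t count, void *rock)
{
    struct batch_stats *s = rock;

    (void)first;
    s->batches++;
    s->messages += count;
    if (count > s->largest) s->largest = count;
    return 0;
}
```

With the counting callback, upload_in_batches(180000, count_batch, &stats) makes 180 calls of 1000 messages each, so the receiving side would never be holding more than one batch's worth of per-message state at a time.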

:wes

On 10 Sep 2006, at 11:15, Bron Gondwana wrote:
When sync_client has a large folder to send (for
the sake of far too many hours of me trying to
make this work let's just say it's 180,000
messages), then it just sends a single
"UPLOAD [lastuid] [lastappenddate]" followed by
every single message one after the other.

There's logic on the server end to send a [RESTART]
back after 1000 new files arrive, but it doesn't
get to be called until all 180,000 messages have
arrived... or at least it would be if the sync_server
process didn't receive a SIGABRT somewhere around
102,000 messages in.  I tried all sorts of things
to find the underlying cause, then finally just
watched 'top' on the sync_server machine as it ran.

This machine has 8GB of memory, and was seeing over
30% being used by this one sync_server before it
died!

Well, the attached isn't the most elegant patch in
the world, and may not be the best way to solve the
problem, but at least it got that user replicated
and happy.  The first time we had to deal with it
was moving the user off a corrupted filesystem that
I could only mount read-only, and it took about 3
hours for each run to fail thanks to the insanely
high IO load on that drive unit, so debugging was
more of a pain than you'd hope.

I hope something inspired by this can be merged
upstream to solve the "spam sync_server until it
falls over" failure mode.
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
