Re: mupdate cpu, thread timeouts

John Madden <jmadden@xxxxxxxxxxx> · Mon, 12 Jul 2010 15:49:58 -0400

On 07/12/2010 02:01 PM, Wesley Craig wrote:
> On 02 Jul 2010, at 09:29, John Madden wrote:
>> I'm concerned about the listener_lock timeouts.
>
> The listener_lock timeout means that the thread waited around for 60
> seconds to see if a connection was going to arrive.  Since it didn't,
> it timed out and that thread went away.  The pthread routine that you
> sit in for 60 seconds is pthread_cond_timedwait().  Perhaps your
> pthread implementation or kernel is implementing a busy wait?

...Meaning the error is nothing to worry about?  This is on RHEL 5.5 if 
that helps.
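
Here is a minimal sketch of what Wesley describes, written in Python rather than C. An idle worker blocks in a timed condition-variable wait and simply exits when no connection shows up. Python's `threading.Condition.wait_for` stands in for the role `pthread_cond_timedwait()` plays in mupdate. `IdleListener`, `submit`, `worker` and `cpu_seconds_for_idle_wait` are hypothetical names for illustration, not Cyrus code. The last function tests the busy-wait theory: a properly blocking wait should use almost no CPU.

```python
import threading
import time


class IdleListener:
    """Hypothetical stand-in for a listener thread pool (not Cyrus code)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.pending = []  # accepted connections waiting for a worker

    def submit(self, conn):
        # Producer side: queue a connection and wake one waiting worker.
        with self.cond:
            self.pending.append(conn)
            self.cond.notify()

    def worker(self, idle_timeout, results):
        # Consumer side: block (not spin) for up to idle_timeout seconds.
        with self.cond:
            # wait_for() returns the predicate's final value; an empty list
            # (falsy) means the timeout expired with no connection queued.
            if not self.cond.wait_for(lambda: self.pending, timeout=idle_timeout):
                results.append("timed out, thread exits")
                return
            results.append(("handled", self.pending.pop(0)))


def cpu_seconds_for_idle_wait(idle_timeout):
    """CPU time the process uses while one worker sits in a timed wait."""
    listener = IdleListener()
    results = []
    start = time.process_time()
    t = threading.Thread(target=listener.worker, args=(idle_timeout, results))
    t.start()
    t.join()
    return time.process_time() - start
```

If something like `cpu_seconds_for_idle_wait(0.5)` came back close to 0.5 rather than near zero, the wait would be spinning. That is the kind of busy wait Wesley suspects in the pthread implementation or kernel.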

> I don't think the problem is adding the mailboxes, per se.  The only
> time a slave tries to resync is when the connection to the master is
> lost, or the slave THINKS the connection to the master is lost.  If
> the mupdate master is very busy doing something else and can't
> respond to NOOPs issued by mupdate slaves, then the slaves will
> consider the connection to be lost, drop the connection, and attempt
> to resync.  Since resyncing is a resource intensive activity (and
> single-threaded on the mupdate master, to boot), this resync can
> begin a thrashing cycle of dropped connections between the mupdate
> slaves and the master.  Bad news, and best avoided...

That seems consistent with what we're seeing.  Can any of this be 
tweaked to, for example, wait longer before thinking the connection to 
the master has been lost?
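
The feedback loop Wesley describes can be captured in a deliberately crude model. The model makes three assumptions:

- The master does resyncs strictly one at a time.
- The master cannot answer NOOPs while a resync is running.
- Every slave left waiting longer than its NOOP timeout drops its connection and resyncs.

`count_resyncs` and its parameters are hypothetical. The numbers are illustrative, not measured.

```python
def count_resyncs(num_slaves, resync_cost, noop_timeout, max_rounds=10):
    """Toy model of the mupdate resync feedback loop (hypothetical, not Cyrus code).

    Assumes the master serves resyncs serially, each taking resync_cost
    seconds, and cannot answer NOOPs meanwhile; a slave whose NOOP goes
    unanswered for noop_timeout seconds drops and queues its own resync.
    """
    pending = 1  # start with a single slave that lost its connection
    total = 0
    for _ in range(max_rounds):
        if pending == 0:
            break  # the cycle has died out
        total += pending
        busy = pending * resync_cost  # serial work on the single-threaded master
        # Every slave not resyncing this round waited for a NOOP reply; if the
        # master stayed busy past their timeout, they all give up and resync.
        pending = num_slaves - pending if busy > noop_timeout else 0
    return total
```

Take four slaves and a 90-second resync. With a 60-second timeout, `count_resyncs(4, 90, 60)` keeps cycling until the round limit (20 resyncs in 10 rounds). With a 120-second timeout, `count_resyncs(4, 90, 120)` stops after the first resync. That is the intuition behind waiting longer before declaring the master lost. This model does not establish whether or how mupdate exposes such a timeout.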

>> Can I do anything with the prefork parameter for mupdate to spread
>> things out on more CPUs or increase concurrency?
>
> Prefork doesn't do anything useful for mupdate -- it's about forking
> & accepting connections, not about threads.  The mupdate master is
> multithreaded in many situations.  The mupdate slave on the frontends
> is almost never multithreaded, but it does share code with the
> mupdate master so you see messages about threads.  I suspect that
> mupdate on master & slave are consuming 100% of CPU on one CPU
> because the slave is attempting to update.  That's a synchronous,
> single-threaded activity on both, so I would expect it to take a lot
> of CPU and to only be on one CPU.

Gotcha, although that is a shame.  It might be nice to be able to 
resync a bunch of slaves while changes are locked out, or even to do 
resyncs with an MVCC-style model where "here are the changes that were 
made during your resync" can be sent when the resync finishes.  I could 
then throw 8 cores at the master and hopefully avoid these thrashing 
situations.
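
The MVCC-style resync could look roughly like this:

1. The master hands out a versioned snapshot.
2. The master keeps accepting writes while the slave loads it.
3. The master sends only the changes logged after the snapshot's version.

This is a hypothetical sketch of the idea, not how mupdate works today. `VersionedMailboxList`, `resync_slave` and the log format are all invented for illustration.

```python
class VersionedMailboxList:
    """Hypothetical master-side mailbox list with a change log (not mupdate's design)."""

    def __init__(self):
        self.version = 0
        self.entries = {}  # mailbox name -> server location
        self.log = []      # (version, name, location or None for a delete)

    def set(self, name, location):
        self.version += 1
        self.entries[name] = location
        self.log.append((self.version, name, location))

    def delete(self, name):
        self.version += 1
        self.entries.pop(name, None)
        self.log.append((self.version, name, None))

    def snapshot(self):
        # A consistent copy plus the version it reflects; the slow transfer
        # to a slave can then happen without blocking further writes.
        return self.version, dict(self.entries)

    def changes_since(self, version):
        # Everything committed after the snapshot was taken, in order.
        return [(name, loc) for v, name, loc in self.log if v > version]


def resync_slave(master, writes_during_transfer):
    """Resync a slave from a snapshot, then replay changes made meanwhile."""
    version, replica = master.snapshot()
    writes_during_transfer(master)  # the master keeps accepting updates
    for name, loc in master.changes_since(version):
        if loc is None:
            replica.pop(name, None)
        else:
            replica[name] = loc
    return replica
```

A real version would also need to trim the log once every slave has caught up past a given version. Otherwise the log grows without bound.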

John

-- 
John Madden
Sr UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@xxxxxxxxxxx
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html