Re: Monitors calling for elections all the time under load

Gregory Farnum <gregory.farnum@xxxxxxxxxxxxx> · Mon, 8 Aug 2011 08:22:53 -0700

2011/8/8 Székelyi Szabolcs <szekelyi@xxxxxxx>:
> Hello,
>
> when I put my cluster under a little stress (doing performance measurements
> with fio from one client), I see messages like this when watching the cluster
> with ceph -w:
>
> My setup consists of three machines:
> 1. iscsigw1: OSD+MDS+MON
> 2. iscsigw2: OSD+MDS(standby-replay)+MON
> 3. cc: MON+client+control utility

It's not normal, precisely, but it's unlikely to be hurting anything.
The monitors have to call sync() to save every map, so my guess is
that the monitor on your 'cc' node, with the Ceph client, is simply
taking forever on its sync calls, since they try to flush out data
over the network -- and that makes the other monitors think it's down.
Then a new election is called, and since mon.0 (on 'cc') is still
actually alive, it wins the election.
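To make the loop above concrete, here is a toy simulation. It is a sketch, not Ceph's monitor code. It assumes, as the explanation above implies, that the lowest-ranked live monitor wins an election, and that peers give up on a leader that stays blocked longer than some lease timeout. The timeout value, the `Monitor` class, and the per-node sync durations are all hypothetical numbers picked for illustration.

```python
from dataclasses import dataclass

# Hypothetical lease timeout in seconds; NOT a real Ceph default.
LEASE_TIMEOUT = 5.0


@dataclass
class Monitor:
    rank: int
    name: str
    sync_seconds: float  # hypothetical time one sync() call blocks this monitor
    alive: bool = True


def leader_looks_down(leader: Monitor, timeout: float = LEASE_TIMEOUT) -> bool:
    """Peers suspect the leader if it stays blocked in sync() past the timeout."""
    return leader.sync_seconds > timeout


def elect(monitors: list[Monitor]) -> Monitor:
    """Simplified election (assumption): the lowest-ranked live monitor wins."""
    return min((m for m in monitors if m.alive), key=lambda m: m.rank)


def run_rounds(monitors: list[Monitor], rounds: int) -> tuple[Monitor, int]:
    """Count how many elections a slow-but-alive leader triggers."""
    leader = elect(monitors)
    elections = 0
    for _ in range(rounds):
        if leader_looks_down(leader):
            elections += 1
            # The slow leader is still alive, so it simply wins again.
            leader = elect(monitors)
    return leader, elections


mons = [
    Monitor(0, "cc", sync_seconds=8.0),  # client node: sync() flushes over the network
    Monitor(1, "iscsigw1", sync_seconds=0.1),
    Monitor(2, "iscsigw2", sync_seconds=0.1),
]
leader, elections = run_rounds(mons, rounds=3)
print(leader.name, elections)  # cc 3
```

With the slow node at rank 0, every round triggers an election and the same monitor comes back as leader, which matches the symptom in the original report.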

Perhaps we should adjust the election code so that if there's a
complaint, the monitors don't resolve back to the same leader, although doing
that and still ending up with a result quickly might take some doing.
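One possible shape for that adjustment, as a hypothetical sketch only: the election skips any monitor that was just complained about, but falls back to the full set of live monitors so that an election still produces a leader. The `suspect_ranks` set and the `elect_avoiding` function are invented for illustration. The sketch also ignores the hard part mentioned above, namely how the monitors agree on that set and when entries expire.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    rank: int
    name: str
    alive: bool = True


def elect_avoiding(monitors: list[Monitor], suspect_ranks: set[int]) -> Monitor:
    """Hypothetical election: lowest-ranked live monitor, skipping suspects.

    If every live monitor is a suspect, fall back to plain lowest-rank
    selection so the election still terminates with a leader.
    """
    alive = [m for m in monitors if m.alive]
    preferred = [m for m in alive if m.rank not in suspect_ranks]
    pool = preferred or alive
    return min(pool, key=lambda m: m.rank)


mons = [Monitor(0, "cc"), Monitor(1, "iscsigw1"), Monitor(2, "iscsigw2")]
print(elect_avoiding(mons, suspect_ranks={0}).name)        # iscsigw1
print(elect_avoiding(mons, suspect_ranks={0, 1, 2}).name)  # cc
```

The fallback keeps the result quick and deterministic, at the cost of occasionally re-electing a suspect leader when nobody else is eligible.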
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html