2011/8/8 Székelyi Szabolcs <szekelyi@xxxxxxx>:
> Hello,
>
> when I put my cluster under a little stress (doing performance measurements
> with fio from one client), I see messages like this when watching the cluster
> with ceph -w:
>
> My setup consists of three machines:
> 1. iscsigw1: OSD+MDS+MON
> 2. iscsigw2: OSD+MDS(standby-replay)+MON
> 3. cc: MON+client+control utility

It's not exactly normal, but it's unlikely to be hurting anything. The monitors have to call sync() to save every map, so my guess is that the monitor on your 'cc' node, which also runs the Ceph client, is simply taking forever on its sync calls, since they try to flush out data over the network, and that makes the other monitors think it's down. Then a new election is called, and since mon.0 (on 'cc') is still actually alive, it wins the election again.

Perhaps we should adjust the election code so that if there's a complaint, the monitors don't resolve back to the same leader, although doing that and still ending up with a result quickly might take some doing.

-Greg
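To make the election loop described above concrete, here is a toy sketch in Python. It is not Ceph's election code. It assumes, as the mon.0 result suggests, that the lowest-ranked monitor that responds during the election wins. The function name elect_leader and the excluded parameter are hypothetical, introduced only to illustrate the "don't resolve back to the same leader" idea.

```python
from typing import Iterable, Optional


def elect_leader(alive_ranks: Iterable[int],
                 excluded: Optional[int] = None) -> Optional[int]:
    """Toy election: the lowest-ranked responsive monitor wins.

    `excluded` is a hypothetical knob: the rank that was just complained
    about and should not be chosen again. It is not a real Ceph option.
    """
    candidates = [rank for rank in alive_ranks if rank != excluded]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # mon.0 on 'cc' was slow in sync() but is alive again by election time,
    # so all three monitors take part.
    alive = [0, 1, 2]

    # Assumed current behaviour: the election resolves back to mon.0.
    print("re-elected leader:", elect_leader(alive))

    # Suggested change: skip the monitor that triggered the complaint.
    print("leader with mon.0 excluded:", elect_leader(alive, excluded=0))
```

The sketch leaves out the hard part mentioned above, which is getting every monitor to agree on the exclusion and still converge quickly.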