Re: many report failed after mon election

Hi,
I think I found what triggers the mon re-elections.
It is load on the system disk (logrotate, and other ...); the mon
data directory is on this disk as well.
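
To compare the election times with the logrotate schedule, the election
messages can be counted per minute. A minimal Python sketch, assuming each
log entry sits on a single line (as in the log file itself, not wrapped as
in this mail); the name elections_per_minute is only illustrative:

```python
import re
from collections import Counter

# Matches log lines like the ones quoted in this thread, for example:
# 2013-09-12 07:12:40.992532 mon.3 10.177.64.7:6789/0 364 : [INF] mon.4 calling new monitor election
ELECTION_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ "
    r".*?(mon\.\d+) calling new monitor election"
)

def elections_per_minute(lines):
    """Count election calls per (minute, monitor named in the message)."""
    counts = Counter()
    for line in lines:
        m = ELECTION_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

if __name__ == "__main__":
    # Sample entries copied from the log excerpts in this thread.
    sample = [
        "2013-09-12 07:12:40.992532 mon.3 10.177.64.7:6789/0 364 : [INF] mon.4 calling new monitor election",
        "2013-09-12 07:13:02.783778 mon.3 10.177.64.7:6789/0 366 : [INF] mon.4 calling new monitor election",
        "2013-09-12 07:13:10.852842 mon.3 10.177.64.7:6789/0 367 : [INF] mon.4 calling new monitor election",
    ]
    for (minute, mon), n in sorted(elections_per_minute(sample).items()):
        print(minute, mon, n)
```

If the minutes with elections line up with the cron times of logrotate (or
of other jobs writing to the system disk), that supports the disk load
explanation.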

Regards
Dominik
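
P.S. Greg's point in the reply quoted below can be written as simple
arithmetic. This is a simplified model, not the monitor's actual code: it
assumes the leader stamps the lease with its own clock plus the lease
length, and the peon compares that stamp with its own clock. The 5 second
lease in the example is an assumption about the default; check the value
for your version. All names are illustrative.

```python
def lease_lag(mon_lease, delay, clock_offset):
    """Seconds by which a lease is already expired when the peon handles it.

    mon_lease    -- lease length in seconds (assumed value, see note above)
    delay        -- seconds between the leader stamping the lease and the
                    peon handling it (network plus any stall, e.g. on disk)
    clock_offset -- peon clock minus leader clock, in seconds
    A positive result means the lease is already in the past.
    """
    return delay + clock_offset - mon_lease

# A peon stalled for about 5 s sees the lease as expired
# even with well-synchronised clocks.
print(lease_lag(mon_lease=5.0, delay=5.08, clock_offset=0.0) > 0)   # True
# A prompt peon with 0.25 s of skew is still well inside the lease.
print(lease_lag(mon_lease=5.0, delay=0.01, clock_offset=0.25) > 0)  # False
```

With a small skew, a lease only arrives expired when the peon is held up
for close to the whole lease, which fits a monitor waiting on a busy disk.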

2013/9/13 Gregory Farnum <greg@xxxxxxxxxxx>:
> I believe that's too high of an allowed skew with the default lease etc
> settings. The actual complaint is "I got a lease which has ALREADY expired
> and can't do anything with that!"
>
> You'll need to either get your clock skew down to less than, say, 1/4 second
> (which is perfectly doable over three nodes with ntp), or go through and
> change all of the monitor timing configurables appropriately for that skew.
> I don't remember all the constraints you'll need to satisfy when doing that
> so I really recommend the first option.
> -Greg
>
> On Friday, September 13, 2013, Dominik Mostowiec wrote:
>>
>> Hi,
>> I have ntpd installed on the servers, and the time seems to be OK.
>>
>> I have a strange log entry:
>> 2013-09-12 07:34:40.238659 7fd63ac3e700 -1
>> mon.4@3(peon).paxos(auth active c 581328..581348) lease_expire
>> from mon.0 10.177.64.4:6789/0 is 0.075434 seconds in the past; mons are laggy
>> or clocks are too skewed
>>
>> But the value 0.075434 is small.
>>
>> In ceph.conf I have:
>>         mon allowed clock drift = 2
>>
>> Sometimes the mons report:
>> 2013-09-13 00:11:14.556410 7fd63ac3e700  0 log [INF] : mon.4 calling
>> new monitor election
>> 2013-09-13 00:11:14.557306 7fd6317b8700  0 -- 10.177.64.7:6789/0 >>
>> 10.177.64.5:6789/0 pipe(0xdbc2000 sd=18 :6789 s=0 pgs=0 cs=0
>> l=0).accept connect_seq 112 vs existing 112 state connecting
>> 2013-09-13 00:11:14.557374 7fd638525700  0 -- 10.177.64.7:6789/0 >>
>> 10.177.64.9:6789/0 pipe(0x14766c80 sd=24 :6789 s=0 pgs=0 cs=0
>> l=0).accept connect_seq 112 vs existing 112 state connecting
>> 2013-09-13 00:11:14.557398 7fd638c2c700  0 -- 10.177.64.7:6789/0 >>
>> 10.177.64.6:6789/0 pipe(0x14766a00 sd=23 :6789 s=0 pgs=0 cs=0
>> l=0).accept connect_seq 126 vs existing 126 state connecting
>> 2013-09-13 00:11:16.467636 7fd631ebf700  0 -- 10.177.64.7:6789/0 >>
>> 10.177.64.8:6789/0 pipe(0x7038c80 sd=20 :6789 s=0 pgs=0 cs=0
>> l=0).accept connect_seq 122 vs existing 122 state connecting
>> 2013-09-13 00:11:21.553559 7fd63ac3e700  0 log [INF] : mon.4 calling
>> new monitor election
>>
>>
>> --
>> Dominik
>>
>> 2013/9/13 Joao Eduardo Luis <joao.luis@xxxxxxxxxxx>:
>> > On 09/13/2013 03:38 AM, Sage Weil wrote:
>> >>
>> >> On Thu, 12 Sep 2013, Dominik Mostowiec wrote:
>> >>>
>> >>> Hi,
>> >>> Today I had some issues with our Ceph cluster.
>> >>> After a new mon election, many OSDs were marked failed.
>> >>> Some time later the OSDs booted and, I think, started recovering,
>> >>> because many slow requests appeared.
>> >>> The cluster came back after about 20 minutes.
>> >>
>> >>
>> >> Was there some other event that triggered the mon election?  There's
>> >> not
>> >> much here to go on except that several elections were called and by
>> >> different monitors, which suggests something was not quite right.
>> >
>> >
>> > My best guess would be clock skews, as they tend to be annoying like
>> > that.
>> >
>> > Setting 'debug mon = 10' on the monitors should provide more insight
>> > though.
>> >
>> >   -Joao
>> >
>> >>
>> >> sage
>> >>
>> >>
>> >>>
>> >>> cluster:
>> >>> ceph version 0.56.6
>> >>> 6 servers x 26 osd
>> >>>
>> >>> 2013-09-12 07:11:40.920384 mon.1 10.177.64.5:6789/0 353 : [INF] mon.3
>> >>> calling new monitor election
>> >>> 2013-09-12 07:12:40.992532 mon.3 10.177.64.7:6789/0 364 : [INF] mon.4
>> >>> calling new monitor election
>> >>> 2013-09-12 07:12:41.024954 mon.4 10.177.64.8:6789/0 360 : [INF] mon.2
>> >>> calling new monitor election
>> >>> 2013-09-12 07:13:02.782203 mon.2 10.177.64.6:6789/0 336 : [INF] mon.1
>> >>> calling new monitor election
>> >>> 2013-09-12 07:13:02.783778 mon.3 10.177.64.7:6789/0 366 : [INF] mon.4
>> >>> calling new monitor election
>> >>> 2013-09-12 07:13:10.852842 mon.3 10.177.64.7:6789/0 367 : [INF] mon.4
>> >>> calling new monitor election
>> >>> 2013-09-12 16:17:09.484277 mon.4 10.177.64.8:6789/0 363 : [INF] mon.2
>> >>> calling new monitor election
>> >>> 2013-09-12 16:17:09.497337 mon.3 10.177.64.7:6789/0 368 : [INF] mon.4
>> >>> calling new monitor election
>> >>> 2013-09-12 16:17:09.523787 mon.0 10.177.64.4:6789/0 4369021 : [INF]
>> >>> mon.0 calling new monitor election
>> >>> 2013-09-12 16:17:14.525282 mon.0 10.177.64.4:6789/0 4369022 : [INF]
>> >>> mon.0@0 won leader election with quorum 0,1,2,3,4
>> >>> ...
>> >>> 2013-09-12 16:17:14.689555 mon.0 10.177.64.4:6789/0 4369027 : [DBG]
>> >>> osd.130 10.177.64.9:6801/1401 reported failed by osd.121
>> >>> 10.177.64.7:6909/29496
>> >>> 2013-09-12 16:17:14.689584 mon.0 10.177.64.4:6789/0 4369028 : [DBG]
>> >>> osd.131 10.177.64.9:6810/2435 reported failed by osd.121
>> >>> 10.177.64.7:6909/29496
>> >>> 2013-09-12 16:17:14.689600 mon.0 10.177.64.4:6789/0 4369029 : [DBG]
>> >>> osd.132 10.177.64.9:6846/2885 reported failed by osd.121
>> >>> 10.177.64.7:6909/29496
>> >>> 2013-09-12 16:17:14.689615 mon.0 10.177.64.4:6789/0 4369030 : [DBG]
>> >>> osd.134 10.177.64.9:6855/3223 reported failed by osd.121
>> >>> 10.177.64.7:6909/29496
>> >>> 2013-09-12 16:17:14.689630 mon.0 10.177.64.4:6789/0 4369031 : [DBG]
>> >>> osd.136 10.177.64.9:6865/3559 reported failed by osd.121
>> >>> 10.177.64.7:6909/29496
>> >>--
>> Regards
>> Dominik
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com



-- 
Regards
Dominik



