Re: monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

Sean Purdy <s.purdy@xxxxxxxxxxxxxxxx> · Thu, 21 Sep 2017 15:43:27 +0100

On Thu, 21 Sep 2017, Marc Roos said:
>  
> 
> In my case it was syncing, and was syncing slowly (hour or so?). You 
> should see this in the log file. I wanted to report this, because my 
> store.db is only 200MB, and I guess you want your monitors up and 
> running quickly.

Well I wondered about that, but if it can't talk to the monitor quorum leader, it's not going to start copying data.

And no new files had been added to this test cluster.

> I also noticed that when the 3rd monitor left the quorum, ceph -s 
> command was slow timing out. Probably trying to connect to the 3rd 
> monitor, but why? When this monitor is not in quorum.

There's a setting for client timeouts.  I forget where.

Sean

> -----Original Message-----
> From: Sean Purdy [mailto:s.purdy@xxxxxxxxxxxxxxxx] 
> Sent: donderdag 21 september 2017 12:02
> To: Gregory Farnum
> Cc: ceph-users
> Subject: Re:  monitor takes long time to join quorum: 
> STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER
> 
> On Wed, 20 Sep 2017, Gregory Farnum said:
> > That definitely sounds like a time sync issue. Are you *sure* they 
> > matched each other?
> 
> NTP looked OK at the time.  But see below.
> 
> 
> > Is it reproducible on restart?
> 
> Today I did a straight reboot - and it was fine, no issues.
> 
> 
> The issue occurs after the machine is off for a number of hours, or has 
> been worked on in the BIOS for a number of hours and then booted.  And 
> then perhaps waited at the disk decrypt key prompt.
> 
> So I'd suspect hardware clock drift at those times.  (Using Dell R720xd 
> machines)
> 
> 
> Logs show a time change a few seconds after boot.  After boot it's 
> running NTP and within that 45 minute period the NTP state looks the 
> same as the other nodes in the (small) cluster.
> 
> How much drift is allowed between monitors?
> 
> 
> Logs say:
> 
> Sep 20 09:45:21 store03 ntp[2329]: Starting NTP server: ntpd.
> Sep 20 09:45:21 store03 ntpd[2462]: proto: precision = 0.075 usec (-24) 
> ...
> Sep 20 09:46:44 store03 systemd[1]: Time has been changed Sep 20 
> 09:46:44 store03 ntpd[2462]: receive: Unexpected origin timestamp 
> 0xdd6ca972.c694801d does not match aorg 0000000000.00000000 from 
> server@172.16.0.16 xmt 0xdd6ca974.0c5c18f
> 
> So system time was changed about 6 seconds after disks were 
> unlocked/boot proceeded.  But there was still 45 minutes of monitor 
> messages after that.  Surely the time should have converged sooner than 
> 45 minutes?
> 
> 
> 
> NTP from today, post-problem.  But ntpq at the time of the problem 
> looked just as OK:
> 
> store01:~$ ntpstat
> synchronised to NTP server (172.16.0.19) at stratum 3
>    time correct to within 47 ms
> 
> store02$ ntpstat
> synchronised to NTP server (172.16.0.19) at stratum 3
>    time correct to within 63 ms
> 
> store03:~$ sudo ntpstat
> synchronised to NTP server (172.16.0.19) at stratum 3
>    time correct to within 63 ms
> 
> store03:~$ ntpq -p
>      remote           refid      st t when poll reach   delay   offset  
> jitter
> ========================================================================
> ======
> +172.16.0.16     85.91.1.164      3 u  561 1024  377    0.287    0.554   
> 0.914
> +172.16.0.18     94.125.129.7     3 u  411 1024  377    0.388   -0.331   
> 0.139
> *172.16.0.19     158.43.128.33    2 u  289 1024  377    0.282   -0.005   
> 0.103
> 
> 
> Sean
> 
>  
> > On Wed, Sep 20, 2017 at 2:50 AM Sean Purdy <s.purdy@xxxxxxxxxxxxxxxx> 
> wrote:
> > 
> > >
> > > Hi,
> > >
> > >
> > > Luminous 12.2.0
> > >
> > > Three node cluster, 18 OSD, debian stretch.
> > >
> > >
> > > One node is down for maintenance for several hours.  When bringing 
> > > it back up, OSDs rejoin after 5 minutes, but health is still 
> > > warning.  monitor has not joined quorum after 40 minutes and logs 
> > > show BADAUTHORIZER message every time the monitor tries to connect 
> to the leader.
> > >
> > > 2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
> > > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
> > > l=0).handle_connect_reply connect got BADAUTHORIZER
> > >
> > > Then after ~45 minutes monitor *does* join quorum.
> > >
> > > I'm presuming this isn't normal behaviour?  Or if it is, let me know 
> 
> > > and I won't worry.
> > >
> > > All three nodes are using ntp and look OK timewise.
> > >
> > >
> > > ceph-mon log:
> > >
> > > (.43 is leader, .45 is rebooted node, .44 is other live node in 
> > > quorum)
> > >
> > > Boot:
> > >
> > > 2017-09-20 09:45:21.874152 7f49efeb8f80  0 ceph version 12.2.0
> > > (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process 
> > > (unknown), pid 2243
> > >
> > > 2017-09-20 09:46:01.824708 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
> > > 172.16.0.44:6789/0 conn(0x56007244d000 :6789 
> > > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 
> > > l=0).handle_connect_msg accept connect_seq 3 vs existing csq=0 
> > > existing_state=STATE_CONNECTING 2017-09-20 09:46:01.824723 
> > > 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 172.16.0.44:6789/0 
> > > conn(0x56007244d000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH 
> > > pgs=0 cs=0 l=0).handle_connect_msg accept we reset (peer sent cseq 
> > > 3, 0x5600722c0000.cseq = 0), sending RESETSESSION 2017-09-20 
> > > 09:46:01.825247 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
> > > 172.16.0.44:6789/0 conn(0x56007244d000 :6789 
> > > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 
> > > l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 
> > > existing_state=STATE_CONNECTING 2017-09-20 09:46:01.828053 
> > > 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 172.16.0.44:6789/0 
> > > conn(0x5600722c0000 :-1 
> > > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21872 cs=1 
> > > l=0).process missed message?  skipped from seq 0 to 552717734
> > >
> > > 2017-09-20 09:46:05.580342 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
> > > 172.16.0.43:6789/0 conn(0x5600720fe800 :-1 
> > > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=49261 cs=1 
> > > l=0).process missed message?  skipped from seq 0 to 1151972199 
> > > 2017-09-20 09:46:05.581097 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
> > > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
> > > l=0).handle_connect_reply connect got BADAUTHORIZER 2017-09-20 
> > > 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
> > > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
> > > l=0).handle_connect_reply connect got BADAUTHORIZER ...
> > > [message repeats for 45 minutes]
> > > ...
> > > 2017-09-20 10:23:38.818767 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
> > > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
> > > l=0).handle_connect_reply connect  got BADAUTHORIZER
> > >
> > >
> > > At this point, "ceph mon stat" says .45/store03 not in quorum:
> > >
> > > e5: 3 mons at {store01=
> > > 172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:67
> > > 89/0}, election epoch 376, leader 0 store01, quorum 0,1 
> > > store01,store02
> > >
> > >
> > > Then suddenly a valid connection is made and sync happens:
> > >
> > > 2017-09-20 10:23:43.041009 7f49e5b2f700  1 
> > > mon.store03@2(synchronizing).mds
> > > e1 Unable to load 'last_metadata'
> > > 2017-09-20 10:23:43.041967 7f49e5b2f700  1 
> > > mon.store03@2(synchronizing).osd
> > > e2381 e2381: 18 total, 13 up, 14 in
> > > ...
> > > 2017-09-20 10:23:43.045961 7f49e5b2f700  1 
> > > mon.store03@2(synchronizing).osd
> > > e2393 e2393: 18 total, 15 up, 15 in
> > > ...
> > > 2017-09-20 10:23:43.049255 7f49e5b2f700  1 
> > > mon.store03@2(synchronizing).osd
> > > e2406 e2406: 18 total, 18 up, 18 in
> > > ...
> > > 2017-09-20 10:23:43.054828 7f49e5b2f700  0 log_channel(cluster) log 
> > > [INF]
> > > : mon.store03 calling new monitor election 2017-09-20 
> > > 10:23:43.054901 7f49e5b2f700  1 mon.store03@2(electing).elector(372)
> > > init, last seen epoch 372
> > >
> > >
> > > Now "ceph mon stat" says:
> > >
> > > e5: 3 mons at {store01=
> > > 172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:67
> > > 89/0}, election epoch 378, leader 0 store01, quorum 0,1,2 
> > > store01,store02,store03
> > >
> > > and everything's happy.
> > >
> > >
> > > What should I look for/fix?  It's a fairly vanilla system.
> > >
> > >
> > > Thanks in advance,
> > >
> > > Sean Purdy
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com