Re: monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER


In my case the monitor was syncing, and syncing slowly (an hour or so?). 
You should see this in the log file. I wanted to report this because my 
store.db is only 200MB, and presumably you want your monitors up and 
running quickly.

I also noticed that while the 3rd monitor was out of quorum, the ceph -s 
command was slow, timing out. Presumably it was trying to connect to the 
3rd monitor, but why, when that monitor is not in quorum?
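
One hedged guess about the slow ceph -s: the client picks a monitor at 
random and, when it happens to pick the dead one, waits out a connect 
timeout before hunting for another. The ceph CLI's --connect-timeout 
option (in seconds) caps that wait; this is just an illustrative 
invocation, not something taken from the logs below:

```shell
# Sketch: cap how long the ceph CLI will wait on an unresponsive monitor
# before giving up / trying elsewhere (needs a live cluster to run).
ceph -s --connect-timeout=5
```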






-----Original Message-----
From: Sean Purdy [mailto:s.purdy@xxxxxxxxxxxxxxxx] 
Sent: Thursday, 21 September 2017 12:02
To: Gregory Farnum
Cc: ceph-users
Subject: Re:  monitor takes long time to join quorum: 
STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

On Wed, 20 Sep 2017, Gregory Farnum said:
> That definitely sounds like a time sync issue. Are you *sure* they 
> matched each other?

NTP looked OK at the time.  But see below.


> Is it reproducible on restart?

Today I did a straight reboot - and it was fine, no issues.


The issue occurs after the machine has been off for a number of hours, 
or has sat in the BIOS for a number of hours before being booted, and 
has then perhaps waited at the disk decrypt key prompt.

So I'd suspect hardware clock drift at those times.  (We're using Dell 
R720xd machines.)
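
If the RTC really is drifting while the box is off, that is easy to 
sanity-check right after boot, before NTP steps the clock. A rough 
sketch (hwclock needs root and a real RTC, so the commented-out call is 
replaced by example timestamps; the date -d form is GNU date):

```shell
# Sketch: estimate RTC drift by comparing the hardware clock with system time.
# rtc=$(sudo hwclock --show | cut -d. -f1)  # needs root; output format varies
rtc="2017-09-20 09:45:15"                   # example RTC reading
sys="2017-09-20 09:45:21"                   # example system time: date '+%F %T'
# GNU date: convert both timestamps to epoch seconds and subtract
drift=$(( $(date -d "$sys" +%s) - $(date -d "$rtc" +%s) ))
echo "RTC is ${drift}s behind system time"
```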


Logs show a time change a few seconds after boot.  After boot the node 
is running NTP, and within that 45-minute period the NTP state looks 
the same as on the other nodes in the (small) cluster.

How much drift is allowed between monitors?
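
As far as I know the relevant knob is mon_clock_drift_allowed, which 
defaults to 0.05s (50 ms) before the monitors start complaining about 
skew; on a live mon you could confirm the running value with 
"ceph daemon mon.<id> config get mon_clock_drift_allowed". A sketch 
comparing a measured offset against that default (the 63 ms figure is 
ntpstat's bound from below, which is an error bound rather than the 
actual offset):

```shell
# Sketch: compare a measured clock offset against the default
# mon_clock_drift_allowed of 0.05s (50 ms). Assumption: default unchanged.
allowed_ms=50
offset_ms=63     # e.g. ntpstat's "time correct to within 63 ms"
verdict=$(awk -v o="$offset_ms" -v a="$allowed_ms" \
  'BEGIN { if (o + 0 > a + 0) print "offset exceeds mon_clock_drift_allowed";
           else print "offset within bounds" }')
echo "$verdict"
```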


Logs say:

Sep 20 09:45:21 store03 ntp[2329]: Starting NTP server: ntpd.
Sep 20 09:45:21 store03 ntpd[2462]: proto: precision = 0.075 usec (-24) 
...
Sep 20 09:46:44 store03 systemd[1]: Time has been changed
Sep 20 09:46:44 store03 ntpd[2462]: receive: Unexpected origin timestamp 0xdd6ca972.c694801d does not match aorg 0000000000.00000000 from server@172.16.0.16 xmt 0xdd6ca974.0c5c18f

So the system time was changed about 6 seconds after the disks were 
unlocked and boot proceeded.  But there were still 45 minutes of 
monitor messages after that.  Surely the time should have converged 
sooner than 45 minutes?
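
If the remaining skew was being slewed away slowly rather than stepped, 
that could plausibly account for a long tail. Two standard ntpd knobs 
are relevant here; this is only a sketch, assuming classic ntpd on 
Debian:

```shell
# Sketch: make ntpd correct the clock quickly at boot instead of slewing.
#
# 1. One-shot sync before ntpd/ceph-mon start (e.g. from a boot script):
#      ntpd -gq     # -g allows a large first step, -q sets the clock and exits
#
# 2. In /etc/ntp.conf, 'iburst' gets usable measurements within seconds:
#      server 172.16.0.19 iburst
```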



NTP status from today, after the problem cleared.  But ntpq output at 
the time of the problem looked just as healthy:

store01:~$ ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 47 ms

store02$ ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 63 ms

store03:~$ sudo ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 63 ms

store03:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+172.16.0.16     85.91.1.164      3 u  561 1024  377    0.287    0.554   0.914
+172.16.0.18     94.125.129.7     3 u  411 1024  377    0.388   -0.331   0.139
*172.16.0.19     158.43.128.33    2 u  289 1024  377    0.282   -0.005   0.103
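
To check a peer table like the one above mechanically, you can pull the 
offset column (in ms) and compare it to a threshold; the 50 ms 
threshold and the copied sample line below are just for illustration, 
standing in for a live ntpq -p call:

```shell
# Sketch: flag any NTP peer whose absolute offset (column 9, in ms) exceeds
# a threshold. The input line is copied sample output, not a live query.
line='*172.16.0.19     158.43.128.33    2 u  289 1024  377    0.282   -0.005   0.103'
verdict=$(printf '%s\n' "$line" |
  awk '{ off = $9; if (off < 0) off = -off;
         if (off > 50) print $1 " offset too large";
         else print $1 " within 50 ms" }')
echo "$verdict"
```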


Sean

 
> On Wed, Sep 20, 2017 at 2:50 AM Sean Purdy <s.purdy@xxxxxxxxxxxxxxxx> wrote:
> 
> >
> > Hi,
> >
> >
> > Luminous 12.2.0
> >
> > Three node cluster, 18 OSD, debian stretch.
> >
> >
> > One node is down for maintenance for several hours.  When bringing 
> > it back up, OSDs rejoin after 5 minutes, but health is still 
> > warning.  The monitor has not joined quorum after 40 minutes, and 
> > logs show a BADAUTHORIZER message every time the monitor tries to 
> > connect to the leader.
> >
> > 2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
> >
> > Then after ~45 minutes monitor *does* join quorum.
> >
> > I'm presuming this isn't normal behaviour?  Or if it is, let me know 
> > and I won't worry.
> >
> > All three nodes are using ntp and look OK timewise.
> >
> >
> > ceph-mon log:
> >
> > (.43 is leader, .45 is rebooted node, .44 is other live node in 
> > quorum)
> >
> > Boot:
> >
> > 2017-09-20 09:45:21.874152 7f49efeb8f80  0 ceph version 12.2.0
> > (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process 
> > (unknown), pid 2243
> >
> > 2017-09-20 09:46:01.824708 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 172.16.0.44:6789/0 conn(0x56007244d000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 3 vs existing csq=0 existing_state=STATE_CONNECTING
> > 2017-09-20 09:46:01.824723 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 172.16.0.44:6789/0 conn(0x56007244d000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept we reset (peer sent cseq 3, 0x5600722c0000.cseq = 0), sending RESETSESSION
> > 2017-09-20 09:46:01.825247 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 172.16.0.44:6789/0 conn(0x56007244d000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING
> > 2017-09-20 09:46:01.828053 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 172.16.0.44:6789/0 conn(0x5600722c0000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21872 cs=1 l=0).process missed message?  skipped from seq 0 to 552717734
> >
> > 2017-09-20 09:46:05.580342 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 172.16.0.43:6789/0 conn(0x5600720fe800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=49261 cs=1 l=0).process missed message?  skipped from seq 0 to 1151972199
> > 2017-09-20 09:46:05.581097 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
> > 2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
> > ...
> > [message repeats for 45 minutes]
> > ...
> > 2017-09-20 10:23:38.818767 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
> >
> >
> > At this point, "ceph mon stat" says .45/store03 not in quorum:
> >
> > e5: 3 mons at {store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
> > election epoch 376, leader 0 store01, quorum 0,1 store01,store02
> >
> >
> > Then suddenly a valid connection is made and sync happens:
> >
> > 2017-09-20 10:23:43.041009 7f49e5b2f700  1 mon.store03@2(synchronizing).mds e1 Unable to load 'last_metadata'
> > 2017-09-20 10:23:43.041967 7f49e5b2f700  1 mon.store03@2(synchronizing).osd e2381 e2381: 18 total, 13 up, 14 in
> > ...
> > 2017-09-20 10:23:43.045961 7f49e5b2f700  1 mon.store03@2(synchronizing).osd e2393 e2393: 18 total, 15 up, 15 in
> > ...
> > 2017-09-20 10:23:43.049255 7f49e5b2f700  1 mon.store03@2(synchronizing).osd e2406 e2406: 18 total, 18 up, 18 in
> > ...
> > 2017-09-20 10:23:43.054828 7f49e5b2f700  0 log_channel(cluster) log [INF] : mon.store03 calling new monitor election
> > 2017-09-20 10:23:43.054901 7f49e5b2f700  1 mon.store03@2(electing).elector(372) init, last seen epoch 372
> >
> >
> > Now "ceph mon stat" says:
> >
> > e5: 3 mons at {store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
> > election epoch 378, leader 0 store01, quorum 0,1,2 store01,store02,store03
> >
> > and everything's happy.
> >
> >
> > What should I look for/fix?  It's a fairly vanilla system.
> >
> >
> > Thanks in advance,
> >
> > Sean Purdy
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

