Re: BADAUTHORIZER in Nautilus

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This OSD also appears on teh accepting end of things, and probably 
has newer keys that the OSD connecting (tho it' shard to tell 
because teh debug level isn't turned up).

The goal is to find an osd still throwing BADAUTHORIZER messages and then 
turn of debug_auth=20 and debug_monc=20.  Specifically I'm looking for the 
output from MonClient::_check_auth_rotating(), which should tell us 
whether the OSD thinks its rotating keys are up to date, or whether it is 
having trouble renewing them, or what.  It's weird that some OSDs are 
using hours-old keys to try to authenticate.  :/

sage


On Wed, 3 Apr 2019, Shawn Edwards wrote:

> ceph-post-file: 60e07a0c-ee5b-4174-9f51-fa091d5662dc
> 
> On Wed, Apr 3, 2019 at 5:30 PM Shawn Edwards <lesser.evil@xxxxxxxxx> wrote:
> 
> > According to ceph versions, all bits are running 14.2.0
> >
> > I have restarted all of the OSD at least twice and am still getting the
> > same error.
> >
> > I'll send a log file with confirmed interesting bad behavior shortly
> >
> > On Wed, Apr 3, 2019, 17:17 Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >
> >> 2019-04-03 15:04:01.986 7ffae5778700 10 --1- v1:10.36.9.46:6813/5003637
> >> >> v1:10.36.9.28:6809/8224 conn(0xf6a6000 0x30a02000 :6813
> >> s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2
> >> authorizor_protocol 2 len 174
> >> 2019-04-03 15:04:01.986 7ffae5778700 20 AuthRegistry(0xcd64a40)
> >> get_handler peer_type 4 method 2 cluster_methods [2] service_methods [2]
> >> client_methods [2]
> >> 2019-04-03 15:04:01.986 7ffae5778700 10 cephx: verify_authorizer
> >> decrypted service osd secret_id=41686
> >> 2019-04-03 15:04:01.986 7ffae5778700  0 auth: could not find
> >> secret_id=41686
> >> 2019-04-03 15:04:01.986 7ffae5778700 10 auth: dump_rotating:
> >> 2019-04-03 15:04:01.986 7ffae5778700 10 auth:  id 41691 ... expires
> >> 2019-04-03 14:43:07.042860
> >> 2019-04-03 15:04:01.986 7ffae5778700 10 auth:  id 41692 ... expires
> >> 2019-04-03 15:43:09.895511
> >> 2019-04-03 15:04:01.986 7ffae5778700 10 auth:  id 41693 ... expires
> >> 2019-04-03 16:43:09.895511
> >> 2019-04-03 15:04:01.986 7ffae5778700  0 cephx: verify_authorizer could
> >> not get service secret for service osd secret_id=41686
> >> 2019-04-03 15:04:01.986 7ffae5778700  0 --1- v1:10.36.9.46:6813/5003637
> >> >> v1:10.36.9.28:6809/8224 conn(0xf6a6000 0x30a02000 :6813
> >> s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2:
> >> got bad authorizer, auth_reply_len=0
> >>
> >> For some reason this OSD has much newer rotating keys than the
> >> connecting OSD.  But earlier in the day, this osd was the one
> >> getting BADAUTHORIZER, so maybe that shifted.  Can you find an OSD where
> >> you still see BADAUTHORIZER appearing in the log?
> >>
> >> My guess is that if you restart the OSDs, they'll get fresh rotating keys
> >> and things will be fine.  But that doesn't explain why they're not
> >> renewing on their own right now.. that I'm not so sure about.
> >>
> >> Are your mons all running nautilus?  Does 'ceph versions' show everything
> >> has upgraded?
> >>
> >> sage
> >>
> >>
> >> On Wed, 3 Apr 2019, Shawn Edwards wrote:
> >>
> >> > File uploaded: f1a2bfb3-92b4-495c-8706-f99cb228efc7
> >> >
> >> > On Wed, Apr 3, 2019 at 4:57 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >
> >> > > Hmm, that doesn't help.
> >> > >
> >> > > Can you set
> >> > >
> >> > >  ceph config set osd debug_ms 20
> >> > >  ceph config set osd debug_auth 20
> >> > >  ceph config set osd debug_monc 20
> >> > >
> >> > > for a few minutes and ceph-post-file the osd logs?  (Or send a private
> >> > > email with a link or something.)
> >> > >
> >> > > Thanks!
> >> > > sage
> >> > >
> >> > >
> >> > > On Wed, 3 Apr 2019, Shawn Edwards wrote:
> >> > >
> >> > > > No strange auth config:
> >> > > >
> >> > > > root@tyr-ceph-mon0:~# ceph config dump | grep -E '(auth|cephx)'
> >> > > > global        advanced auth_client_required               cephx
> >> > > >                                             *
> >> > > > global        advanced auth_cluster_required              cephx
> >> > > >                                             *
> >> > > > global        advanced auth_service_required              cephx
> >> > > >                                             *
> >> > > >
> >> > > > All boxes are using 'minimal' ceph.conf files and centralized
> >> config.
> >> > > >
> >> > > > If you need the full config, it's here:
> >> > > > https://gist.github.com/lesserevil/3b82d37e517f4561ce53c81629717aae
> >> > > >
> >> > > > On Wed, Apr 3, 2019 at 4:07 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> > > >
> >> > > > > On Wed, 3 Apr 2019, Shawn Edwards wrote:
> >> > > > > > Recent nautilus upgrade from mimic.  No issues on mimic.
> >> > > > > >
> >> > > > > > Now getting this or similar in all osd logs, there is very
> >> little osd
> >> > > > > > communicatoin, and most of the PG are either 'down' or
> >> 'unknown',
> >> > > even
> >> > > > > > though I can see the data on the filestores.
> >> > > > > >
> >> > > > > > 2019-04-03 13:47:55.280 7f13346e3700  0 --1- [v2:
> >> > > > > > 10.36.9.26:6802/3107,v1:10.36.9.26:6803/3107] >> v1:
> >> > > 10.36.9.37:6821/8825
> >> > > > > > conn(0xa7132000 0xa6b28000 :-1 s=CONNECTING_SEND_CONNECT_MSG
> >> pgs=0
> >> > > cs=0
> >> > > > > > l=0).handle_connect_reply_2 connect got BADAUTHORIZER
> >> > > > > > 2019-04-03 13:47:55.296 7f1333ee2700  0 --1- [v2:
> >> > > > > > 10.36.9.26:6802/3107,v1:10.36.9.26:6803/3107] >> v1:
> >> > > > > 10.36.9.37:6841/11204
> >> > > > > > conn(0xa9826d00 0xa9b78000 :-1 s=CONNECTING_SEND_CONNECT_MSG
> >> pgs=0
> >> > > cs=0
> >> > > > > > l=0).handle_connect_reply_2 connect got BADAUTHORIZER
> >> > > > > > 2019-04-03 13:47:55.340 7f13346e3700  0 --1- [v2:
> >> > > > > > 10.36.9.26:6802/3107,v1:10.36.9.26:6803/3107] >> v1:
> >> > > 10.36.9.37:6829/8425
> >> > > > > > conn(0xa7997180 0xaeb22800 :-1 s=CONNECTING_SEND_CONNECT_MSG
> >> pgs=0
> >> > > cs=0
> >> > > > > > l=0).handle_connect_reply_2 connect got BADAUTHORIZER
> >> > > > > > 2019-04-03 13:47:55.428 7f1334ee4700  0 auth: could not find
> >> > > > > secret_id=41687
> >> > > > > > 2019-04-03 13:47:55.428 7f1334ee4700  0 cephx: verify_authorizer
> >> > > could
> >> > > > > not
> >> > > > > > get service secret for service osd secret_id=41687
> >> > > > > > 2019-04-03 13:47:55.428 7f1334ee4700  0 --1- [v2:
> >> > > > > > 10.36.9.26:6802/3107,v1:10.36.9.26:6803/3107] >> v1:
> >> > > > > 10.36.9.48:6805/49547
> >> > > > > > conn(0xe02f24480 0xe088cb800 :6803
> >> s=ACCEPTING_WAIT_CONNECT_MSG_AUTH
> >> > > > > pgs=0
> >> > > > > > cs=0 l=0).handle_connect_message_2: got bad authorizer,
> >> > > auth_reply_len=0
> >> > > > > >
> >> > > > > > Thoughts?  I have confirmed that all ceph boxes have good time
> >> sync.
> >> > > > >
> >> > > > > Do you have any non-default auth-related settings in ceph.conf?
> >> > > > >
> >> > > > > sage
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > >  Shawn Edwards
> >> > > >  Beware programmers with screwdrivers.  They tend to spill them on
> >> their
> >> > > > keyboards.
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> >  Shawn Edwards
> >> >  Beware programmers with screwdrivers.  They tend to spill them on their
> >> > keyboards.
> >> >
> >>
> >
> 
> -- 
>  Shawn Edwards
>  Beware programmers with screwdrivers.  They tend to spill them on their
> keyboards.
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux