To test things, I tried creating a new mgr in case there was some weird corruption with the old key, but I'm seeing the same behavior with the new mgr.
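For reference, standing up the new mgr was roughly the standard procedure sketched below (the id "9" and the cluster name "ceph" are placeholders, not my actual values):

    # create a working directory and a fresh key for the new mgr
    mkdir /var/lib/ceph/mgr/ceph-9
    ceph auth get-or-create mgr.9 mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
        -o /var/lib/ceph/mgr/ceph-9/keyring
    chown -R ceph:ceph /var/lib/ceph/mgr/ceph-9
    systemctl start ceph-mgr@9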
On Fri, Jan 4, 2019 at 11:03 AM Randall Smith <rbsmith@xxxxxxxxx> wrote:
The keys in the keyrings for the broken mgrs match what is shown in ceph auth list. The relevant entries are below so that you can see the caps. I am having problems with both mgr.6 and mgr.8; mgr.7 is the only mgr currently functioning.

mgr.6
        key: [redacted]
        caps: [mds] allow *
        caps: [mgr] allow r
        caps: [mon] allow profile mgr
        caps: [osd] allow *
mgr.7
        key: [redacted]
        caps: [mds] allow *
        caps: [mgr] allow r
        caps: [mon] allow profile mgr
        caps: [osd] allow *
mgr.8
        key: [redacted]
        caps: [mds] allow *
        caps: [mon] allow profile mgr
        caps: [osd] allow *

I agree that an auth issue seems unlikely to have been triggered by the upgrade, but I'm not sure what else it could be.

On Fri, Jan 4, 2019 at 10:51 AM Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx> wrote:

I can't think of why the upgrade would have broken your keys, but have you verified that the mons still have the correct mgr keys configured? 'ceph auth ls' should list an mgr.<host> key for each mgr, with a key matching the contents of /var/lib/ceph/mgr/<cluster>-<host>/keyring on the mgr host and caps that should minimally include '[mon] allow profile mgr' and '[osd] allow *', I would think.
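A quick way to spot a mismatch would be something like this (the cluster name "ceph" and the id "8" are assumptions on my part; substitute yours):

    # the key as the mons know it
    ceph auth get mgr.8
    # the key the mgr daemon actually reads on its host
    cat /var/lib/ceph/mgr/ceph-8/keyring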
Again, it seems unlikely that this would have broken with the upgrade if it had been working previously, but if you're seeing auth errors it might be something to check out.
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
On Fri, 2019-01-04 at 07:26 -0700, Randall Smith wrote:

Greetings,
I'm upgrading my cluster from luminous to mimic. I've upgraded my monitors and am attempting to upgrade the mgrs. Unfortunately, after the upgrade the mgr daemon exits immediately with error code 1.
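As a sanity check on where the upgrade stands, I believe this reports which release each daemon type is running (it only needs the mons up):

    ceph versions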
I've tried running ceph-mgr in debug mode to see what's happening, but the output (below) is a bit cryptic to me. It looks like authentication might be failing, but it was working prior to the upgrade.
I do have "auth supported = cephx" in the global section of ceph.conf.
What do I need to do to fix this?
Thanks.
/usr/bin/ceph-mgr -f --cluster ceph --id 8 --setuser ceph --setgroup ceph -d --debug_ms 5
2019-01-04 07:01:38.457 7f808f83f700 2 Event(0x30c42c0 nevent=5000 time_id=1).set_owner idx=0 owner=140190140331776
2019-01-04 07:01:38.457 7f808f03e700 2 Event(0x30c4500 nevent=5000 time_id=1).set_owner idx=1 owner=140190131939072
2019-01-04 07:01:38.457 7f808e83d700 2 Event(0x30c4e00 nevent=5000 time_id=1).set_owner idx=2 owner=140190123546368
2019-01-04 07:01:38.457 7f809dd5b380 1 Processor -- start
2019-01-04 07:01:38.477 7f809dd5b380 1 -- - start start
2019-01-04 07:01:38.481 7f809dd5b380 1 -- - --> 192.168.253.147:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6780 con 0
2019-01-04 07:01:38.481 7f809dd5b380 1 -- - --> 192.168.253.148:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6a00 con 0
2019-01-04 07:01:38.481 7f808e83d700 1 -- 192.168.253.148:0/1359135487 learned_addr learned my addr 192.168.253.148:0/1359135487
2019-01-04 07:01:38.481 7f808e83d700 2 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got newly_acked_seq 0 vs out_seq 0
2019-01-04 07:01:38.481 7f808f03e700 2 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got newly_acked_seq 0 vs out_seq 0
2019-01-04 07:01:38.481 7f808f03e700 5 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1 seq 1 0x30c5440 mon_map magic: 0 v1
2019-01-04 07:01:38.481 7f808e83d700 5 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2 seq 1 0x30c5680 mon_map magic: 0 v1
2019-01-04 07:01:38.481 7f808f03e700 5 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1 seq 2 0x32a6780 auth_reply(proto 2 0 (0) Success) v1
2019-01-04 07:01:38.481 7f808e83d700 5 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2 seq 2 0x32a6a00 auth_reply(proto 2 0 (0) Success) v1
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6789/0 1 ==== mon_map magic: 0 v1 ==== 370+0+0 (3034216899 0 0) 0x30c5440 con 0x332ce00
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 <== mon.2 192.168.253.148:6789/0 1 ==== mon_map magic: 0 v1 ==== 370+0+0 (3034216899 0 0) 0x30c5680 con 0x332d500
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3430158761 0 0) 0x32a6780 con 0x332ce00
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 --> 192.168.253.147:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x32a6f00 con 0
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 <== mon.2 192.168.253.148:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3242503871 0 0) 0x32a6a00 con 0x332d500
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 --> 192.168.253.148:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x32a6780 con 0
2019-01-04 07:01:38.481 7f808f03e700 5 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1 seq 3 0x32a6f00 auth_reply(proto 2 -22 (22) Invalid argument) v1
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6789/0 3 ==== auth_reply(proto 2 -22 (22) Invalid argument) v1 ==== 24+0+0 (882932531 0 0) 0x32a6f00 con 0x332ce00
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN pgs=74172 cs=1 l=1).mark_down
2019-01-04 07:01:38.481 7f808e03c700 2 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN pgs=74172 cs=1 l=1)._stop
2019-01-04 07:01:38.481 7f808e83d700 5 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2 seq 3 0x32a6780 auth_reply(proto 2 -22 (22) Invalid argument) v1
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 <== mon.2 192.168.253.148:6789/0 3 ==== auth_reply(proto 2 -22 (22) Invalid argument) v1 ==== 24+0+0 (1359424806 0 0) 0x32a6780 con 0x332d500
2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN pgs=74275 cs=1 l=1).mark_down
2019-01-04 07:01:38.481 7f808e03c700 2 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN pgs=74275 cs=1 l=1)._stop
2019-01-04 07:01:38.481 7f809dd5b380 1 -- 192.168.253.148:0/1359135487 shutdown_connections
2019-01-04 07:01:38.481 7f809dd5b380 5 -- 192.168.253.148:0/1359135487 shutdown_connections mark down 192.168.253.148:6789/0 0x332d500
2019-01-04 07:01:38.481 7f809dd5b380 5 -- 192.168.253.148:0/1359135487 shutdown_connections mark down 192.168.253.147:6789/0 0x332ce00
2019-01-04 07:01:38.481 7f809dd5b380 5 -- 192.168.253.148:0/1359135487 shutdown_connections delete 0x332ce00
2019-01-04 07:01:38.481 7f809dd5b380 5 -- 192.168.253.148:0/1359135487 shutdown_connections delete 0x332d500
2019-01-04 07:01:38.485 7f809dd5b380 1 -- 192.168.253.148:0/1359135487 shutdown_connections
2019-01-04 07:01:38.485 7f809dd5b380 1 -- 192.168.253.148:0/1359135487 wait complete.
2019-01-04 07:01:38.485 7f809dd5b380 1 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:0/1359135487 conn(0x332c000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
2019-01-04 07:01:38.485 7f809dd5b380 2 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:0/1359135487 conn(0x332c000 :-1 s=STATE_NONE pgs=0 cs=0 l=0)._stop
failed to fetch mon config (--no-mon-config to skip)
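I can re-run with the auth and monc debug subsystems turned up if that output would be more useful, e.g.:

    /usr/bin/ceph-mgr -f --cluster ceph --id 8 --setuser ceph --setgroup ceph -d \
        --debug_ms 5 --debug_auth 20 --debug_monc 20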