Re: Luminous 12.1.1 upgrade mgr woes

On 07/18/2017 01:20 PM, John Spray wrote:
On Tue, Jul 18, 2017 at 1:17 PM, Joao Eduardo Luis <joao@xxxxxxx> wrote:
On 07/18/2017 12:32 PM, John Spray wrote:

On Tue, Jul 18, 2017 at 10:03 AM, Mark Kirkwood
<mark.kirkwood@xxxxxxxxxxxxxxx> wrote:

Hi,

Just had a go at this - 12.1.1 from a freshly deployed Jewel (10.2.9) on
Ubuntu 16.04, following

http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken.

So it all worked ok *except* for the mgr deploy, which hung at the
key/caps modification stage (see attached). I managed to work around
it:

- switch cephx to none in ceph.conf

- restart mon

- redeploy mgr


Hmm, I suspect the issue is with the bootstrap-mgr keyring.  I notice
that when trying a "mgr create" on an upgraded cluster, ceph-deploy is
prompting me to do a "gatherkeys", at which point it generates the
keyring.  However, the bootstrap-mgr identity that I have inside the
mon is weird, its key is AAAAAAAAAAAAAAAA.

Even after I've got the bootstrap-mgr keyring (whose AAA... key
matches the weird one that the mon has), I get EINVAL connecting, and
the mon is logging "error when trying to handle auth request, probably
malformed request".

So yeah, something's pretty broken here!


I was having that when working on `osd new`, I think, but IIRC I managed to
fix the bug.

This may be somehow related to the refactoring I did on AuthMonitor though.

Is this just a matter of running a 'mgr create' on an upgraded cluster? If
so, I'll try reproducing this in the afternoon and see if I can figure out
what went wrong.

Pretty much -- my cluster was a bit different though because it had
been kraken, so the mon nodes already had mgrs on them.  I was running
"mgr create" on one of the nodes that had never had a mgr or monitor
on it.

Looks like the problem is due to the auth entity not having a key at all when it's added during upgrade.

PR https://github.com/ceph/ceph/pull/16395 fixes it.
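
For anyone wanting to check whether a keyring on an upgraded cluster is affected: the AAAAAAAAAAAAAAAA key quoted above is base64 for twelve zero bytes, which is consistent with an entity that has no real secret. A minimal Python sketch of that check follows. The helper name is_blank_key is hypothetical and this is not part of Ceph or ceph-deploy; it only inspects the base64 string and assumes nothing about how Ceph lays out the key internally.

```python
import base64


def is_blank_key(key_b64: str) -> bool:
    """Return True if the base64 key decodes to nothing but zero bytes.

    Hypothetical helper for illustration only; not a Ceph API.
    """
    raw = base64.b64decode(key_b64, validate=True)
    return all(byte == 0 for byte in raw)


if __name__ == "__main__":
    # The key reported for bootstrap-mgr earlier in this thread.
    print(is_blank_key("AAAAAAAAAAAAAAAA"))
```

A key that passes this check would be a candidate for the missing-key problem described here; a normally generated key should not.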

  -Joao