On Tue, 18 Jul 2017, Joao Eduardo Luis wrote:
> On 07/18/2017 01:20 PM, John Spray wrote:
> > On Tue, Jul 18, 2017 at 1:17 PM, Joao Eduardo Luis <joao@xxxxxxx> wrote:
> > > On 07/18/2017 12:32 PM, John Spray wrote:
> > > >
> > > > On Tue, Jul 18, 2017 at 10:03 AM, Mark Kirkwood
> > > > <mark.kirkwood@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Just had a go at this - 12.1.1 from a freshly deployed Jewel (10.2.9)
> > > > > on Ubuntu 16.04, following
> > > > >
> > > > > http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken.
> > > > >
> > > > > So it all worked ok *except* for the mgr deploy, which hung at the
> > > > > key/caps modification stage (see attached). Now I managed to work
> > > > > around it:
> > > > >
> > > > > - switch cephx to none in ceph.conf
> > > > >
> > > > > - restart mon
> > > > >
> > > > > - redeploy mgr
> > > >
> > > >
> > > > Hmm, I suspect the issue is with the bootstrap-mgr keyring. I notice
> > > > that when trying a "mgr create" on an upgraded cluster, ceph-deploy is
> > > > prompting me to do a "gatherkeys", at which point it generates the
> > > > keyring. However, the bootstrap-mgr identity that I have inside the
> > > > mon is weird: its key is AAAAAAAAAAAAAAAA.
> > > >
> > > > Even after I've got the bootstrap-mgr keyring (whose AAA... key
> > > > matches the weird one that the mon has), I get EINVAL connecting, and
> > > > the mon is logging "error when trying to handle auth request, probably
> > > > malformed request".
> > > >
> > > > So yeah, something's pretty broken here!
> > >
> > >
> > > I was having that when working on `osd new`, I think, but IIRC I managed
> > > to fix the bug.
> > >
> > > This may be somehow related to the refactoring I did on AuthMonitor
> > > though.
> > >
> > > Is this just a matter of running a 'mgr create' on an upgraded cluster? If
> > > so, I'll try reproducing this in the afternoon and see if I can figure out
> > > what went wrong.
> >
> > Pretty much -- my cluster was a bit different though because it had
> > been kraken, so the mon nodes already had mgrs on them. I was running
> > "mgr create" on one of the nodes that had never had a mgr or monitor
> > on it.
>
> Looks like the problem is due to the auth entity not having a key at all when
> it's added during upgrade.
>
> PR https://github.com/ceph/ceph/pull/16395 fixes it.

That fix looks right to me. Were you able to reproduce the original
issue, and/or did you test with the fix?

Thanks!
sage
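
A side note on the AAAAAAAAAAAAAAAA key mentioned above: that string is
base64 for twelve zero bytes, which would be consistent with the diagnosis
that the entity was added without a key. The sketch below is a minimal
Python check, not anything taken from Ceph or ceph-deploy. The function
name is_all_zero_key is made up for illustration, and it only inspects the
base64 string itself; it does not parse a keyring file or talk to the mon.

```python
import base64


def is_all_zero_key(key_b64: str) -> bool:
    """Return True if the base64 text decodes to nothing but zero bytes.

    Hypothetical helper for illustration only: an all-zero (or empty)
    payload suggests a placeholder rather than a real secret.
    """
    raw = base64.b64decode(key_b64, validate=True)
    return all(byte == 0 for byte in raw)


if __name__ == "__main__":
    # The key reported in the thread for the bootstrap-mgr identity.
    print(is_all_zero_key("AAAAAAAAAAAAAAAA"))  # True
    # Any string with a non-zero byte in it is not flagged.
    print(is_all_zero_key("AQAAAAAAAAAAAAAA"))  # False
```

A check like this only says the key looks like a placeholder; whether the
mon actually rejects it is a separate question that the mon log answers.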