Re: Luminous 12.1.1 upgrade mgr woes

On Tue, 18 Jul 2017, Joao Eduardo Luis wrote:
> On 07/18/2017 01:20 PM, John Spray wrote:
> > On Tue, Jul 18, 2017 at 1:17 PM, Joao Eduardo Luis <joao@xxxxxxx> wrote:
> > > On 07/18/2017 12:32 PM, John Spray wrote:
> > > > 
> > > > On Tue, Jul 18, 2017 at 10:03 AM, Mark Kirkwood
> > > > <mark.kirkwood@xxxxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > Just had a go at this - 12.1.1 from a freshly deployed Jewel (10.2.9)
> > > > > on Ubuntu 16.04, following
> > > > > 
> > > > > http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken.
> > > > > 
> > > > > So it all worked ok *except* for the mgr deploy, which hung at the
> > > > > key/caps modification stage (see attached). Now I managed to work
> > > > > around it:
> > > > > 
> > > > > - switch cephx to none in ceph.conf
> > > > > 
> > > > > - restart mon
> > > > > 
> > > > > - redeploy mgr
> > > > 
> > > > 
> > > > Hmm, I suspect the issue is with the bootstrap-mgr keyring.  I notice
> > > > that when trying a "mgr create" on an upgraded cluster, ceph-deploy is
> > > > prompting me to do a "gatherkeys", at which point it generates the
> > > > keyring.  However, the bootstrap-mgr identity that I have inside the
> > > > mon is weird, its key is AAAAAAAAAAAAAAAA.
> > > > 
> > > > Even after I've got the bootstrap-mgr keyring (whose AAA... key
> > > > matches the weird one that the mon has), I get EINVAL connecting, and
> > > > the mon is logging "error when trying to handle auth request, probably
> > > > malformed request".
> > > > 
> > > > So yeah, something's pretty broken here!
> > > 
> > > 
> > > I was having that when working on `osd new`, I think, but IIRC I managed
> > > to fix the bug.
> > > 
> > > This may be somehow related to the refactoring I did on AuthMonitor
> > > though.
> > > 
> > > Is this just a matter of running a 'mgr create' on an upgraded cluster? If
> > > so, I'll try reproducing this in the afternoon and see if I can figure out
> > > what went wrong.
> > 
> > Pretty much -- my cluster was a bit different though because it had
> > been kraken, so the mon nodes already had mgrs on them.  I was running
> > "mgr create" on one of the nodes that had never had a mgr or monitor
> > on it.
> 
> Looks like the problem is due to the auth entity not having a key at all when
> it's added during upgrade.
> 
> PR https://github.com/ceph/ceph/pull/16395 fixes it.

That fix looks right to me.  Were you able to reproduce the original 
issue, and/or did you test with the fix?

Thanks!
sage
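
For anyone checking their own upgraded cluster: the AAAAAAAAAAAAAAAA key quoted above is what base64 produces for a run of zero bytes, which would fit the diagnosis of an entity that was added without a real key. The following is only a minimal sketch, assuming the usual keyring text layout (an `[entity]` header followed by `key = ...` lines). `find_blank_keys` is a hypothetical helper written for illustration and is not part of Ceph or ceph-deploy.

```python
import base64


def find_blank_keys(keyring_text):
    """Return entity names whose key is missing, malformed, or all zero bytes.

    Assumes keyring text shaped like:
        [client.bootstrap-mgr]
            key = AAAAAAAAAAAAAAAA
    """
    keys = {}
    entity = None
    for raw in keyring_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # New entity section; no key seen for it yet.
            entity = line[1:-1]
            keys[entity] = None
        elif entity is not None and "=" in line:
            # Split on the first '=' only, since base64 padding also uses '='.
            name, value = (part.strip() for part in line.split("=", 1))
            if name == "key":
                keys[entity] = value

    suspicious = []
    for name, key in keys.items():
        if not key:
            suspicious.append(name)
            continue
        try:
            decoded = base64.b64decode(key, validate=True)
        except ValueError:
            # Not valid base64 at all.
            suspicious.append(name)
            continue
        if not any(decoded):
            # Every decoded byte is zero, as with the all-"A" key in this thread.
            suspicious.append(name)
    return suspicious
```

Feeding it the contents of a gathered bootstrap-mgr keyring should list `client.bootstrap-mgr` if the key is still the placeholder. It only inspects the text and does not talk to the mon.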