On 07/18/2017 04:08 PM, Sage Weil wrote:
On Tue, 18 Jul 2017, Joao Eduardo Luis wrote:
On 07/18/2017 01:20 PM, John Spray wrote:
On Tue, Jul 18, 2017 at 1:17 PM, Joao Eduardo Luis <joao@xxxxxxx> wrote:
On 07/18/2017 12:32 PM, John Spray wrote:
On Tue, Jul 18, 2017 at 10:03 AM, Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> wrote:
Hi,
Just had a go at this: upgrading to 12.1.1 from a freshly deployed Jewel (10.2.9) on Ubuntu 16.04, following
http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken.
It all worked OK *except* for the mgr deploy, which hung at the key/caps modification stage (see attached). I managed to work around it:
- switch cephx to none in ceph.conf
- restart mon
- redeploy mgr
Hmm, I suspect the issue is with the bootstrap-mgr keyring. I notice
that when trying a "mgr create" on an upgraded cluster, ceph-deploy is
prompting me to do a "gatherkeys", at which point it generates the
keyring. However, the bootstrap-mgr identity that I have inside the
mon is weird: its key is AAAAAAAAAAAAAAAA.
Even after I've got the bootstrap-mgr keyring (whose AAA... key
matches the weird one that the mon has), I get EINVAL connecting, and
the mon is logging "error when trying to handle auth request, probably
malformed request".
So yeah, something's pretty broken here!
I think I was hitting that when working on `osd new`, but IIRC I managed to fix the bug.
This may be somehow related to the refactoring I did on AuthMonitor, though.
Is this just a matter of running a 'mgr create' on an upgraded cluster? If
so, I'll try reproducing this in the afternoon and see if I can figure out
what went wrong.
Pretty much -- my cluster was a bit different, though, because it had been on Kraken, so the mon nodes already had mgrs on them. I was running "mgr create" on one of the nodes that had never had a mgr or monitor on it.
Looks like the problem is due to the auth entity not having a key at all when
it's added during upgrade.
PR https://github.com/ceph/ceph/pull/16395 fixes it.
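Why does a keyless entity show up as exactly sixteen "A"s? A minimal sketch, assuming the serialized cephx secret follows the layout of Ceph's CryptoKey encoding as I understand it: a little-endian u16 key type, an 8-byte creation time (u32 seconds + u32 nanoseconds), a u16 secret length, then the raw secret. Under that assumption, an entity with no key at all serializes to 12 zero bytes, which base64-encodes to "AAAAAAAAAAAAAAAA", while a real key of type 1 always starts with "AQ". The helper names (`encode_key`, `decode_key`, `is_empty_key`) are hypothetical and not part of any Ceph API.

```python
import base64
import os
import struct
import time

# Assumed layout (modeled on Ceph's CryptoKey encoding): little-endian
# u16 key type, u32 seconds + u32 nanoseconds creation time, u16 secret
# length, followed by the raw secret bytes.
HEADER = struct.Struct("<HIIH")  # 12 bytes


def encode_key(key_type: int, sec: int, nsec: int, secret: bytes) -> str:
    """Serialize a key blob in the assumed layout and base64 it."""
    blob = HEADER.pack(key_type, sec, nsec, len(secret)) + secret
    return base64.b64encode(blob).decode("ascii")


def decode_key(b64: str) -> tuple:
    """Return (type, sec, nsec, secret) from a base64 key string."""
    blob = base64.b64decode(b64)
    key_type, sec, nsec, length = HEADER.unpack_from(blob)
    secret = blob[HEADER.size:HEADER.size + length]
    return key_type, sec, nsec, secret


def is_empty_key(b64: str) -> bool:
    """True if the blob carries no real secret (type 0, zero length)."""
    key_type, _, _, secret = decode_key(b64)
    return key_type == 0 and len(secret) == 0


# An entity added with no key at all: every header field is zero.
empty = encode_key(0, 0, 0, b"")
print(empty)       # AAAAAAAAAAAAAAAA

# A "real" key: type 1 (AES, assumed numbering) with 16 random bytes.
real = encode_key(1, int(time.time()), 0, os.urandom(16))
print(real[:2])    # AQ
print(is_empty_key(empty), is_empty_key(real))  # True False
```

This is only meant to explain the symptom reported above; the actual fix is the one in the PR, which makes the monitor generate a proper key when the entity is created.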
That fix looks right to me. Were you able to reproduce the original
issue, and/or did you test with the fix?
Yes to both.
Reproducing is trivial, coming from kraken:
- have a vstart kraken cluster
- upgrade monitors to luminous
- see a 'client.bootstrap-mgr' entry popping up on 'auth list' with a
key something like 'AAAAAAAA'
After the fix, the key is a proper cephx key.
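To check a cluster for the symptom after upgrading the mons, one could scan the auth database for placeholder keys. A rough sketch, assuming the JSON shape produced by `ceph auth list --format json` is `{"auth_dump": [{"entity": ..., "key": ..., "caps": {...}}]}`; the function name `find_keyless_entities` and the sample data below are hypothetical, and the sample key for `client.admin` is a placeholder, not a real secret.

```python
import json


def find_keyless_entities(auth_json: str) -> list:
    """Return entity names whose key is missing or all 'A' characters.

    Assumes the JSON shape of `ceph auth list --format json`:
    {"auth_dump": [{"entity": ..., "key": ..., "caps": {...}}, ...]}.
    """
    dump = json.loads(auth_json).get("auth_dump", [])
    suspicious = []
    for entry in dump:
        key = entry.get("key", "")
        # A string of only "A" characters decodes to zero bytes, i.e.
        # an entity that was created without any secret.
        if not key or set(key.rstrip("=")) == {"A"}:
            suspicious.append(entry.get("entity", "?"))
    return suspicious


# Hypothetical sample shaped like the reproduction above.
sample = json.dumps({"auth_dump": [
    {"entity": "client.admin", "key": "AQCplaceholderNotARealKey==", "caps": {}},
    {"entity": "client.bootstrap-mgr", "key": "AAAAAAAAAAAAAAAA", "caps": {}},
]})
print(find_keyless_entities(sample))  # ['client.bootstrap-mgr']
```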
-Joao