Re: Luminous and mimic: adding OSD can crash mon(s) and lead to loss of quorum

Is this reproducible with crushtool?

ceph osd getcrushmap -o crushmap
crushtool -i crushmap --update-item XX 1.0 osd.XX \
    --loc host hostname-that-doesnt-exist-yet -o crushmap.modified

Does it still happen if the crushmap is decompiled and recompiled?
(crushtool -d and crushtool -c)
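
For example, a round trip through the text form (the file names here are
just placeholders):

crushtool -d crushmap -o crushmap.txt
crushtool -c crushmap.txt -o crushmap.recompiled
crushtool -i crushmap.recompiled --update-item XX 1.0 osd.XX \
    --loc host hostname-that-doesnt-exist-yet -o crushmap.modified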

Replace XX with the ID of the OSD you tried to add. Posting your
(binary) crushmap would be helpful for debugging this; see crushtool -d
for what information the file contains.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Aug 23, 2019 at 1:10 PM Florian Haas <florian@xxxxxxxxxxxxxx> wrote:
>
> Hi everyone,
>
> there are a couple of bug reports about this in Redmine, but only one
> (unanswered) mailing list message[1] that I could find. So I figured I'd
> raise the issue here again and copy the original reporters of the bugs
> (they are BCC'd because, in case they are no longer subscribed, it
> wouldn't be appropriate to share their email addresses with the list).
>
> This is about https://tracker.ceph.com/issues/40029, and
> https://tracker.ceph.com/issues/39978 (the latter of which was recently
> closed as a duplicate of the former).
>
> In short, it appears that at least in luminous and mimic (I haven't
> tried nautilus yet), it is possible to crash a mon by adding a new OSD:
> when the OSD tries to inject itself into the crush map under its host
> bucket and that host bucket does not exist yet, the mon crashes.
>
> What's worse, once the OSD's "ceph osd new" process has crashed the
> leader mon in this way, a new leader is elected; if the "ceph osd new"
> process is still running on the OSD node, it promptly connects to that
> mon and kills it too. This continues until enough mons have died for
> quorum to be lost.
>
> The recovery steps appear to involve
>
> - killing the "ceph osd new" process,
> - restarting mons until you regain quorum,
> - and then running "ceph osd purge" to drop the problematic OSD entry
> from the crushmap and osdmap (rough commands are sketched below).
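>
> For illustration, that could look roughly like the following; osd.XX,
> the mon unit name and the pkill pattern are placeholders for whatever
> matches your environment:
>
> pkill -f 'ceph osd new'                    # on the OSD node
> systemctl restart ceph-mon@$(hostname -s)  # on each mon host, until quorum is back
> ceph osd purge XX --yes-i-really-mean-it   # drop the bad entry from crushmap and osdmap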
>
> The issue can apparently be worked around by adding the host buckets to
> the crushmap manually before adding the new OSDs, but surely this isn't
> intended to be a prerequisite, at least not to the point of mons
> crashing otherwise?
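>
> For example, something along these lines before bringing up the new OSD
> (the bucket name "newhost" and the parent "root=default" are placeholders
> for your actual topology):
>
> ceph osd crush add-bucket newhost host
> ceph osd crush move newhost root=default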
>
> Also, I am guessing this is some weird corner case rooted in an unusual
> combination of contributing factors; otherwise more people would
> presumably have been bitten by this problem.
>
> Anyone able to share their thoughts on this one? Have more people run
> into this?
>
> Cheers,
> Florian
>
>
>
> [1]
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034880.html
> (interestingly, I could find this message in the pipermail archive but
> not in the archive my MUA keeps for me, so perhaps it wasn't delivered
> to all subscribers, which might be why it has gone unanswered).
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



