Re: Luminous and mimic: adding OSD can crash mon(s) and lead to loss of quorum

On 23/08/2019 22:14, Paul Emmerich wrote:
> On Fri, Aug 23, 2019 at 3:54 PM Florian Haas <florian@xxxxxxxxxxxxxx> wrote:
>>
>> On 23/08/2019 13:34, Paul Emmerich wrote:
>>> Is this reproducible with crushtool?
>>
>> Not for me.
>>
>>> ceph osd getcrushmap -o crushmap
>>> crushtool -i crushmap --update-item XX 1.0 osd.XX --loc host hostname-that-doesnt-exist-yet -o crushmap.modified
>>> Replacing XX with the osd ID you tried to add.
>>
>> Just checking whether this was intentional. As the issue pops up when
>> adding a new OSD *on* a new host, not moving an existing OSD *to* a new
>> host, I would have used --add-item here. Is there a specific reason why
>> you're suggesting to test with --update-item?
> 
> yes, update should map to create or move which it should use internally
> 
>>
>> At any rate, I tried with multiple different combinations (this is on a
>> 12.2.12 test cluster; I can't test this in production):
> 
> which also ran into this bug? The idea of using crushtool is to not
> crash your production cluster but just the local tool.

Ah, gotcha. I thought you wanted me to be able to at least do "ceph osd
setcrushmap" with the resulting crushmap, which would require a running
cluster.

So yes, doing this completely offline shows that you're definitely on to
something. I am able to crash crushtool with the original crushmap, and
what it appears to be falling over on is a choose_args map in there.

I've updated the bug report with this comment:
https://tracker.ceph.com/issues/40029#note-11

It would seem that there are two workarounds at this stage for
pre-Nautilus users who have a choose_args map in their crushmap and who,
for some reason, are unable to upgrade to Nautilus yet:

1. Add host buckets manually before adding new OSDs.
2. Drop any choose_args map from their crushmap.
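
For workaround 1, a minimal sketch of pre-creating the host bucket could
look like the following. The add_host_bucket helper, the hostname
argument and the root=default location are assumptions for illustration,
and the CEPH variable exists only so the commands can be dry-run:

```shell
# Hypothetical helper: create an empty host bucket and place it under a
# root *before* any OSD on that host is added.
# $1 = new hostname, $2 = CRUSH root (assumed to be "default").
add_host_bucket() {
  host="$1"
  root="${2:-default}"
  # Set CEPH=echo to print the commands instead of running them.
  "${CEPH:-ceph}" osd crush add-bucket "$host" host &&
  "${CEPH:-ceph}" osd crush move "$host" root="$root"
}
```

Running "CEPH=echo add_host_bucket newhost" only prints the two commands,
which may be useful for checking them before touching a cluster.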

As it happens I am not aware of any way to do #2 other than

- using getcrushmap,
- decompiling the crushmap,
- dropping the choose_args map from the textual representation of the
crushmap,
- recompiling, and then
- using setcrushmap.

Are you, by any chance?
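
For reference, the manual route listed above could be sketched roughly
as follows. The file names and both function names are made up for
illustration, and the awk filter assumes the decompiled map writes the
section as a "choose_args <id> { ... }" block with balanced braces. It is
a sketch, not something verified against every release:

```shell
# Remove every "choose_args <id> { ... }" block from a decompiled
# crushmap read on stdin, tracking brace depth to find the block's end.
strip_choose_args() {
  awk '
    !inblock && /^[[:space:]]*choose_args[[:space:]]/ { inblock = 1; depth = 0 }
    inblock {
      opens = gsub(/[{]/, "{")
      closes = gsub(/[}]/, "}")
      depth += opens - closes
      if (opens > 0) started = 1
      if (started && depth <= 0) { inblock = 0; started = 0 }
      next
    }
    { print }
  '
}

# Fetch, decompile, strip and recompile. Injecting the result is left
# as a manual step so the stripped map can be inspected first.
rebuild_crushmap_without_choose_args() {
  ceph osd getcrushmap -o crushmap.bin &&
  crushtool -d crushmap.bin -o crushmap.txt &&
  strip_choose_args < crushmap.txt > crushmap.nochoose.txt &&
  crushtool -c crushmap.nochoose.txt -o crushmap.nochoose.bin
  # Then, once satisfied:
  #   ceph osd setcrushmap -i crushmap.nochoose.bin
}
```

The leftover "# choose_args" comment line is kept as-is, since it is
only a comment in the decompiled text.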

Thanks again for your help!

Cheers,
Florian