Re: Luminous and mimic: adding OSD can crash mon(s) and lead to loss of quorum

On Fri, Aug 23, 2019 at 3:54 PM Florian Haas <florian@xxxxxxxxxxxxxx> wrote:
>
> On 23/08/2019 13:34, Paul Emmerich wrote:
> > Is this reproducible with crushtool?
>
> Not for me.
>
> > ceph osd getcrushmap -o crushmap
> > crushtool -i crushmap --update-item XX 1.0 osd.XX --loc host hostname-that-doesnt-exist-yet -o crushmap.modified
> > Replacing XX with the osd ID you tried to add.
>
> Just checking whether this was intentional. As the issue pops up when
> adding a new OSD *on* a new host, not moving an existing OSD *to* a new
> host, I would have used --add-item here. Is there a specific reason why
> you're suggesting to test with --update-item?

Yes: --update-item should map to create-or-move, which is what the mon
should be using internally.

>
> At any rate, I tried with multiple different combinations (this is on a
> 12.2.12 test cluster; I can't test this in production):

Did that test cluster also run into this bug? The idea of using crushtool
is to crash only the local tool, not your production cluster.
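
A minimal sketch of that offline workflow, assuming a crushmap file has
already been fetched with "ceph osd getcrushmap -o crushmap". The names
DRY_RUN, run and crush_offline_test are made up for this example; the OSD
id, weight and host are taken from the log line in the bug report. By
default it only prints the commands it would run, and it never calls
"ceph osd setcrushmap", so no mon is touched.

```shell
#!/bin/sh
# Sketch only: build (and optionally run) an offline crushtool test.
# DRY_RUN, run and crush_offline_test are hypothetical names for this example.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"          # only show what would be executed
    else
        "$@"               # actually execute the command
    fi
}

crush_offline_test() {
    # $1 = OSD id, $2 = weight, $3 = host bucket that does not exist yet
    id="$1"; weight="$2"; host="$3"
    # Apply the change to a copy of the map, on the local machine only.
    run crushtool -i crushmap --update-item "$id" "$weight" "osd.$id" \
        --loc host "$host" --loc root default -o crushmap.modified
    # Decompile the result to text so it can be inspected or diffed.
    run crushtool -d crushmap.modified -o crushmap.modified.txt
}

crush_offline_test 59 1.6374 cc-ceph-osd26-fra1
```

Setting DRY_RUN=0 runs the two crushtool commands for real. If crushtool
itself crashes at that point, that is the reproducer, and the cluster was
never involved.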


Paul

>
>
> 0. Get the current reference crushmap:
>
> # ceph osd tree
> ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
> -1       0.05846 root default
> -5       0.01949     host daisy
>  0   hdd 0.01949         osd.0      up  1.00000 1.00000
> -7       0.01949     host eric
>  1   hdd 0.01949         osd.1      up  1.00000 1.00000
> -3       0.01949     host frank
>  2   hdd 0.01949         osd.2      up  1.00000 1.00000
> # ceph osd getcrushmap -o crushmap
> 11
>
>
> 1. "Update" a nonexistent OSD belonging to a nonexistent host (your
> suggestion):
>
> # crushtool -i crushmap --update-item 59 0.01949 osd.59 --loc host nonexistent -o crushmap-update-nonexistent-to-nonexistent
> # ceph osd setcrushmap -i crushmap-update-nonexistent-to-nonexistent
> 12
> # ceph osd tree
> ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
> -9       0.01949 host nonexistent
> 59       0.01949     osd.59          DNE        0
> -1       0.05846 root default
> -5       0.01949     host daisy
>  0   hdd 0.01949         osd.0        up  1.00000 1.00000
> -7       0.01949     host eric
>  1   hdd 0.01949         osd.1        up  1.00000 1.00000
> -3       0.01949     host frank
>  2   hdd 0.01949         osd.2        up  1.00000 1.00000
> # ceph osd setcrushmap -i crushmap
> 13
>
>
> 2. Add a nonexistent OSD belonging to a nonexistent host (I think this
> is functionally identical):
>
> # crushtool -i crushmap --add-item 59 0.01949 osd.59 --loc host nonexistent -o crushmap-add-nonexistent-to-nonexistent
> # ceph osd setcrushmap -i crushmap-add-nonexistent-to-nonexistent
> 14
> # ceph osd tree
> ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
> -9       0.01949 host nonexistent
> 59       0.01949     osd.59          DNE        0
> -1       0.05846 root default
> -5       0.01949     host daisy
>  0   hdd 0.01949         osd.0        up  1.00000 1.00000
> -7       0.01949     host eric
>  1   hdd 0.01949         osd.1        up  1.00000 1.00000
> -3       0.01949     host frank
>  2   hdd 0.01949         osd.2        up  1.00000 1.00000
> # ceph osd setcrushmap -i crushmap
> 15
>
>
> 3. Move an existing OSD to a nonexistent host:
>
> # crushtool -i crushmap --update-item 0 0.01949 osd.0 --loc host nonexistent -o crushmap-update-existing-to-nonexistent
> # ceph osd setcrushmap -i crushmap-update-existing-to-nonexistent
> 16
> # ceph osd tree
> ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
> -9       0.01949 host nonexistent
>  0   hdd 0.01949     osd.0            up  1.00000 1.00000
> -1       0.03897 root default
> -5             0     host daisy
> -7       0.01949     host eric
>  1   hdd 0.01949         osd.1        up  1.00000 1.00000
> -3       0.01949     host frank
>  2   hdd 0.01949         osd.2        up  1.00000 1.00000
> # ceph osd setcrushmap -i crushmap
> 17
>
>
> None of these crashed any mon.
>
> However, there's this line in the bug report:
>
>    -19> 2019-08-22 10:08:11.897364 7f93797ab700  0 mon.cc-ceph-osd11-fra1@0(leader).osd e302401 create-or-move crush item name 'osd.59' initial_weight 1.6374 at location {host=cc-ceph-osd26-fra1,root=default}
>
> So it's not trying to move the item to just a nonexistent host, but to a
> nonexistent host *in the default root*.
>
> So I retried the above commands with "--loc host nonexistent --loc root
> default".  No change other than everything showing up under default; no
> mon crash.
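
For reference, the location from that log line can be turned into the
matching --loc arguments mechanically. The helper below is a sketch for
illustration only: loc_to_args is a made-up name, and it assumes the
location always has the "{type=name,type=name}" shape seen in the log
line above.

```shell
#!/bin/sh
# Hypothetical helper: turn the "{host=...,root=...}" location printed in the
# mon log into the matching "--loc type name" arguments for crushtool.
loc_to_args() {
    # Strip the surrounding braces, then split the rest on commas.
    loc=$(printf '%s' "$1" | tr -d '{}')
    args=""
    old_ifs=$IFS
    IFS=,
    for pair in $loc; do
        btype=${pair%%=*}    # bucket type: text before the first "="
        name=${pair#*=}      # bucket name: text after the first "="
        args="$args --loc $btype $name"
    done
    IFS=$old_ifs
    printf '%s\n' "${args# }"   # drop the leading space
}

loc_to_args '{host=cc-ceph-osd26-fra1,root=default}'
# prints: --loc host cc-ceph-osd26-fra1 --loc root default
```

The output can be pasted after "crushtool -i crushmap --update-item ..."
so that the offline test uses exactly the location the mon was given.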
>
> And then I tried one more thing: *first* adding just a new OSD under
> the default root, and *then* moving that OSD to a new, nonexistent host,
> also under the default root. Again, no mon crash.
>
> So I'm afraid I am unable to reproduce this with crushtool and setcrushmap.
>
>
> And I can't get my mons to crash with "ceph osd crush move", either:
>
> ceph osd crush move osd.59 host=nonexistent root=default
> moved item id 59 name 'osd.59' to location {host=nonexistent,root=default} in crush map
>
>
> Cheers,
> Florian