Re: [External Email] Overlapping Roots - How to Fix?


 



Oddly, the Nautilus cluster that I'm gradually decommissioning seems to
have the same shadow root pattern in its crush map.  I don't know if that
really means anything, but at least I know it's not something I did
differently when I set up the new Reef cluster.
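
For the record, this is roughly how I have been inspecting the maps on both
clusters (nothing exotic, just the standard tools; the file names are
arbitrary):

ceph osd crush tree --show-shadow            # list the per-class shadow buckets
ceph osd getcrushmap -o cluster.crush        # dump the binary CRUSH map
crushtool -d cluster.crush -o cluster.txt    # decompile to text for comparison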

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx



On Fri, Sep 20, 2024 at 12:48 PM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:

> Stefan, Anthony,
>
> Anthony's sequence of commands to reclassify the root failed with errors,
> so I have tried to look a little deeper.
>
> I can now see the extra root via 'ceph osd crush tree --show-shadow'.
> Looking at the decompiled crush map, I can also see it there:
>
> root default {
>         id -1           # do not change unnecessarily
>         id -2 class hdd         # do not change unnecessarily
>         # weight 361.90518
>         alg straw2
>         hash 0  # rjenkins1
>         item ceph00 weight 90.51434
>         item ceph01 weight 90.29265
>         item ceph09 weight 90.80554
>         item ceph02 weight 90.29265
> }
>
>
> Based on the hints in the link Stefan provided, it would appear that the
> correct solution might be to get rid of 'id -2' and change 'id -1' to
> 'class hdd':
>
> root default {
>         id -1 class hdd         # do not change unnecessarily
>         # weight 361.90518
>         alg straw2
>         hash 0  # rjenkins1
>         item ceph00 weight 90.51434
>         item ceph01 weight 90.29265
>         item ceph09 weight 90.80554
>         item ceph02 weight 90.29265
> }
>
>
> But I'm no expert, and I'm anxious about losing data.
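>
> If I do end up hand-editing, my (untested) understanding is that I could
> compile the edited text and sanity-check it before injecting anything,
> along these lines ('edited.txt'/'edited.crush' are just placeholder names,
> and 'original.crush' is the dump from Anthony's sequence below):
>
> crushtool -c edited.txt -o edited.crush
> crushtool -i original.crush --compare edited.crush
> crushtool -i edited.crush --test --rule 0 --num-rep 3 --show-mappings
> ceph osd setcrushmap -i edited.crush
>
> holding off on the setcrushmap step until the compare/test output looks
> sane (and assuming 3 replicas for the --num-rep value).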
>
> The rest of the rules in my crush map are:
>
> # rules
> rule replicated_rule {
>         id 0
>         type replicated
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> rule block-1 {
>         id 1
>         type erasure
>         step set_chooseleaf_tries 5
>         step set_choose_tries 100
>         step take default class hdd
>         step choose indep 0 type osd
>         step emit
> }
> rule default.rgw.buckets.data {
>         id 2
>         type erasure
>         step set_chooseleaf_tries 5
>         step set_choose_tries 100
>         step take default class hdd
>         step choose indep 0 type osd
>         step emit
> }
> rule ceph-block {
>         id 3
>         type erasure
>         step set_chooseleaf_tries 5
>         step set_choose_tries 100
>         step take default class hdd
>         step choose indep 0 type osd
>         step emit
> }
> rule replicated-hdd {
>         id 4
>         type replicated
>         step take default class hdd
>         step choose firstn 0 type osd
>         step emit
> }
>
> # end crush map
>
>
> Of these, the last - id 4 - is one that I added while trying to figure
> this out.  What this tells me is that the 'take' step in rule id 0 should
> probably change to 'step take default class hdd'.
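>
> As a sanity check, I suppose I could dry-run both rules against the current
> map with crushtool to see how differently they actually place things, e.g.
> (again with 'original.crush' from Anthony's sequence below, and assuming 3
> replicas):
>
> crushtool -i original.crush --test --rule 0 --num-rep 3 --show-mappings > rule0.txt
> crushtool -i original.crush --test --rule 4 --num-rep 3 --show-mappings > rule4.txt
> diff rule0.txt rule4.txt
>
> though rule 4 also differs in its choose step (osd vs. chooseleaf host), so
> the diff wouldn't isolate the device-class change by itself.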
>
> I also notice that each of my host stanzas (buckets) has what looks like
> two roots.  For example:
>
> host ceph00 {
>         id -3           # do not change unnecessarily
>         id -4 class hdd         # do not change unnecessarily
>         # weight 90.51434
>         alg straw2
>         hash 0  # rjenkins1
>         item osd.0 weight 11.35069
>         item osd.1 weight 11.35069
>         item osd.2 weight 11.35069
>         item osd.3 weight 11.35069
>         item osd.4 weight 11.27789
>         item osd.5 weight 11.27789
>         item osd.6 weight 11.27789
>         item osd.7 weight 11.27789
> }
>
>
> I assume I may need to clean this up somehow, or perhaps this is the real
> problem.
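>
> (A quick way to see how widespread these class-specific id lines are would
> be to grep the decompiled map, e.g.
>
> grep -n 'class hdd' original.txt
>
> with 'original.txt' again being the decompiled file from Anthony's sequence
> below.)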
>
> Please advise.
>
> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdhall@xxxxxxxxxxxxxx
>
> On Thu, Sep 19, 2024 at 3:56 AM Stefan Kooman <stefan@xxxxxx> wrote:
>
>> On 19-09-2024 05:10, Anthony D'Atri wrote:
>> >
>> >
>> >>
>> >> Anthony,
>> >>
>> >> So it sounds like I need to make a new crush rule for replicated pools
>> >> that specifies default-hdd and the device class?  (Or should I go the
>> >> other way around?  I think I'd rather change the replicated pools even
>> >> though there are more of them.)
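>> >> (I'm picturing something like
>> >>
>> >> ceph osd crush rule create-replicated <newrule> default host hdd
>> >> ceph osd pool set <pool> crush_rule <newrule>
>> >>
>> >> with <newrule> being whatever name makes sense, but I haven't actually
>> >> tried it, so correct me if the syntax is wrong.)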
>> >
>> > I think it would be best to edit the CRUSH rules in situ so that each
>> > specifies the device class; that way, if you do get different media in
>> > the future, you'll be ready.  Rather than messing around with new rules
>> > and modifying pools, this is arguably one of the few times when one
>> > would decompile, edit, recompile, and inject the CRUSH map in toto.
>> >
>> > I haven't tried this myself, but maybe something like the below, to
>> > avoid the PITA and potential for error of editing the decompiled text
>> > file by hand.
>> >
>> >
>> > ceph osd getcrushmap -o original.crush
>> > crushtool -d original.crush -o original.txt
>> > crushtool -i original.crush --reclassify --reclassify-root default hdd --set-subtree-class default hdd -o adjusted.crush
>> > crushtool -d adjusted.crush -o adjusted.txt
>> > crushtool -i original.crush --compare adjusted.crush
>> > ceph osd setcrushmap -i adjusted.crush
>>
>> This might be of use as well (if a lot of data would move):
>> https://blog.widodh.nl/2019/02/comparing-two-ceph-crush-maps/
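>>
>> Roughly along those lines (untested here, and the file names are just
>> placeholders), you can dump the PG mappings from an offline copy of the
>> osdmap before and after importing the adjusted CRUSH map, and diff them:
>>
>> ceph osd getmap -o om
>> osdmaptool om --test-map-pgs-dump > before.txt
>> osdmaptool om --import-crush adjusted.crush
>> osdmaptool om --test-map-pgs-dump > after.txt
>> diff before.txt after.txt | wc -l
>>
>> That gives a rough idea of how many PG mappings would change without
>> touching the live cluster.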
>>
>> Gr. Stefan
>>
>



