Re: [External Email] Overlapping Roots - How to Fix?

I think it would suffice to change rule 0 to use a device class as well, as you already mentioned yourself. Do you have pools that use that rule? If not, the change wouldn’t even have any impact.
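Something like the following (untested against your cluster, pool name is a placeholder) should show whether any pools actually reference rule 0:

```shell
# List the rules and each pool's assigned rule id.
ceph osd crush rule ls
ceph osd pool ls detail | grep -o 'pool [0-9]*.*crush_rule [0-9]*'

# Or check a single pool:
ceph osd pool get <poolname> crush_rule
```

If nothing maps to rule 0, changing its 'take' step to use a device class should not move any data.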

Quoting Dave Hall <kdhall@xxxxxxxxxxxxxx>:

Oddly, the Nautilus cluster that I'm gradually decommissioning seems to
have the same shadow root pattern in its crush map.  I don't know if that
really means anything, but at least I know it's not something I did
differently when I set up the new Reef cluster.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx



On Fri, Sep 20, 2024 at 12:48 PM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:

Stefan, Anthony,

Anthony's sequence of commands to reclassify the root failed with errors,
so I have tried to look a little deeper.

I can now see the extra root via 'ceph osd crush tree --show-shadow'.
Looking at the decompiled crush tree, I can also see the extra root:

root default {
        id -1           # do not change unnecessarily
        id -2 class hdd # do not change unnecessarily
        # weight 361.90518
        alg straw2
        hash 0  # rjenkins1
        item ceph00 weight 90.51434
        item ceph01 weight 90.29265
        item ceph09 weight 90.80554
        item ceph02 weight 90.29265
}


Based on the hints in the link Stefan provided, it would appear that the
correct solution might be to get rid of 'id -2' and add 'class hdd' to
id -1:

root default {
        id -1 class hdd # do not change unnecessarily
        # weight 361.90518
        alg straw2
        hash 0  # rjenkins1
        item ceph00 weight 90.51434
        item ceph01 weight 90.29265
        item ceph09 weight 90.80554
        item ceph02 weight 90.29265
}


But I'm no expert, and I'm anxious about losing data.

The rest of the rules in my crush map are:

# rules
rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule block-1 {
        id 1
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 0 type osd
        step emit
}
rule default.rgw.buckets.data {
        id 2
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 0 type osd
        step emit
}
rule ceph-block {
        id 3
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 0 type osd
        step emit
}
rule replicated-hdd {
        id 4
        type replicated
        step take default class hdd
        step choose firstn 0 type osd
        step emit
}

# end crush map


Of these, the last - id 4 - is one that I added while trying to figure
this out.  What this tells me is that the 'take' step in rule id 0 should
probably change to 'step take default class hdd'.
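If that's right, I assume the change would go through the usual
decompile/edit/recompile cycle, something like this sketch (untested on
my cluster; file names are arbitrary):

```shell
# Export and decompile the current CRUSH map.
ceph osd getcrushmap -o map.bin
crushtool -d map.bin -o map.txt

# In map.txt, under "rule replicated_rule { id 0 ... }", change
#   step take default
# to
#   step take default class hdd

# Recompile and preview how many mappings would change before committing.
crushtool -c map.txt -o map-new.bin
crushtool -i map-new.bin --compare map.bin

# Only inject once the comparison looks sane.
ceph osd setcrushmap -i map-new.bin
```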

I also notice that each of my host stanzas (buckets) has what looks like
two roots.  For example

host ceph00 {
        id -3           # do not change unnecessarily
        id -4 class hdd # do not change unnecessarily
        # weight 90.51434
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 11.35069
        item osd.1 weight 11.35069
        item osd.2 weight 11.35069
        item osd.3 weight 11.35069
        item osd.4 weight 11.27789
        item osd.5 weight 11.27789
        item osd.6 weight 11.27789
        item osd.7 weight 11.27789
}


I assume I may need to clean this up somehow, or perhaps this is the real
problem.

Please advise.

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx

On Thu, Sep 19, 2024 at 3:56 AM Stefan Kooman <stefan@xxxxxx> wrote:

On 19-09-2024 05:10, Anthony D'Atri wrote:
>
>
>>
>> Anthony,
>>
>> So it sounds like I need to make a new crush rule for replicated pools
>> that specifies default-hdd and the device class?  (Or should I go the
>> other way around?  I think I'd rather change the replicated pools even
>> though there's more of them.)
>
> I think it would be best to edit the CRUSH rules in-situ so that each
> specifies the device class; that way, if you do get different media in
> the future, you'll be ready.  Rather than messing around with new rules
> and modifying pools, this is arguably one of the few times when one
> would decompile, edit, recompile, and inject the CRUSH map in toto.
>
> I haven't tried this myself, but maybe something like the below, to
> avoid the PITA and potential for error of editing the decompiled text
> file by hand:
>
> ceph osd getcrushmap -o original.crush
> crushtool -d original.crush -o original.txt
> crushtool -i original.crush --reclassify --reclassify-root default hdd \
>     --set-subtree-class default hdd -o adjusted.crush
> crushtool -d adjusted.crush -o adjusted.txt
> crushtool -i original.crush --compare adjusted.crush
> ceph osd setcrushmap -i adjusted.crush

This might be of use as well (if a lot of data would move):
https://blog.widodh.nl/2019/02/comparing-two-ceph-crush-maps/

Gr. Stefan


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





