Thank you to everybody who has responded to my questions. At this point I
think I am starting to understand, but I am still trying to understand the
potential for data loss. In particular:

- Is it correct that, as long as there is sufficient OSD capacity
  available, the worst that can happen from a bad CRUSH map is poor
  placement and poor performance?

- crushtool --compare: if the result of this command shows no mismatches,
  can we say that the adjusted CRUSH map is safe to apply?

- If all of the 'inhibit flags' are set (noout, nodown, noscrub,
  nodeep-scrub, norecover, norebalance, nobackfill, and perhaps pause), is
  it safe to apply an adjusted CRUSH map? And is it safe to revert to the
  original CRUSH map if things don't seem quite right?

Rough sketches of the commands I have in mind are below the quoted thread
at the end of this message; corrections welcome.

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx


On Sat, Sep 21, 2024 at 4:01 AM Eugen Block <eblock@xxxxxx> wrote:

> I think it would suffice to change rule 0 to use a device class as
> well, as you already mentioned yourself. Do you have pools that use
> that rule? If not, the change wouldn't even have any impact.
>
> Zitat von Dave Hall <kdhall@xxxxxxxxxxxxxx>:
>
> > Oddly, the Nautilus cluster that I'm gradually decommissioning seems to
> > have the same shadow root pattern in its crush map. I don't know if that
> > really means anything, but at least I know it's not something I did
> > differently when I set up the new Reef cluster.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx
> >
> >
> > On Fri, Sep 20, 2024 at 12:48 PM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> >
> >> Stefan, Anthony,
> >>
> >> Anthony's sequence of commands to reclassify the root failed with
> >> errors, so I have tried to look a little deeper.
> >>
> >> I can now see the extra root via 'ceph osd crush tree --show-shadow'.
> >> Looking at the decompiled crush map, I can also see the extra root:
> >>
> >> root default {
> >>         id -1           # do not change unnecessarily
> >>         id -2 class hdd         # do not change unnecessarily
> >>         # weight 361.90518
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item ceph00 weight 90.51434
> >>         item ceph01 weight 90.29265
> >>         item ceph09 weight 90.80554
> >>         item ceph02 weight 90.29265
> >> }
> >>
> >> Based on the hints given in the link provided by Stefan, it would appear
> >> that the correct solution might be to get rid of 'id -2' and change
> >> 'id -1' to class hdd:
> >>
> >> root default {
> >>         id -1 class hdd         # do not change unnecessarily
> >>         # weight 361.90518
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item ceph00 weight 90.51434
> >>         item ceph01 weight 90.29265
> >>         item ceph09 weight 90.80554
> >>         item ceph02 weight 90.29265
> >> }
> >>
> >> but I'm no expert and I'm anxious about losing data.
> >>
> >> The rest of the rules in my crush map are:
> >>
> >> # rules
> >> rule replicated_rule {
> >>         id 0
> >>         type replicated
> >>         step take default
> >>         step chooseleaf firstn 0 type host
> >>         step emit
> >> }
> >> rule block-1 {
> >>         id 1
> >>         type erasure
> >>         step set_chooseleaf_tries 5
> >>         step set_choose_tries 100
> >>         step take default class hdd
> >>         step choose indep 0 type osd
> >>         step emit
> >> }
> >> rule default.rgw.buckets.data {
> >>         id 2
> >>         type erasure
> >>         step set_chooseleaf_tries 5
> >>         step set_choose_tries 100
> >>         step take default class hdd
> >>         step choose indep 0 type osd
> >>         step emit
> >> }
> >> rule ceph-block {
> >>         id 3
> >>         type erasure
> >>         step set_chooseleaf_tries 5
> >>         step set_choose_tries 100
> >>         step take default class hdd
> >>         step choose indep 0 type osd
> >>         step emit
> >> }
> >> rule replicated-hdd {
> >>         id 4
> >>         type replicated
> >>         step take default class hdd
> >>         step choose firstn 0 type osd
> >>         step emit
> >> }
> >>
> >> # end crush map
> >>
> >> Of these, the last (id 4) is one that I added while trying to figure
> >> this out. What this tells me is that the 'take' step in rule id 0 should
> >> probably change to 'step take default class hdd'.
> >>
> >> I also notice that each of my host stanzas (buckets) has what looks like
> >> two roots. For example:
> >>
> >> host ceph00 {
> >>         id -3           # do not change unnecessarily
> >>         id -4 class hdd         # do not change unnecessarily
> >>         # weight 90.51434
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item osd.0 weight 11.35069
> >>         item osd.1 weight 11.35069
> >>         item osd.2 weight 11.35069
> >>         item osd.3 weight 11.35069
> >>         item osd.4 weight 11.27789
> >>         item osd.5 weight 11.27789
> >>         item osd.6 weight 11.27789
> >>         item osd.7 weight 11.27789
> >> }
> >>
> >> I assume I may need to clean this up somehow, or perhaps this is the real
> >> problem.
> >>
> >> Please advise.
> >>
> >> Thanks.
> >>
> >> -Dave
> >>
> >> --
> >> Dave Hall
> >> Binghamton University
> >> kdhall@xxxxxxxxxxxxxx
> >>
> >> On Thu, Sep 19, 2024 at 3:56 AM Stefan Kooman <stefan@xxxxxx> wrote:
> >>
> >>> On 19-09-2024 05:10, Anthony D'Atri wrote:
> >>> >
> >>> >> Anthony,
> >>> >>
> >>> >> So it sounds like I need to make a new crush rule for replicated pools
> >>> >> that specifies default-hdd and the device class? (Or should I go the
> >>> >> other way around? I think I'd rather change the replicated pools even
> >>> >> though there's more of them.)
> >>> >
> >>> > I think it would be best to edit the CRUSH rules in situ so that each
> >>> > specifies the device class; that way, if you do get different media in
> >>> > the future, you'll be ready. Rather than messing around with new rules
> >>> > and modifying pools, this is arguably one of the few times when one
> >>> > would decompile, edit, recompile, and inject the CRUSH map in toto.
> >>> >
> >>> > I haven't tried this myself, but maybe something like the below, to
> >>> > avoid the PITA and potential for error of editing the decompiled text
> >>> > file by hand.
> >>> >
> >>> > ceph osd getcrushmap -o original.crush
> >>> > crushtool -d original.crush -o original.txt
> >>> > crushtool -i original.crush --reclassify --reclassify-root default hdd --set-subtree-class default hdd -o adjusted.crush
> >>> > crushtool -d adjusted.crush -o adjusted.txt
> >>> > crushtool -i original.crush --compare adjusted.crush
> >>> > ceph osd setcrushmap -i adjusted.crush
> >>>
> >>> This might be of use as well (if a lot of data would move):
> >>> https://blog.widodh.nl/2019/02/comparing-two-ceph-crush-maps/
> >>>
> >>> Gr. Stefan
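
P.S. As mentioned above, here are rough sketches of what I have in mind.
None of this has been run yet, so please treat it as a plan to be
corrected rather than a tested procedure.

First, quiescing the cluster, keeping a copy of the current map, and being
able to back out. The flag selection is just my guess at what matters for
a map change, and I'm reusing the original.crush / adjusted.crush file
names from Anthony's sequence:

    # Keep a copy of the current CRUSH map so it can be restored.
    ceph osd getcrushmap -o original.crush

    # Quiesce data movement and scrubbing before touching the map.
    ceph osd set norebalance
    ceph osd set norecover
    ceph osd set nobackfill
    ceph osd set noout
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # (nodown and pause seem like overkill for a map change, but per my
    # question above they could be added here as well.)

    # Apply the adjusted map ...
    ceph osd setcrushmap -i adjusted.crush

    # ... or, if things don't look right, put the original back:
    # ceph osd setcrushmap -i original.crush

    # Once placement looks sane, clear the flags again.
    ceph osd unset norebalance
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset noout
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub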
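
Second, checking the adjusted map before applying it. My understanding is
that --compare reports how many of its test mappings differ between the
two maps, and that --test can confirm a rule still produces complete
mappings. Rule 0 with 3 replicas is only an example (assuming size-3
replicated pools); the other rule ids and replica/chunk counts would need
to be substituted:

    # How many test mappings change between the old and new map?
    crushtool -i original.crush --compare adjusted.crush

    # Spot-check that a rule still maps PGs to the expected number of OSDs.
    crushtool -i adjusted.crush --test --rule 0 --num-rep 3 --show-mappings
    crushtool -i adjusted.crush --test --rule 0 --num-rep 3 --show-bad-mappings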
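
Finally, whether it comes from the reclassify or from hand-editing the
decompiled map, my understanding is that rule 0 should end up with a
class-qualified take step, matching the erasure-coded rules, something
like:

    rule replicated_rule {
            id 0
            type replicated
            step take default class hdd
            step chooseleaf firstn 0 type host
            step emit
    }

Does that all look reasonable?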