Re: Overlapping Roots - How to Fix?

Thank you to everybody who has responded to my questions.

At this point I think I am starting to understand.  However, I am still
trying to gauge the potential for data loss.

In particular:

   - In some ways it seems that, as long as there is sufficient OSD capacity
   available, the worst that can happen from a bad crush map is poor placement
   and poor performance. Is this correct?
   - crushtool --compare: if the result of this command shows no mismatches,
   can we say that the adjusted crush map is safe to apply?
   - If all of the 'inhibit flags' are turned on (noout, nodown, noscrub,
   nodeep-scrub, norecover, norebalance, nobackfill, and perhaps pause), is it
   safe to apply an adjusted crush map?  Is it safe to revert to the original
   crush map if things don't seem quite right?  (The sequence I have in mind
   is sketched after this list.)
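
For reference, the sequence I have in mind is roughly the following.  The
flags are the standard Ceph OSD flags; whether this is actually enough to
make the change safe is exactly what I'm asking above:

    ceph osd set noout
    ceph osd set nodown
    ceph osd set norecover
    ceph osd set norebalance
    ceph osd set nobackfill
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # ceph osd set pause        # perhaps, to stop client I/O as well

    ceph osd getcrushmap -o original.crush    # keep a copy for rollback
    ceph osd setcrushmap -i adjusted.crush

    # if placement doesn't look right, revert:
    # ceph osd setcrushmap -i original.crush

    # afterwards, clear each flag again with 'ceph osd unset <flag>'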

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx

On Sat, Sep 21, 2024 at 4:01 AM Eugen Block <eblock@xxxxxx> wrote:

> I think it would suffice to change rule 0 to use a device class as
> well, as you already mentioned yourself. Do you have pools that use
> that rule? If not, the change wouldn’t even have any impact.
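>
> For what it's worth, a quick way to check which pools use a given rule
> (plain Ceph commands, nothing here is specific to your cluster):
>
> ceph osd pool ls detail | grep crush_rule
>
> Each pool is listed together with the crush_rule id it references.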
>
> Quoting Dave Hall <kdhall@xxxxxxxxxxxxxx>:
>
> > Oddly, the Nautilus cluster that I'm gradually decommissioning seems to
> > have the same shadow root pattern in its crush map.  I don't know if that
> > really means anything, but at least I know it's not something I did
> > differently when I set up the new Reef cluster.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx
> >
> >
> >
> > On Fri, Sep 20, 2024 at 12:48 PM Dave Hall <kdhall@xxxxxxxxxxxxxx>
> wrote:
> >
> >> Stefan, Anthony,
> >>
> >> Anthony's sequence of commands to reclassify the root failed with
> >> errors, so I have tried to look a little deeper.
> >>
> >> I can now see the extra root via 'ceph osd crush tree --show-shadow'.
> >> Looking at the decompiled crush map, I can also see it:
> >>
> >> root default {
> >>         id -1           # do not change unnecessarily
> >>
> >>         id -2 class hdd         # do not change unnecessarily
> >>         # weight 361.90518
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item ceph00 weight 90.51434
> >>         item ceph01 weight 90.29265
> >>         item ceph09 weight 90.80554
> >>         item ceph02 weight 90.29265
> >> }
> >>
> >>
> >> Based on the hints given in the link provided by Stefan, it would
> >> appear that the correct solution might be to get rid of 'id -2' and
> >> change id -1 to class hdd:
> >>
> >> root default {
> >>
> >>         id -1 class hdd         # do not change unnecessarily
> >>         # weight 361.90518
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item ceph00 weight 90.51434
> >>         item ceph01 weight 90.29265
> >>         item ceph09 weight 90.80554
> >>         item ceph02 weight 90.29265
> >> }
> >>
> >>
> >> but I'm no expert and I'm anxious about losing data.
> >>
> >> The rest of the rules in my crush map are:
> >>
> >> # rules
> >> rule replicated_rule {
> >>         id 0
> >>         type replicated
> >>         step take default
> >>         step chooseleaf firstn 0 type host
> >>         step emit
> >> }
> >> rule block-1 {
> >>         id 1
> >>         type erasure
> >>         step set_chooseleaf_tries 5
> >>         step set_choose_tries 100
> >>         step take default class hdd
> >>         step choose indep 0 type osd
> >>         step emit
> >> }
> >> rule default.rgw.buckets.data {
> >>         id 2
> >>         type erasure
> >>         step set_chooseleaf_tries 5
> >>         step set_choose_tries 100
> >>         step take default class hdd
> >>         step choose indep 0 type osd
> >>         step emit
> >> }
> >> rule ceph-block {
> >>         id 3
> >>         type erasure
> >>         step set_chooseleaf_tries 5
> >>         step set_choose_tries 100
> >>         step take default class hdd
> >>         step choose indep 0 type osd
> >>         step emit
> >> }
> >> rule replicated-hdd {
> >>         id 4
> >>         type replicated
> >>         step take default class hdd
> >>         step choose firstn 0 type osd
> >>         step emit
> >> }
> >>
> >> # end crush map
> >>
> >>
> >> Of these, the last one (id 4) is a rule that I added while trying to
> >> figure this out.  What this tells me is that the 'take' step in rule
> >> id 0 should probably change to 'step take default class hdd'.
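> >>
> >> In other words, if I'm reading this right, rule id 0 would end up
> >> looking something like this (only its 'take' line changed):
> >>
> >> rule replicated_rule {
> >>         id 0
> >>         type replicated
> >>         step take default class hdd
> >>         step chooseleaf firstn 0 type host
> >>         step emit
> >> }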
> >>
> >> I also notice that each of my host stanzas (buckets) has what looks like
> >> two roots.  For example
> >>
> >> host ceph00 {
> >>         id -3           # do not change unnecessarily
> >>         id -4 class hdd         # do not change unnecessarily
> >>         # weight 90.51434
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item osd.0 weight 11.35069
> >>         item osd.1 weight 11.35069
> >>         item osd.2 weight 11.35069
> >>         item osd.3 weight 11.35069
> >>         item osd.4 weight 11.27789
> >>         item osd.5 weight 11.27789
> >>         item osd.6 weight 11.27789
> >>         item osd.7 weight 11.27789
> >> }
> >>
> >>
> >> I assume I may need to clean this up somehow, or perhaps this is the
> >> real problem.
> >>
> >> Please advise.
> >>
> >> Thanks.
> >>
> >> -Dave
> >>
> >> --
> >> Dave Hall
> >> Binghamton University
> >> kdhall@xxxxxxxxxxxxxx
> >>
> >> On Thu, Sep 19, 2024 at 3:56 AM Stefan Kooman <stefan@xxxxxx> wrote:
> >>
> >>> On 19-09-2024 05:10, Anthony D'Atri wrote:
> >>> >
> >>> >
> >>> >>
> >>> >> Anthony,
> >>> >>
> >>> >> So it sounds like I need to make a new crush rule for replicated
> >>> >> pools that specifies default-hdd and the device class?  (Or should I
> >>> >> go the other way around?  I think I'd rather change the replicated
> >>> >> pools even though there's more of them.)
> >>> >
> >>> > I think it would be best to edit the CRUSH rules in-situ so that each
> >>> > specifies the device class; that way, if you do get different media in
> >>> > the future, you'll be ready.  Rather than messing around with new rules
> >>> > and modifying pools, this is arguably one of the few times when one
> >>> > would decompile, edit, recompile, and inject the CRUSH map in toto.
> >>> >
> >>> > I haven't tried this myself, but maybe something like the below, to
> >>> > avoid the PITA and potential for error of editing the decompiled text
> >>> > file by hand.
> >>> >
> >>> >
> >>> > ceph osd getcrushmap -o original.crush
> >>> > crushtool -d original.crush -o original.txt
> >>> > crushtool -i original.crush --reclassify \
> >>> >     --reclassify-root default hdd \
> >>> >     --set-subtree-class default hdd -o adjusted.crush
> >>> > crushtool -d adjusted.crush -o adjusted.txt
> >>> > crushtool -i original.crush --compare adjusted.crush
> >>> > ceph osd setcrushmap -i adjusted.crush
> >>>
> >>> This might be of use as well (if a lot of data would move):
> >>> https://blog.widodh.nl/2019/02/comparing-two-ceph-crush-maps/
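> >>>
> >>> One way to estimate how much would move (a rough sketch, and the post
> >>> may use a slightly different method) is to dump the computed mappings
> >>> from both maps and diff them, e.g. for rule 0 with 3 replicas:
> >>>
> >>> crushtool -i original.crush --test --show-mappings --rule 0 --num-rep 3 > before.txt
> >>> crushtool -i adjusted.crush --test --show-mappings --rule 0 --num-rep 3 > after.txt
> >>> diff before.txt after.txt | wc -l    # rough count of changed mappings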
> >>>
> >>> Gr. Stefan
> >>>
> >>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



