Re: CRUSH rule for EC 6+2 on 6-node cluster

Hi Fulvio,

I suggest removing only the upmaps which are clearly incorrect, and
then seeing whether the upmap balancer re-creates them.
Perhaps they were created back when they were not incorrect, when you
had a different crush rule?
Or perhaps you're running an old version of ceph which had a buggy
balancer implementation?
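If it helps, a rough Python sketch of turning the `pg_upmap_items` section of `ceph osd dump -f json` into removal commands. The sample data below is invented, apart from the PG and OSD ids quoted later in this thread; on a real cluster you would feed in the actual dump and review each command before running it:

```python
import json

# Sample shaped like the `pg_upmap_items` section of `ceph osd dump -f json`.
# Only the PG id and OSD ids come from this thread; the rest is illustrative.
sample_dump = json.dumps({
    "pg_upmap_items": [
        {"pgid": "116.453",
         "mappings": [{"from": 76, "to": 49}, {"from": 129, "to": 108}]},
    ]
})

def rm_upmap_commands(dump_json):
    """Return one `ceph osd rm-pg-upmap-items` command per upmapped PG."""
    dump = json.loads(dump_json)
    return ["ceph osd rm-pg-upmap-items " + entry["pgid"]
            for entry in dump.get("pg_upmap_items", [])]

for cmd in rm_upmap_commands(sample_dump):
    print(cmd)
```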

Cheers, Dan



On Thu, May 27, 2021 at 5:16 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:
>
> Hello Dan and Nathan, thanks for your replies, and apologies for my silence.
>
>    Sorry, I had made a typo... the rule is really 6+4. And to reply to
> Nathan's message, the rule was built like this in anticipation of
> getting additional servers, at which point I will relax the "2
> chunks per host" part.
>
> [cephmgr@cephAdmPA1.cephAdmPA1 ~]$ ceph osd pool get
> default.rgw.buckets.data erasure_code_profile
> erasure_code_profile: ec_6and4_big
> [cephmgr@cephAdmPA1.cephAdmPA1 ~]$ ceph osd erasure-code-profile get
> ec_6and4_big
> crush-device-class=big
> crush-failure-domain=osd
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=6
> m=4
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Indeed, Dan:
>
> [cephmgr@cephAdmPA1.cephAdmPA1 ~]$ ceph osd dump | grep upmap | grep 116.453
> pg_upmap_items 116.453 [76,49,129,108]
>
> I don't think I ever set such an upmap myself. Do you think it would be
> a good idea to remove all upmaps, let the upmap balancer do its magic,
> and check again?
>
>    Thanks!
>
>                         Fulvio
>
>
> On 20/05/2021 18:59, Dan van der Ster wrote:
> > Hold on: 8+4 needs 12 osds but you only show 10 there. Shouldn't you
> > choose 6 type host and then chooseleaf 2 type osd?
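The mismatch Dan points out here is plain arithmetic; a quick sanity check, using only the numbers from the rule and the profiles discussed in this thread:

```python
# The rule: `choose indep 5 type host` then `chooseleaf indep 2 type osd`
# emits exactly 5 * 2 placements per PG.
hosts, chunks_per_host = 5, 2
placements = hosts * chunks_per_host

# An EC pool needs exactly k + m placements per PG.
assert placements != 8 + 4  # 8+4 needs 12 slots: this rule cannot satisfy it
assert placements == 6 + 4  # 6+4 needs 10 slots: this rule fits

# Dan's suggestion (6 hosts, 2 OSDs each) would cover 8+4 on a 6-node cluster.
assert 6 * 2 == 8 + 4

print("rule emits", placements, "chunks per PG")
```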
> >
> > .. Dan
> >
> >
> > On Thu, May 20, 2021, 1:30 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx
> > <mailto:fulvio.galeazzi@xxxxxxx>> wrote:
> >
> >     Hello Dan and Bryan,
> >           I have a rule similar to yours, for an 8+4 pool, with the only
> >     difference that I replaced the second "choose" with "chooseleaf", which
> >     I understand should make no difference:
> >
> >     rule default.rgw.buckets.data {
> >               id 6
> >               type erasure
> >               min_size 3
> >               max_size 10
> >               step set_chooseleaf_tries 5
> >               step set_choose_tries 100
> >               step take default class big
> >               step choose indep 5 type host
> >               step chooseleaf indep 2 type osd
> >               step emit
> >     }
> >
> >         I am on Nautilus 14.2.16 and, while performing maintenance the
> >     other day, I noticed 2 PGs were incomplete, which caused trouble
> >     for some users.
> >     I then verified that (thanks Bryan for the command):
> >
> >     [cephmgr@cephAdmCT1.cephAdmCT1 clusterCT]$ for osd in $(ceph pg map
> >     116.453 -f json | jq -r '.up[]'); do ceph osd find $osd | jq -r '.host'
> >     ; done | sort | uniq -c | sort -n -k1
> >             2 r2srv07.ct1.box.garr
> >             2 r2srv10.ct1.box.garr
> >             2 r3srv07.ct1.box.garr
> >             4 r1srv02.ct1.box.garr
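For what it's worth, the same per-host tally can be sketched in Python. The OSD-to-host mapping below is invented; only the shape of the result (10 chunks, one host holding 4) mirrors the output above:

```python
from collections import Counter

# Invented stand-in for the `ceph osd find <id> | jq -r '.host'` lookups
# in the shell pipeline above.
osd_to_host = {
    76: "r1srv02", 49: "r1srv02", 3: "r1srv02", 91: "r1srv02",
    12: "r2srv07", 18: "r2srv07",
    44: "r2srv10", 57: "r2srv10",
    60: "r3srv07", 65: "r3srv07",
}

up_set = [76, 49, 3, 91, 12, 18, 44, 57, 60, 65]
chunks_per_host = Counter(osd_to_host[osd] for osd in up_set)

for host, n in sorted(chunks_per_host.items(), key=lambda kv: (kv[1], kv[0])):
    print(n, host)

# `chooseleaf indep 2 type osd` should never place more than 2 chunks on
# one host, so any count above 2 points at a stale or hand-made upmap.
violations = [h for h, n in chunks_per_host.items() if n > 2]
print("hosts over the limit:", violations)
```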
> >
> >         You see that 4 chunks of that PG were put on r1srv02.
> >     Maybe this happened due to a temporary unavailability of the host
> >     at some point? As all my servers are now up and running, is there
> >     a way to force the placement rule to rerun?
> >
> >         Thanks!
> >
> >                              Fulvio
> >
> >
> >     On 5/16/2021 11:40 PM, Dan van der Ster wrote:
> >      > Hi Bryan,
> >      >
> >      > I had to do something similar, and never found a rule to place
> >      > "up to" 2 chunks per host, so I stayed with the placement of
> >      > *exactly* 2 chunks per host.
> >      >
> >      > But I did this slightly differently from what you wrote
> >      > earlier: my rule chooses exactly 4 hosts, then chooses exactly
> >      > 2 osds on each:
> >      >
> >      >          type erasure
> >      >          min_size 3
> >      >          max_size 10
> >      >          step set_chooseleaf_tries 5
> >      >          step set_choose_tries 100
> >      >          step take default class hdd
> >      >          step choose indep 4 type host
> >      >          step choose indep 2 type osd
> >      >          step emit
> >      >
> >      > If you really need the "up to 2" approach then maybe you can split
> >      > each host into two "host" crush buckets, with half the OSDs in each.
> >      > Then a normal host-wise rule should work.
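If you go the split-host route, the decompiled CRUSH map might look roughly like the fragment below. The bucket names, ids, OSD ids and weights are all invented for illustration:

```text
# Two half-host buckets standing in for one physical machine.
host r1srv02-a {
        id -21
        alg straw2
        hash 0  # rjenkins1
        item osd.20 weight 7.277
        item osd.21 weight 7.277
}
host r1srv02-b {
        id -22
        alg straw2
        hash 0  # rjenkins1
        item osd.22 weight 7.277
        item osd.23 weight 7.277
}
```

With the OSDs divided over two half-host buckets per machine, a plain `chooseleaf indep 0 type host` rule picks distinct buckets, so at most two chunks land on any physical machine.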
> >      >
> >      > Cheers, Dan
> >      >
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


