Hallo Dan,

   I am using Nautilus, a slightly outdated version (14.2.16), and I don't
remember ever playing with upmaps in the past. Following your suggestion, I
removed a bunch of upmaps (the "longer" lines) and after a while I verified
that all PGs are properly mapped.
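In case it is useful to others, the cleanup was roughly along these lines
(a sketch rather than the exact commands I ran; 116.453 is the PG discussed
further down, and the balancer was already running in upmap mode here):

   # list the current upmap exceptions; the suspicious ones are the long
   # pg_upmap_items lines carrying many OSD pairs
   ceph osd dump | grep pg_upmap_items

   # drop the exception for a given PG and let CRUSH place it again
   ceph osd rm-pg-upmap-items 116.453

   # then check the balancer and re-check the PG mapping
   ceph balancer status
   ceph pg map 116.453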
Thanks!

			Fulvio

On 5/27/2021 5:33 PM, Dan van der Ster wrote:
Hi Fulvio,

I suggest removing only the upmaps which are clearly incorrect, and then
seeing whether the upmap balancer re-creates them. Perhaps they were created
at a time when they were not incorrect, when you had a different crush rule?
Or perhaps you're running an old version of Ceph which had a buggy balancer
implementation?

Cheers, Dan

On Thu, May 27, 2021 at 5:16 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo Dan, Nathan, thanks for your replies and apologies for my silence.

Sorry, I had made a typo... the rule is really 6+4. And to reply to Nathan's
message, the rule was built like this in anticipation of getting additional
servers, at which point I will relax the "2 chunks per OSD" part.

[cephmgr@cephAdmPA1.cephAdmPA1 ~]$ ceph osd pool get default.rgw.buckets.data erasure_code_profile
erasure_code_profile: ec_6and4_big

[cephmgr@cephAdmPA1.cephAdmPA1 ~]$ ceph osd erasure-code-profile get ec_6and4_big
crush-device-class=big
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=6
m=4
plugin=jerasure
technique=reed_sol_van
w=8

Indeed, Dan:

[cephmgr@cephAdmPA1.cephAdmPA1 ~]$ ceph osd dump | grep upmap | grep 116.453
pg_upmap_items 116.453 [76,49,129,108]

I don't think I ever set such an upmap myself. Do you think it would be good
to try and remove all upmaps, let the upmap balancer do its magic, and check
again?

Thanks!

			Fulvio

On 20/05/2021 18:59, Dan van der Ster wrote:

Hold on: 8+4 needs 12 osds but you only show 10 there.
Shouldn't you choose 6 type host and then chooseleaf 2 type osd?

.. Dan

On Thu, May 20, 2021, 1:30 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo Dan, Bryan,

I have a rule similar to yours, for an 8+4 pool, the only difference being
that I replaced the second "choose" with "chooseleaf", which I understand
should make no difference:

rule default.rgw.buckets.data {
        id 6
        type erasure
        min_size 3
        max_size 10
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class big
        step choose indep 5 type host
        step chooseleaf indep 2 type osd
        step emit
}

I am on Nautilus 14.2.16 and, while performing maintenance the other day, I
noticed 2 PGs were incomplete and caused trouble for some users. I then
verified that (thanks Bryan for the command):

[cephmgr@cephAdmCT1.cephAdmCT1 clusterCT]$ for osd in $(ceph pg map 116.453 -f json | jq -r '.up[]'); do ceph osd find $osd | jq -r '.host' ; done | sort | uniq -c | sort -n -k1
      2 r2srv07.ct1.box.garr
      2 r2srv10.ct1.box.garr
      2 r3srv07.ct1.box.garr
      4 r1srv02.ct1.box.garr

You can see that 4 chunks of this PG were put on r1srv02. Maybe this happened
due to some temporary unavailability of the host at some point? As all my
servers are now up and running, is there a way to force the placement rule
to rerun?

Thanks!

			Fulvio

On 5/16/2021 11:40 PM, Dan van der Ster wrote:
> Hi Bryan,
>
> I had to do something similar, and never found a rule to place "up to"
> 2 chunks per host, so I stayed with the placement of *exactly* 2
> chunks per host.
>
> But I did this slightly differently to what you wrote earlier: my rule
> chooses exactly 4 hosts, then chooses exactly 2 osds on each:
>
>          type erasure
>          min_size 3
>          max_size 10
>          step set_chooseleaf_tries 5
>          step set_choose_tries 100
>          step take default class hdd
>          step choose indep 4 type host
>          step choose indep 2 type osd
>          step emit
>
> If you really need the "up to 2" approach then maybe you can split
> each host into two "host" crush buckets, with half the OSDs in each.
> Then a normal host-wise rule should work.
>
> Cheers, Dan
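As an aside, one way to sanity-check what the rule itself computes,
independently of any upmaps, is to dry-run it with crushtool. This is only a
sketch; rule id 6 and the 10-chunk width come from the rule and the k=6/m=4
profile quoted above:

   # dump the compiled crush map from the cluster
   ceph osd getcrushmap -o crushmap.bin

   # dry-run rule 6 with 10 chunks and inspect the proposed placements;
   # --show-bad-mappings reports any input the rule could not satisfy
   crushtool -i crushmap.bin --test --rule 6 --num-rep 10 --show-mappings | head
   crushtool -i crushmap.bin --test --rule 6 --num-rep 10 --show-bad-mappings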
-- 
Fulvio Galeazzi
GARR-CSD Department
skype: fgaleazzi70   tel.: +39-334-6533-250