Hi Fulvio,

I suggest removing only the upmaps which are clearly incorrect, and then
see if the upmap balancer re-creates them. Perhaps they were created when
they were not incorrect, when you had a different crush rule? Or perhaps
you're running an old version of Ceph which had a buggy balancer
implementation?
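For example, something along these lines (a minimal sketch, untested;
substitute the PG ids that actually look wrong in your own "ceph osd
dump" output for 116.453):

    # list all current upmap exceptions
    ceph osd dump | grep pg_upmap_items

    # drop the exception for one suspicious PG, e.g. 116.453
    ceph osd rm-pg-upmap-items 116.453

    # then watch whether the balancer re-creates it
    ceph balancer status
    ceph osd dump | grep pg_upmap_items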
Cheers, Dan

On Thu, May 27, 2021 at 5:16 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:
>
> Hallo Dan, Nathan, thanks for your replies and apologies for my silence.
>
> Sorry, I had made a typo... the rule is really 6+4. And to reply to
> Nathan's message, the rule was built like this in anticipation of
> getting additional servers, at which point I will relax the "2 chunks
> per host" part.
>
> [cephmgr@cephAdmPA1.cephAdmPA1 ~]$ ceph osd pool get default.rgw.buckets.data erasure_code_profile
> erasure_code_profile: ec_6and4_big
> [cephmgr@cephAdmPA1.cephAdmPA1 ~]$ ceph osd erasure-code-profile get ec_6and4_big
> crush-device-class=big
> crush-failure-domain=osd
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=6
> m=4
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Indeed, Dan:
>
> [cephmgr@cephAdmPA1.cephAdmPA1 ~]$ ceph osd dump | grep upmap | grep 116.453
> pg_upmap_items 116.453 [76,49,129,108]
>
> I don't think I ever set such an upmap myself. Do you think it would be
> good to try and remove all upmaps, let the upmap balancer do its magic,
> and check again?
>
> Thanks!
>
> Fulvio
>
>
> On 20/05/2021 18:59, Dan van der Ster wrote:
> > Hold on: 8+4 needs 12 OSDs but you only show 10 there. Shouldn't you
> > choose 6 type host and then chooseleaf 2 type osd?
> >
> > .. Dan
> >
> >
> > On Thu, May 20, 2021, 1:30 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:
> >
> > Hallo Dan, Bryan,
> > I have a rule similar to yours, for an 8+4 pool, with the only
> > difference that I replaced the second "choose" with "chooseleaf",
> > which I understand should make no difference:
> >
> > rule default.rgw.buckets.data {
> >         id 6
> >         type erasure
> >         min_size 3
> >         max_size 10
> >         step set_chooseleaf_tries 5
> >         step set_choose_tries 100
> >         step take default class big
> >         step choose indep 5 type host
> >         step chooseleaf indep 2 type osd
> >         step emit
> > }
> >
> > I am on Nautilus 14.2.16 and while performing maintenance the other
> > day, I noticed 2 PGs were incomplete and caused trouble for some
> > users. I then verified that (thanks Bryan for the command):
> >
> > [cephmgr@cephAdmCT1.cephAdmCT1 clusterCT]$ for osd in $(ceph pg map 116.453 -f json | jq -r '.up[]'); do ceph osd find $osd | jq -r '.host'; done | sort | uniq -c | sort -n -k1
> >       2 r2srv07.ct1.box.garr
> >       2 r2srv10.ct1.box.garr
> >       2 r3srv07.ct1.box.garr
> >       4 r1srv02.ct1.box.garr
> >
> > You see that 4 chunks of this PG were put on r1srv02.
> > Maybe this happened due to some temporary unavailability of the host
> > at some point? As all my servers are now up and running, is there a
> > way to force the placement rule to rerun?
> >
> > Thanks!
> >
> > Fulvio
> >
> >
> > On 5/16/2021 11:40 PM, Dan van der Ster wrote:
> > > Hi Bryan,
> > >
> > > I had to do something similar, and never found a rule to place
> > > "up to" 2 chunks per host, so I stayed with the placement of
> > > *exactly* 2 chunks per host.
> > >
> > > But I did this slightly differently to what you wrote earlier: my
> > > rule chooses exactly 4 hosts, then chooses exactly 2 osds on each:
> > >
> > >     type erasure
> > >     min_size 3
> > >     max_size 10
> > >     step set_chooseleaf_tries 5
> > >     step set_choose_tries 100
> > >     step take default class hdd
> > >     step choose indep 4 type host
> > >     step choose indep 2 type osd
> > >     step emit
> > >
> > > If you really need the "up to 2" approach then maybe you can split
> > > each host into two "host" crush buckets, with half the OSDs in each.
> > > Then a normal host-wise rule should work.
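> > >
> > > Something along these lines should do it (an untested sketch; the
> > > bucket names, osd ids and 1.0 weights are placeholders, not taken
> > > from a real cluster):
> > >
> > >     # create two pseudo-host buckets and attach them to the tree
> > >     ceph osd crush add-bucket myhost-a host
> > >     ceph osd crush add-bucket myhost-b host
> > >     ceph osd crush move myhost-a root=default
> > >     ceph osd crush move myhost-b root=default
> > >
> > >     # then re-home half of the real host's OSDs under each bucket
> > >     ceph osd crush create-or-move osd.0 1.0 root=default host=myhost-a
> > >     ceph osd crush create-or-move osd.1 1.0 root=default host=myhost-b
> > >
> > > Cheers, Dan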