I could be wrong, but as far as I can see you have 9 chunks, which requires 9 failure domains. Your failure domain is set to datacenter, and you only have 3 of those, so that won't work. You need to set your failure domain to host and then create a crush rule that chooses 3 datacenters and then 3 hosts within each datacenter. Something like this should work:

step choose indep 3 type datacenter
step chooseleaf indep 3 type host

On Fri, 12 Jan 2024 at 20:58, Torkil Svensgaard <torkil@xxxxxxxx> wrote:
> We are looking to create a 3 datacenter 4+5 erasure coded pool but can't
> quite get it to work. Ceph version 17.2.7. These are the hosts (there
> will eventually be 6 hdd hosts in each datacenter):
>
> -33    886.00842    datacenter 714
>  -7    209.93135        host ceph-hdd1
> -69     69.86389        host ceph-flash1
>  -6    188.09579        host ceph-hdd2
>  -3    233.57649        host ceph-hdd3
> -12    184.54091        host ceph-hdd4
> -34    824.47168    datacenter DCN
> -73     69.86389        host ceph-flash2
>  -2    201.78067        host ceph-hdd5
> -81    288.26501        host ceph-hdd6
> -31    264.56207        host ceph-hdd7
> -36   1284.48621    datacenter TBA
> -77     69.86389        host ceph-flash3
> -21    190.83224        host ceph-hdd8
> -29    199.08838        host ceph-hdd9
> -11    193.85382        host ceph-hdd10
>  -9    237.28154        host ceph-hdd11
> -26    187.19536        host ceph-hdd12
>  -4    206.37102        host ceph-hdd13
>
> We did this:
>
> ceph osd erasure-code-profile set DRCMR_k4m5_datacenter_hdd
> plugin=jerasure k=4 m=5 technique=reed_sol_van crush-root=default
> crush-failure-domain=datacenter crush-device-class=hdd
>
> ceph osd pool create cephfs.hdd.data erasure DRCMR_k4m5_datacenter_hdd
> ceph osd pool set cephfs.hdd.data allow_ec_overwrites true
> ceph osd pool set cephfs.hdd.data pg_autoscale_mode warn
>
> Didn't quite work:
>
> "
> [WARN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg incomplete
>     pg 33.0 is creating+incomplete, acting
>     [104,219,NONE,NONE,NONE,41,NONE,NONE,NONE] (reducing pool
>     cephfs.hdd.data min_size from 5 may help; search ceph.com/docs for
>     'incomplete')
> "
>
> I then manually changed the crush rule from this:
>
> "
> rule cephfs.hdd.data {
>     id 7
>     type erasure
>     step set_chooseleaf_tries 5
>     step set_choose_tries 100
>     step take default class hdd
>     step chooseleaf indep 0 type datacenter
>     step emit
> }
> "
>
> To this:
>
> "
> rule cephfs.hdd.data {
>     id 7
>     type erasure
>     step set_chooseleaf_tries 5
>     step set_choose_tries 100
>     step take default class hdd
>     step choose indep 0 type datacenter
>     step chooseleaf indep 3 type host
>     step emit
> }
> "
>
> This was based on some testing and dialogue I had with Red Hat support
> last year when we were on RHCS, and it seemed to work.
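One way to sanity-check a hand-edited rule like the quoted one, or the 3 x 3 variant suggested at the top, before trusting it is to decompile the crush map, edit it, recompile it and run it through crushtool --test. A rough sketch only; the file names are placeholders, and rule id 7 and the 9 chunks are taken from the quoted rule and the 4+5 profile:

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# edit the cephfs.hdd.data rule in crush.txt, e.g.
#   step choose indep 3 type datacenter
#   step chooseleaf indep 3 type host
crushtool -c crush.txt -o crush.new
crushtool -i crush.new --test --rule 7 --num-rep 9 --show-mappings
ceph osd setcrushmap -i crush.new

Adding --show-bad-mappings to the test prints only the placements that could not get all 9 OSDs, so an empty result before running setcrushmap is a good sign.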
> Then:
>
> ceph fs add_data_pool cephfs cephfs.hdd.data
> ceph fs subvolumegroup create hdd --pool_layout cephfs.hdd.data
>
> I started copying data to the subvolume and increased pg_num a couple of
> times:
>
> ceph osd pool set cephfs.hdd.data pg_num 256
> ceph osd pool set cephfs.hdd.data pg_num 2048
>
> But at some point it failed to activate new PGs, eventually leading to this:
>
> "
> [WARN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
>     mds.cephfs.ceph-flash1.agdajf(mds.0): 64 slow metadata IOs are
>     blocked > 30 secs, oldest blocked for 25455 secs
> [WARN] MDS_TRIM: 1 MDSs behind on trimming
>     mds.cephfs.ceph-flash1.agdajf(mds.0): Behind on trimming (997/128)
>     max_segments: 128, num_segments: 997
> [WARN] PG_AVAILABILITY: Reduced data availability: 5 pgs inactive
>     pg 33.6f6 is stuck inactive for 8h, current state activating+remapped,
>     last acting [50,79,116,299,98,219,164,124,421]
>     pg 33.6fa is stuck inactive for 11h, current state
>     activating+undersized+degraded+remapped, last acting
>     [17,408,NONE,196,223,290,73,39,11]
>     pg 33.705 is stuck inactive for 11h, current state
>     activating+undersized+degraded+remapped, last acting
>     [33,273,71,NONE,411,96,28,7,161]
>     pg 33.721 is stuck inactive for 7h, current state activating+remapped,
>     last acting [283,150,209,423,103,325,118,142,87]
>     pg 33.726 is stuck inactive for 11h, current state
>     activating+undersized+degraded+remapped, last acting
>     [234,NONE,416,121,54,141,277,265,19]
> [WARN] PG_DEGRADED: Degraded data redundancy: 1818/1282640036 objects
>     degraded (0.000%), 3 pgs degraded, 3 pgs undersized
>     pg 33.6fa is stuck undersized for 7h, current state
>     activating+undersized+degraded+remapped, last acting
>     [17,408,NONE,196,223,290,73,39,11]
>     pg 33.705 is stuck undersized for 7h, current state
>     activating+undersized+degraded+remapped, last acting
>     [33,273,71,NONE,411,96,28,7,161]
>     pg 33.726 is stuck undersized for 7h, current state
>     activating+undersized+degraded+remapped, last acting
>     [234,NONE,416,121,54,141,277,265,19]
> [WARN] POOL_TOO_FEW_PGS: 1 pools have too few placement groups
>     Pool cephfs.hdd.data has 1024 placement groups, should have 2048
> [WARN] SLOW_OPS: 3925 slow ops, oldest one blocked for 26012 sec, daemons
>     [osd.17,osd.234,osd.283,osd.33,osd.50] have slow ops.
> "
>
> We had thought it would only affect that particular data pool, but
> eventually everything ground to a halt, also RBD on unrelated replicated
> pools. Looking at it now I guess that was because the 5 OSDs were blocked
> for everything and not just the PGs for that data pool?
>
> We tried restarting the 5 blocked OSDs to no avail and eventually resorted
> to deleting the cephfs.hdd.data data pool to restore service.
>
> Any suggestions as to what we did wrong? Something to do with min_size?
> The crush rule?
>
> Thanks.
>
> Mvh.
>
> Torkil
> --
> Torkil Svensgaard
> Systems Administrator
> Danish Research Centre for Magnetic Resonance DRCMR, Section 714
> Copenhagen University Hospital Amager and Hvidovre
> Kettegaard Allé 30, 2650 Hvidovre, Denmark
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
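On the min_size question: with a 4+5 profile, k is 4, and an erasure-coded pool's min_size defaults to k+1 = 5, which matches the "min_size from 5" hint in the first warning. min_size only controls how many of the 9 shards must be available before a PG serves I/O; the NONE entries in the acting sets above are shard positions CRUSH never filled at all, which suggests the placement (the crush rule and failure domains), not min_size, is the thing to fix, in line with the note at the top. A minimal sketch of how one might inspect the relevant settings, reusing the pool and profile names from the quoted commands:

ceph osd erasure-code-profile get DRCMR_k4m5_datacenter_hdd
ceph osd pool get cephfs.hdd.data size
ceph osd pool get cephfs.hdd.data min_size
ceph osd pool get cephfs.hdd.data crush_rule

Dropping min_size to k=4 would let PGs go active with no redundancy margin, so it is a last resort rather than a fix.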