Hello Torkil,

We're using the same kind of EC scheme as yours, with k=5 and m=4 over 3 DCs, with the rule below:

rule ec54 {
        id 3
        type erasure
        min_size 3
        max_size 9
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 0 type datacenter
        step chooseleaf indep 3 type host
        step emit
}

Works fine. The only difference I see compared to your EC rule is that we set min_size and max_size, but I doubt this has anything to do with your situation.

Since the cluster still complains about "Pool cephfs.hdd.data has 1024 placement groups, should have 2048", did you run "ceph osd pool set cephfs.hdd.data pgp_num 2048" right after running "ceph osd pool set cephfs.hdd.data pg_num 2048"? [1] It might be that the pool effectively still has 1024 PGs.
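Something like this should show whether pgp_num lagged behind (just a sketch, with the pool name and target value taken from your mail; note that recent releases may also raise pg_num/pgp_num gradually in the background, so both values can take a while to converge):

  # both should report 2048 once the split has been fully applied
  ceph osd pool get cephfs.hdd.data pg_num
  ceph osd pool get cephfs.hdd.data pgp_num

  # if pgp_num is still at 1024, bump it to match pg_num
  ceph osd pool set cephfs.hdd.data pgp_num 2048

"ceph osd pool ls detail" also shows both values at a glance.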
Regards,
Frédéric.

[1] https://docs.ceph.com/en/mimic/rados/operations/placement-groups/#set-the-number-of-placement-groups

-----Original Message-----
From: Torkil <torkil@xxxxxxxx>
To: ceph-users <ceph-users@xxxxxxx>
Cc: Ruben <rkv@xxxxxxxx>
Sent: Friday, January 12, 2024 09:00 CET
Subject: 3 DC with 4+5 EC not quite working

We are looking to create a 3 datacenter 4+5 erasure coded pool but can't quite get it to work. Ceph version 17.2.7.

These are the hosts (there will eventually be 6 hdd hosts in each datacenter):

-33   886.00842    datacenter 714
 -7   209.93135        host ceph-hdd1
-69    69.86389        host ceph-flash1
 -6   188.09579        host ceph-hdd2
 -3   233.57649        host ceph-hdd3
-12   184.54091        host ceph-hdd4
-34   824.47168    datacenter DCN
-73    69.86389        host ceph-flash2
 -2   201.78067        host ceph-hdd5
-81   288.26501        host ceph-hdd6
-31   264.56207        host ceph-hdd7
-36  1284.48621    datacenter TBA
-77    69.86389        host ceph-flash3
-21   190.83224        host ceph-hdd8
-29   199.08838        host ceph-hdd9
-11   193.85382        host ceph-hdd10
 -9   237.28154        host ceph-hdd11
-26   187.19536        host ceph-hdd12
 -4   206.37102        host ceph-hdd13

We did this:

ceph osd erasure-code-profile set DRCMR_k4m5_datacenter_hdd plugin=jerasure k=4 m=5 technique=reed_sol_van crush-root=default crush-failure-domain=datacenter crush-device-class=hdd
ceph osd pool create cephfs.hdd.data erasure DRCMR_k4m5_datacenter_hdd
ceph osd pool set cephfs.hdd.data allow_ec_overwrites true
ceph osd pool set cephfs.hdd.data pg_autoscale_mode warn

Didn't quite work:

"
[WARN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg incomplete
    pg 33.0 is creating+incomplete, acting [104,219,NONE,NONE,NONE,41,NONE,NONE,NONE] (reducing pool cephfs.hdd.data min_size from 5 may help; search ceph.com/docs for 'incomplete')
"

I then manually changed the crush rule from this:

"
rule cephfs.hdd.data {
        id 7
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step chooseleaf indep 0 type datacenter
        step emit
}
"

To this:

"
rule cephfs.hdd.data {
        id 7
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 0 type datacenter
        step chooseleaf indep 3 type host
        step emit
}
"

This was based on some testing and on a dialogue I had with Red Hat support last year when we were on RHCS, and it seemed to work.

Then:

ceph fs add_data_pool cephfs cephfs.hdd.data
ceph fs subvolumegroup create cephfs hdd --pool_layout cephfs.hdd.data

I started copying data to the subvolume and increased pg_num a couple of times:

ceph osd pool set cephfs.hdd.data pg_num 256
ceph osd pool set cephfs.hdd.data pg_num 2048

But at some point it failed to activate new PGs, eventually leading to this:

"
[WARN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.cephfs.ceph-flash1.agdajf(mds.0): 64 slow metadata IOs are blocked > 30 secs, oldest blocked for 25455 secs
[WARN] MDS_TRIM: 1 MDSs behind on trimming
    mds.cephfs.ceph-flash1.agdajf(mds.0): Behind on trimming (997/128) max_segments: 128, num_segments: 997
[WARN] PG_AVAILABILITY: Reduced data availability: 5 pgs inactive
    pg 33.6f6 is stuck inactive for 8h, current state activating+remapped, last acting [50,79,116,299,98,219,164,124,421]
    pg 33.6fa is stuck inactive for 11h, current state activating+undersized+degraded+remapped, last acting [17,408,NONE,196,223,290,73,39,11]
    pg 33.705 is stuck inactive for 11h, current state activating+undersized+degraded+remapped, last acting [33,273,71,NONE,411,96,28,7,161]
    pg 33.721 is stuck inactive for 7h, current state activating+remapped, last acting [283,150,209,423,103,325,118,142,87]
    pg 33.726 is stuck inactive for 11h, current state activating+undersized+degraded+remapped, last acting [234,NONE,416,121,54,141,277,265,19]
[WARN] PG_DEGRADED: Degraded data redundancy: 1818/1282640036 objects degraded (0.000%), 3 pgs degraded, 3 pgs undersized
    pg 33.6fa is stuck undersized for 7h, current state activating+undersized+degraded+remapped, last acting [17,408,NONE,196,223,290,73,39,11]
    pg 33.705 is stuck undersized for 7h, current state activating+undersized+degraded+remapped, last acting [33,273,71,NONE,411,96,28,7,161]
    pg 33.726 is stuck undersized for 7h, current state activating+undersized+degraded+remapped, last acting [234,NONE,416,121,54,141,277,265,19]
[WARN] POOL_TOO_FEW_PGS: 1 pools have too few placement groups
    Pool cephfs.hdd.data has 1024 placement groups, should have 2048
[WARN] SLOW_OPS: 3925 slow ops, oldest one blocked for 26012 sec, daemons [osd.17,osd.234,osd.283,osd.33,osd.50] have slow ops.
"

We had thought it would only affect that particular data pool, but eventually everything ground to a halt, including RBD on unrelated replicated pools. Looking at it now, I guess that was because the 5 OSDs with slow ops were blocking everything they serve and not just the PGs of that data pool?

We tried restarting the 5 blocked OSDs to no avail and eventually resorted to deleting the cephfs.hdd.data data pool to restore service.

Any suggestions as to what we did wrong? Something to do with min_size? The crush rule?

Thanks.

Best regards,

Torkil

--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance
DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx