Hi Mark,

A K=2 + M=2 EC profile with the failure domain set to host requires at least 4 nodes. The documentation's statement that "The simplest erasure coded pool is equivalent to RAID5<https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5> and requires at least three hosts" assumes an EC profile of K=2 + M=1, which is technically the minimum configuration but generally NOT recommended for data durability.

Regards,
Levin
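To confirm that the host failure domain is what is blocking placement, the profile and rule in use can be inspected with the stock Ceph CLI (read-only commands; "dataHDD" is the rule name from the commands quoted below):

# With k=2 m=2 and crush-failure-domain=host, CRUSH must place 4 chunks on
# 4 distinct hosts, which a 3-node cluster cannot satisfy.
ceph osd erasure-code-profile get default
ceph osd crush rule dump dataHDD
ceph osd tree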
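As a rough sketch of a profile that does fit three hosts (the profile, rule, and pool names below are illustrative, not taken from the thread): K=2 + M=1 keeps the host failure domain but tolerates only a single host failure, the durability trade-off noted above. Since k and m cannot be changed on an existing erasure-coded pool, this would mean creating a new data pool and migrating into it.

# Illustrative names: hddk2m1 (profile), dataHDDk2m1 (rule), cephfsHDD_data_k2m1 (pool).
ceph osd erasure-code-profile set hddk2m1 k=2 m=1 \
    crush-failure-domain=host crush-device-class=hdd
ceph osd crush rule create-erasure dataHDDk2m1 hddk2m1
ceph osd pool create cephfsHDD_data_k2m1 1024 1024 erasure hddk2m1 dataHDDk2m1
ceph osd pool set cephfsHDD_data_k2m1 allow_ec_overwrites true
# Alternatively, crush-failure-domain=osd allows a wider k+m on 3 hosts,
# at the cost of host-level fault tolerance.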
From: Mark S. Holliman <msh@xxxxxxxxx>
Date: Monday, 25 July 2022 at 21:15
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Default erasure code profile not working for 3 node cluster?

Dear All,

I've recently set up a 3 node Ceph Quincy (17.2) cluster to serve a pair of CephFS mounts for a Slurm cluster. Each Ceph node has 6 x SSD and 6 x HDD, and I've set up the pools and crush rules to create separate CephFS filesystems using the different disk classes. I used the default erasure-code-profile to create the pools (see details below), as the documentation states that it works on a 3 node cluster.

The system looked healthy after the initial setup, but now a few weeks in I'm seeing signs of problems: a growing count of pgs not deep-scrubbed in time, significant numbers of pgs in "active+undersized"/"active+undersized+degraded", most pgs in an "active+clean+remapped" state, and no recovery activity. I looked at some of the pgs in the stuck states and noticed that they all list a "NONE" OSD in their 'last acting' list, which points to this issue: https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#erasure-coded-pgs-are-not-active-clean . That is likely what is causing the pgs to get stuck in a degraded state and the ever-growing list of late deep scrubs.

But I'm confused why the documentation states that the default erasure code profile should work on a 3 node cluster - https://docs.ceph.com/en/latest/rados/operations/erasure-code/#creating-a-sample-erasure-coded-pool

Is this documentation in error? Or is there something else going on with my setup? What is an ideal erasure code profile for a 3 node system?

Cheers,
Mark

### Commands used to create the CephFS filesystem ###
ceph osd pool create cephfsHDD_data 1024 1024 erasure
ceph osd pool create cephfsHDD_metadata 64 64
ceph osd erasure-code-profile set dataHDD crush-device-class=hdd
ceph osd crush rule create-erasure dataHDD dataHDD
ceph osd pool set cephfsHDD_data crush_rule dataHDD
ceph osd pool set cephfsHDD_data allow_ec_overwrites true
ceph fs new cephfsHDD cephfsHDD_metadata cephfsHDD_data --force

### Example Status
  health: HEALTH_WARN
          Degraded data redundancy: 750/10450 objects degraded (7.177%), 313 pgs degraded, 775 pgs undersized
          887 pgs not deep-scrubbed in time
          887 pgs not scrubbed in time

  services:
    mon: 3 daemons...
    mgr: ...
    mds: 2/2 daemons up, 2 standby
    osd: 36 osds: 36 up (since 27h), 36 in (since 5w); 1272 remapped pgs

  data:
    volumes: 2/2 healthy
    pools:   5 pools, 2176 pgs
    objects: 2.82k objects, 361 MiB
    usage:   40 GiB used, 262 TiB / 262 TiB avail
    pgs:     750/10450 objects degraded (7.177%)
             1240/10450 objects misplaced (11.866%)
             1272 active+clean+remapped
             462  active+undersized
             313  active+undersized+degraded
             129  active+clean

### Erasure Code Profile
k=2
m=2
plugin=jerasure
technique=reed_sol_van

### Pool details
root@dokkalfar01:~# ceph osd pool get cephfsHDD_data all
size: 4
min_size: 3
pg_num: 1023
pgp_num: 972
crush_rule: dataHDD
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: default
fast_read: 0
pg_autoscale_mode: on
eio: false
bulk: false

### Example health details of unhappy pgs
pg 3.282 is stuck undersized for 27h, current state active+undersized+degraded, last acting [29,15,5,NONE]
pg 3.285 is stuck undersized for 27h, current state active+undersized+degraded, last acting [0,17,28,NONE]
pg 3.286 is stuck undersized for 27h, current state active+undersized+degraded, last acting [3,17,26,NONE]
pg 3.288 is stuck undersized for 27h, current state active+undersized+degraded, last acting [13,NONE,0,24]
pg 3.28e is stuck undersized for 27h, current state active+undersized+degraded, last acting [28,NONE,5,14]
pg 3.297 is stuck undersized for 27h, current state active+undersized+degraded, last acting [25,5,13,NONE]

-------------------------------
Mark Holliman
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------------------------------
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx