Thank you for your reply, but after I changed the pool's min_size to 4, the PGs were still unable to recover.

ceph -s
  cluster:
    id:     5e527773-9873-4100-bcce-19a1eaf6e496
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a
    mgr: x(active)
    osd: 12 osds: 9 up, 9 in

  data:
    pools:   1 pools, 32 pgs
    objects: 0 objects, 0 bytes
    usage:   9238 MB used, 82921 MB / 92160 MB avail
    pgs:     26 active+undersized
             6  active+clean

ceph osd pool ls detail
pool 1 'ec' erasure size 6 min_size 4 origin_min_size 0 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 last_change 79 flags hashpspool stripe_width 16384

ceph osd tree
ID  CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
 -1       12.00000 root default
 -5        3.00000     host host0
  0   ssd  1.00000         osd.0    down        0 1.00000
  1   ssd  1.00000         osd.1    down        0 1.00000
  2   ssd  1.00000         osd.2    down        0 1.00000
 -7        3.00000     host host1
  3   ssd  1.00000         osd.3      up  1.00000 1.00000
  4   ssd  1.00000         osd.4      up  1.00000 1.00000
  5   ssd  1.00000         osd.5      up  1.00000 1.00000
 -9        3.00000     host host2
  6   ssd  1.00000         osd.6      up  1.00000 1.00000
  7   ssd  1.00000         osd.7      up  1.00000 1.00000
  8   ssd  1.00000         osd.8      up  1.00000 1.00000
-11        3.00000     host host3
  9   ssd  1.00000         osd.9      up  1.00000 1.00000
 10   ssd  1.00000         osd.10     up  1.00000 1.00000
 11   ssd  1.00000         osd.11     up  1.00000 1.00000

--------------
ningt0509@xxxxxxxxx

>
>On 24/11/18 09:04, ningt0509@xxxxxxxxx wrote:
>> There are four hosts in the environment, the storage pool uses EC 4+2, and the CRUSH rule is configured to select two OSDs from each host. When I shut down one host, all of its OSDs are marked out, but the PGs cannot return to active+clean. Why can the PGs not be mapped to OSDs on another host? Is there a problem with this configuration?
>>
>> ID  CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
>>  -1       30.00000 root default
>>  -5        7.00000     host host0
>>   0   ssd  1.00000         osd.0    down        0 1.00000
>>   1   ssd  1.00000         osd.1    down        0 1.00000
>>   2   ssd  1.00000         osd.2    down        0 1.00000
>>   3   ssd  1.00000         osd.3    down        0 1.00000
>>   4   ssd  1.00000         osd.4    down        0 1.00000
>>   5   ssd  1.00000         osd.5    down        0 1.00000
>>   6   ssd  1.00000         osd.6    down        0 1.00000
>>  -7        7.00000     host host1
>>   7   ssd  1.00000         osd.7      up  1.00000 1.00000
>>   8   ssd  1.00000         osd.8      up  1.00000 1.00000
>>   9   ssd  1.00000         osd.9      up  1.00000 1.00000
>>  10   ssd  1.00000         osd.10     up  1.00000 1.00000
>>  11   ssd  1.00000         osd.11     up  1.00000 1.00000
>>  12   ssd  1.00000         osd.12     up  1.00000 1.00000
>>  13   ssd  1.00000         osd.13     up  1.00000 1.00000
>>  -9        8.00000     host host2
>>  14   ssd  1.00000         osd.14     up  1.00000 1.00000
>>  15   ssd  1.00000         osd.15     up  1.00000 1.00000
>>  16   ssd  1.00000         osd.16     up  1.00000 1.00000
>>  17   ssd  1.00000         osd.17     up  1.00000 1.00000
>>  18   ssd  1.00000         osd.18     up  1.00000 1.00000
>>  19   ssd  1.00000         osd.19     up  1.00000 1.00000
>>  20   ssd  1.00000         osd.20     up  1.00000 1.00000
>>  21   ssd  1.00000         osd.21     up  1.00000 1.00000
>> -11        8.00000     host host3
>>  29        1.00000         osd.29     up  1.00000 1.00000
>>  22   ssd  1.00000         osd.22     up  1.00000 1.00000
>>  23   ssd  1.00000         osd.23     up  1.00000 1.00000
>>  24   ssd  1.00000         osd.24     up  1.00000 1.00000
>>  25   ssd  1.00000         osd.25     up  1.00000 1.00000
>>  26   ssd  1.00000         osd.26     up  1.00000 1.00000
>>  27   ssd  1.00000         osd.27     up  1.00000 1.00000
>>  28   ssd  1.00000         osd.28     up  1.00000 1.00000
>>
>>   cluster:
>>     id:     d24174ae-a1bf-43f9-a8f3-a10246988ab7
>>     health: HEALTH_WARN
>>             Reduced data availability: 413 pgs inactive
>>             Degraded data redundancy: 414 pgs undersized
>>
>>   services:
>>     mon: 1 daemons, quorum a
>>     mgr: x(active)
>>     osd: 30 osds: 23 up, 23 in; 3 remapped pgs
>>
>>   data:
>>     pools:   1 pools, 512 pgs
>>     objects: 0 objects, 0 bytes
>>     usage:   24026 MB used, 206 GB / 230 GB avail
>>     pgs:     80.664% pgs not active
>>              413 undersized+peered
>>              96  active+clean
>>              2   active+clean+remapped
>>              1   active+undersized+remapped
>>
>> The Ceph environment configuration is as follows:
>>
>> Crush rule:
>> rule ec_4_2 {
>>         id 1
>>         type erasure
>>         min_size 3
>>         max_size 6
>>         step set_chooseleaf_tries 5
>>         step set_choose_tries 400
>>         step take default
>>         step choose indep 0 type host
>>         step chooseleaf indep 2 type osd
>>         step emit
>> }
>>
>> Pool:
>> pool 1 'ec_4_2' erasure size 6 min_size 5 origin_min_size 0 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 last_change 94 flags hashpspool stripe_width 16384
>>
>> --------------
>> ningt0509@xxxxxxxxx
>
>Try temporarily setting your pool min_size to 4 rather than 5 to kick-start the recovery.
>I believe this is a feature/bug: EC pools require min_size chunks to be available before starting recovery, rather than just k chunks.
>
>Maged
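
For what it's worth, the arithmetic behind Maged's suggestion can be sketched as follows. This is a minimal illustration using the numbers from this thread (k=4, m=2, two chunks per host), not Ceph code:

```python
# EC profile from the thread: k = 4 data chunks + m = 2 coding chunks.
k, m = 4, 2
size = k + m                  # 6 chunks per PG
chunks_per_host = 2           # the CRUSH rule places 2 chunks on each chosen host

# Only 3 of the 4 hosts hold chunks of any given PG (6 chunks / 2 per host).
hosts_used = size // chunks_per_host

# Worst case: the failed host held 2 chunks of this PG.
surviving_chunks = size - chunks_per_host   # 4 chunks left

# Reads need only k chunks, so the data itself is still recoverable:
assert surviving_chunks >= k

# But a PG only goes (and stays) active when at least min_size chunks are up.
print(surviving_chunks >= 5)  # min_size 5 -> False: PG stuck undersized+peered
print(surviving_chunks >= 4)  # min_size 4 -> True:  PG can go active again
```

Note that even with min_size lowered, whether the missing chunks can actually be remapped depends on the CRUSH rule being able to find enough distinct hosts/OSDs for all 6 chunks, which is a separate question from min_size.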