CRUSH algorithm problem

"ningt0509@xxxxxxxxx" <ningt0509@xxxxxxxxx> · Sat, 24 Nov 2018 15:04:37 +0800

There are four hosts in the environment. The storage pool uses EC 4+2, and the CRUSH rule is configured to select two OSDs from each host. When I shut down one host, all of its OSDs are marked out, but the PGs do not return to active+clean. Why can't the PGs be remapped to OSDs on the other hosts? Is something wrong with this configuration?

ID  CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF 
 -1       30.00000 root default                           
 -5        7.00000     host host0                         
  0   ssd  1.00000         osd.0    down        0 1.00000 
  1   ssd  1.00000         osd.1    down        0 1.00000 
  2   ssd  1.00000         osd.2    down        0 1.00000 
  3   ssd  1.00000         osd.3    down        0 1.00000 
  4   ssd  1.00000         osd.4    down        0 1.00000 
  5   ssd  1.00000         osd.5    down        0 1.00000 
  6   ssd  1.00000         osd.6    down        0 1.00000 
 -7        7.00000     host host1                         
  7   ssd  1.00000         osd.7      up  1.00000 1.00000 
  8   ssd  1.00000         osd.8      up  1.00000 1.00000 
  9   ssd  1.00000         osd.9      up  1.00000 1.00000 
 10   ssd  1.00000         osd.10     up  1.00000 1.00000 
 11   ssd  1.00000         osd.11     up  1.00000 1.00000 
 12   ssd  1.00000         osd.12     up  1.00000 1.00000 
 13   ssd  1.00000         osd.13     up  1.00000 1.00000 
 -9        8.00000     host host2                         
 14   ssd  1.00000         osd.14     up  1.00000 1.00000 
 15   ssd  1.00000         osd.15     up  1.00000 1.00000 
 16   ssd  1.00000         osd.16     up  1.00000 1.00000 
 17   ssd  1.00000         osd.17     up  1.00000 1.00000 
 18   ssd  1.00000         osd.18     up  1.00000 1.00000 
 19   ssd  1.00000         osd.19     up  1.00000 1.00000 
 20   ssd  1.00000         osd.20     up  1.00000 1.00000 
 21   ssd  1.00000         osd.21     up  1.00000 1.00000 
-11        8.00000     host host3                         
 29        1.00000         osd.29     up  1.00000 1.00000 
 22   ssd  1.00000         osd.22     up  1.00000 1.00000 
 23   ssd  1.00000         osd.23     up  1.00000 1.00000 
 24   ssd  1.00000         osd.24     up  1.00000 1.00000 
 25   ssd  1.00000         osd.25     up  1.00000 1.00000 
 26   ssd  1.00000         osd.26     up  1.00000 1.00000 
 27   ssd  1.00000         osd.27     up  1.00000 1.00000 
 28   ssd  1.00000         osd.28     up  1.00000 1.00000 

  cluster:
    id:     d24174ae-a1bf-43f9-a8f3-a10246988ab7
    health: HEALTH_WARN
            Reduced data availability: 413 pgs inactive
            Degraded data redundancy: 414 pgs undersized

  services:
    mon: 1 daemons, quorum a
    mgr: x(active)
    osd: 30 osds: 23 up, 23 in; 3 remapped pgs

  data:
    pools:   1 pools, 512 pgs
    objects: 0 objects, 0 bytes
    usage:   24026 MB used, 206 GB / 230 GB avail
    pgs:     80.664% pgs not active
             413 undersized+peered
             96  active+clean
             2   active+clean+remapped
             1   active+undersized+remapped
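
As a quick cross-check, the PG counts in the status output above can be added up. This minimal sketch only reuses the numbers printed in the "pgs:" section; the variable names are illustrative.

```python
# PG state counts copied from the "pgs:" section of the status output.
pg_states = {
    "undersized+peered": 413,
    "active+clean": 96,
    "active+clean+remapped": 2,
    "active+undersized+remapped": 1,
}
total_pgs = 512  # from "pools: 1 pools, 512 pgs"

# Every PG should be accounted for by exactly one state line.
assert sum(pg_states.values()) == total_pgs

# Split each state string on "+" and count PGs by the flags they carry.
inactive = sum(n for s, n in pg_states.items() if "active" not in s.split("+"))
undersized = sum(n for s, n in pg_states.items() if "undersized" in s.split("+"))

pct_inactive = 100 * inactive / total_pgs
print(f"inactive:   {inactive} ({pct_inactive:.3f}%)")
print(f"undersized: {undersized}")
```

The totals agree with the health summary: 413 PGs inactive (80.664%) and 414 PGs undersized.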

The Ceph environment configuration is as follows:

Crush rule:
rule ec_4_2 {
        id 1
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 400
        step take default
        step choose indep 0 type host
        step chooseleaf indep 2 type osd
        step emit
}
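
To show why I expected three hosts to be enough, here is a rough counting sketch of the rule above. It is only a counting argument: it assumes every counted host can supply two usable OSDs, and it does not model how CRUSH actually selects or retries. The function name `candidate_slots` is made up for illustration.

```python
def candidate_slots(num_hosts: int, osds_per_host: int, pool_size: int) -> int:
    """Upper bound on the OSD slots a 'hosts first, then OSDs per host' rule can fill."""
    # "step choose indep 0 type host": 0 asks for as many hosts as the pool
    # size, but no more hosts can be returned than exist.
    hosts_chosen = min(num_hosts, pool_size)
    # "step chooseleaf indep 2 type osd": two OSDs from each chosen host;
    # the final result is capped at the pool size.
    return min(hosts_chosen * osds_per_host, pool_size)


POOL_SIZE = 6  # EC 4+2

all_hosts = candidate_slots(4, 2, POOL_SIZE)        # all four hosts counted
surviving_hosts = candidate_slots(3, 2, POOL_SIZE)  # only the three hosts still up

print(all_hosts, surviving_hosts)
```

By this count, three hosts with two OSDs each still cover all six shards. Whether CRUSH really skips a host whose OSDs are all out at the `choose indep 0 type host` step is a separate question that this count does not answer.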

Pool:
pool 1 'ec_4_2' erasure size 6 min_size 5 origin_min_size 0 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 last_change 94 flags hashpspool stripe_width 16384
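
The pool line above has `size 6` and `min_size 5`. The sketch below shows what that implies under two assumptions: a PG needs at least `min_size` shards available to be active, and an affected PG had two shards on the host that was shut down (as the two-OSDs-per-host rule suggests). The helper name `can_be_active` is hypothetical.

```python
SIZE = 6             # pool "size 6" (4 data + 2 coding shards)
MIN_SIZE = 5         # pool "min_size 5"
SHARDS_PER_HOST = 2  # from "step chooseleaf indep 2 type osd"


def can_be_active(shards_lost: int) -> bool:
    """True if the shards still available meet the pool's min_size."""
    remaining = SIZE - shards_lost
    return remaining >= MIN_SIZE


# A PG with no shards on the stopped host keeps all six shards.
unaffected = can_be_active(0)
# A PG with two shards on the stopped host, not remapped elsewhere, keeps four.
affected = can_be_active(SHARDS_PER_HOST)

print(unaffected, affected)
```

Under these assumptions, a PG that lost two shards and was not remapped has four shards left, which is below `min_size 5`. That would be consistent with the undersized+peered state shown in the status output.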

--------------
ningt0509@xxxxxxxxx