I configured two environments.

1. First environment: four hosts, one EC storage pool, k=4, m=2. The CRUSH rule is as follows:

rule ec_4_2 {
        id 1
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 400
        step take default
        step choose indep 0 type host
        step chooseleaf indep 2 type osd
        step emit
}

When I shut down one of the hosts and waited for the OSDs of that host to be marked out, the PGs could not get back to the active+clean state:

ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1       12.00000 root default
 -5        3.00000     host host0
  0   ssd  1.00000         osd.0     down        0 1.00000
  1   ssd  1.00000         osd.1     down        0 1.00000
  2   ssd  1.00000         osd.2     down        0 1.00000
 -7        3.00000     host host1
  3   ssd  1.00000         osd.3       up  1.00000 1.00000
  4   ssd  1.00000         osd.4       up  1.00000 1.00000
  5   ssd  1.00000         osd.5       up  1.00000 1.00000
 -9        3.00000     host host2
  6   ssd  1.00000         osd.6       up  1.00000 1.00000
  7   ssd  1.00000         osd.7       up  1.00000 1.00000
  8   ssd  1.00000         osd.8       up  1.00000 1.00000
-11        3.00000     host host3
  9   ssd  1.00000         osd.9       up  1.00000 1.00000
 10   ssd  1.00000         osd.10      up  1.00000 1.00000
 11   ssd  1.00000         osd.11      up  1.00000 1.00000

  cluster:
    id:     5e527773-9873-4100-bcce-19a1eaf6e496
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a
    mgr: x(active)
    osd: 12 osds: 9 up, 9 in

  data:
    pools:   1 pools, 32 pgs
    objects: 0 objects, 0 bytes
    usage:   9238 MB used, 82921 MB / 92160 MB avail
    pgs:     26 active+undersized
              6 active+clean

2. Second environment: eight hosts, an EC storage pool, k=4, m=2. The CRUSH rule is as follows:

rule ec_4_2 {
        id 1
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 400
        step take default
        step chooseleaf indep 0 type host
        step emit
}

After I shut down one host and waited for the OSDs of that host to be marked out, the PGs did get back to active+clean.

However, if I change the CRUSH rule to something like this:

rule ec_4_2 {
        id 1
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 400
        step take default
        step choose indep 0 type host
        step chooseleaf indep 1 type osd
        step emit
}

the PGs again cannot recover to active+clean after one of the hosts goes down.

Looking at the code for the first kind of rule (choose on hosts followed by chooseleaf on osds): after the OSDs under one of the hosts are marked out, that host is still elected by the "step choose indep 0 type host" step and handed to crush_choose_indep() as input for the leaf step, so the result slots that map to it cannot be filled. The single-step "chooseleaf indep 0 type host" rule does not end up in this situation. Is there any good way to handle such a scenario?

crush_do_rule()
{
        ...
        /* the bucket chosen by the previous step contributes at most
         * out_size of the remaining result slots */
        out_size = ((numrep < (result_max - osize)) ?
                    numrep : (result_max - osize));
        crush_choose_indep(
                map,
                cw,
                map->buckets[bno],
                weight, weight_max,
                x, out_size, numrep,
                curstep->arg2,
                o + osize, j,
                choose_tries,
                choose_leaf_tries ?
                   choose_leaf_tries : 1,
                recurse_to_leaf,
                c + osize,
                0,
                choose_args);
        osize += out_size;
        ...
}
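To show more concretely what I mean, below is a small toy simulation I put together. It is not the Ceph CRUSH code: the layout, the deterministic host order, and the function names (choose_then_chooseleaf, chooseleaf_with_retry) are my own simplifications. It only mimics the selection pattern of the two kinds of rules: electing hosts without looking at the OSDs below them versus rejecting a host when no leaf can be chosen under it.

/* Toy model of the two selection patterns (NOT the Ceph CRUSH code).
 * Layout: 4 hosts x 3 OSDs, all OSDs of host0 marked out.
 * We want 6 leaves (k+m), at most 2 per host. */
#include <stdio.h>
#include <stdbool.h>

#define HOSTS         4
#define OSDS_PER_HOST 3
#define WANT          6   /* k + m */
#define PER_HOST      2   /* shards per host */
#define NONE         -1

static bool osd_in[HOSTS][OSDS_PER_HOST];

static int in_osds(int host)
{
        int n = 0;
        for (int d = 0; d < OSDS_PER_HOST; d++)
                if (osd_in[host][d])
                        n++;
        return n;
}

/* Pattern of "choose indep 0 type host" + "chooseleaf indep N type osd":
 * hosts are elected without looking at the OSDs below them, and the leaf
 * step cannot make the host step pick a different host afterwards. */
static void choose_then_chooseleaf(int out[WANT])
{
        int slot = 0;

        for (int i = 0; i < WANT; i++)
                out[i] = NONE;

        for (int h = 0; h < HOSTS && slot < WANT; h++) {
                int next = 0;   /* next OSD index to try inside this host */
                for (int i = 0; i < PER_HOST && slot < WANT; i++, slot++) {
                        while (next < OSDS_PER_HOST && !osd_in[h][next])
                                next++;
                        if (next < OSDS_PER_HOST)
                                out[slot] = h * OSDS_PER_HOST + next++;
                        /* else: the slot stays NONE for a dead host */
                }
        }
}

/* Pattern of a single "chooseleaf indep ... type host" step: a host under
 * which no leaf can be chosen is rejected and another host is tried. */
static void chooseleaf_with_retry(int out[WANT])
{
        int slot = 0;

        for (int i = 0; i < WANT; i++)
                out[i] = NONE;

        for (int h = 0; h < HOSTS && slot < WANT; h++) {
                if (in_osds(h) < PER_HOST)
                        continue;       /* reject this host, try the next */
                int next = 0;
                for (int i = 0; i < PER_HOST && slot < WANT; i++, slot++) {
                        while (next < OSDS_PER_HOST && !osd_in[h][next])
                                next++;
                        if (next < OSDS_PER_HOST)
                                out[slot] = h * OSDS_PER_HOST + next++;
                }
        }
}

static void show(const char *name, const int out[WANT])
{
        printf("%-26s:", name);
        for (int i = 0; i < WANT; i++) {
                if (out[i] == NONE)
                        printf(" NONE");
                else
                        printf(" osd.%d", out[i]);
        }
        printf("\n");
}

int main(void)
{
        int out[WANT];

        for (int h = 0; h < HOSTS; h++)
                for (int d = 0; d < OSDS_PER_HOST; d++)
                        osd_in[h][d] = (h != 0);   /* host0 is down and out */

        choose_then_chooseleaf(out);
        show("choose + chooseleaf", out);        /* two slots stay NONE */

        chooseleaf_with_retry(out);
        show("chooseleaf with host retry", out); /* all six slots filled */

        return 0;
}

With host0's OSDs out, the first pattern prints two NONE slots (which is the undersized state I see), while the second fills all six slots from the three remaining hosts. This is only meant to illustrate the behaviour I am describing, not how the real mapping code works internally.

--------------
ningt0509@xxxxxxxxx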