Hello people :)

We are facing a situation quite similar to the one described here:
http://tracker.ceph.com/issues/23117

Namely: we have a Luminous cluster consisting of 16 hosts, where each host holds 12 OSDs on spinning disks and 4 OSDs on SSDs. Let's forget the SSDs for now, since they're not in use at the moment. We have an Erasure Coding pool (k=6, m=3) with 4096 PGs, residing on the spinning disks, with the host as failure domain.

After taking a host (and its OSDs) out for maintenance, we're trying to put the OSDs back in. While the cluster starts recovering, we observe

> Reduced data availability: 170 pgs inactive

and

> 170 activating+remapped

This eventually leads to slow/stuck requests, and we have to take the OSDs out again.

While searching around we came across the already mentioned issue on the tracker [1], and we're wondering whether "PG overdose protection" [2] is what we're really facing now. Our cluster features:

  "mon_max_pg_per_osd": "200",
  "osd_max_pg_per_osd_hard_ratio": "2.000000",

What is more, we observed that the PG distribution among the OSDs is not uniform, e.g.:

> ID  CLASS WEIGHT    REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS TYPE NAME
> -1        711.29004 -        666T   165T   500T   0     0    -   root default
> -17       44.68457  -        45757G 11266G 34491G 24.62 0.99 -   host rd3-1427
>   9 hdd   3.66309   1.00000  3751G  976G   2774G  26.03 1.05 212 osd.9
>  30 hdd   3.66309   1.00000  3751G  961G   2789G  25.64 1.03 209 osd.30
>  46 hdd   3.66309   1.00000  3751G  902G   2848G  24.07 0.97 196 osd.46
>  61 hdd   3.66309   1.00000  3751G  877G   2873G  23.40 0.94 190 osd.61
>  76 hdd   3.66309   1.00000  3751G  984G   2766G  26.24 1.05 214 osd.76
>  92 hdd   3.66309   1.00000  3751G  894G   2856G  23.84 0.96 194 osd.92
> 107 hdd   3.66309   1.00000  3751G  881G   2869G  23.50 0.94 191 osd.107
> 123 hdd   3.66309   1.00000  3751G  973G   2777G  25.97 1.04 212 osd.123
> 138 hdd   3.66309   1.00000  3751G  975G   2775G  26.01 1.05 212 osd.138
> 156 hdd   3.66309   1.00000  3751G  813G   2937G  21.69 0.87 176 osd.156
> 172 hdd   3.66309   1.00000  3751G  1016G  2734G  27.09 1.09 221 osd.172
> 188 hdd   3.66309   1.00000  3751G  998G   2752G  26.62 1.07 217 osd.188

Could these OSDs, holding more than 200 PGs, contribute to the problem? Is there any way to confirm that we're hitting "PG overdose protection"? And if that's what is happening, how can we restore our cluster back to normal?

Apart from getting these OSDs back to work, we're concerned about the overall choice of the number of PGs (4096) for that (6,3) EC pool.

Any help appreciated,
Alex

[1] http://tracker.ceph.com/issues/23117
[2] https://ceph.com/community/new-luminous-pg-overdose-protection/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
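P.S. For reference, here's our back-of-the-envelope arithmetic for the PG counts, assuming the overdose hard limit is mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio as described in [2] (please correct us if we're doing this math wrong):

```python
# Numbers from our cluster description above.
hosts = 16
hdd_osds_per_host = 12
num_osds = hosts * hdd_osds_per_host      # 192 HDD OSDs in the pool

pgs = 4096
k, m = 6, 3
shards_per_pg = k + m                     # each EC PG places k+m = 9 shards

# Average PG (shard) count per OSD for this pool alone:
avg_pgs_per_osd = pgs * shards_per_pg / num_osds
print(avg_pgs_per_osd)                    # 192.0

# Hard limit at which (we assume) overdose protection refuses PGs:
mon_max_pg_per_osd = 200
osd_max_pg_per_osd_hard_ratio = 2.0
hard_limit = mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio
print(hard_limit)                         # 400.0
```

So even when perfectly balanced we'd average ~192 PGs per OSD, already close to mon_max_pg_per_osd = 200, and our real distribution goes up to 221. Presumably during recovery, with remapped PGs counted on top, some OSDs could approach the 400 hard limit? That's part of why we're questioning the 4096 PG choice.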