Re: PGs activating+remapped, PG overdose protection?

You should probably have used 2048 PGs, following the usual target of ~100 PGs per OSD.
Just increase the mon_max_pg_per_osd option; ~200 is still okay-ish and your cluster will grow out of it :)
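
Untested, and 300 is just an example value, but on Luminous something along
these lines should do it. The OSDs are the ones enforcing the limit, so inject
the new value into them as well and persist it in ceph.conf; if injectargs
complains that the change is unobserved, a rolling restart is needed:

  # raise the per-OSD PG limit at runtime
  ceph tell 'osd.*' injectargs '--mon_max_pg_per_osd 300'
  ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 300'

  # and make it persistent in /etc/ceph/ceph.conf
  [global]
  mon_max_pg_per_osd = 300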

Paul

2018-08-01 19:55 GMT+02:00 Alexandros Afentoulis <alexaf+ceph@xxxxxxxxxxxx>:
Hello people :)

we are facing a situation quite similar to the one described here:
http://tracker.ceph.com/issues/23117

Namely:

we have a Luminous cluster consisting of 16 hosts, where each host holds
12 OSDs on spinning disks and 4 OSDs on SSDs. Let's set the SSDs aside for
now, since they're not in use at the moment.

We have an erasure-coded pool (k=6, m=3) with 4096 PGs, residing on the
spinning disks, with host as the failure domain.

After taking a host (and its OSDs) out for maintenance, we're trying
to put the OSDs back in. While the cluster starts recovering, we observe:

> Reduced data availability: 170 pgs inactive

and

> 170  activating+remapped

This eventually leads to slow/stuck requests and we have to take the
OSDs out again.
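
(We can list the affected PGs while the OSDs are in with e.g. the commands
below, and paste the full output here if that helps:)

  ceph health detail            # details of the inactive-PG warning
  ceph pg dump_stuck inactive   # lists the 170 activating+remapped PGs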

While searching around we came across the already mentioned issue on the
tracker [1], and we're wondering whether "PG overdose protection" [2] is
what we're really facing now.

Our cluster is configured with:

"mon_max_pg_per_osd": "200",
"osd_max_pg_per_osd_hard_ratio": "2.000000",

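If we read [2] correctly, each OSD should refuse to instantiate new PGs once
it goes past mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio, i.e.
200 * 2 = 400 in our case. The values the daemons actually run with can be
double-checked over the admin socket, e.g. (osd.9 is just an example; this
has to run on the host where osd.9 lives):

  ceph daemon osd.9 config get mon_max_pg_per_osd
  ceph daemon osd.9 config get osd_max_pg_per_osd_hard_ratio
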
What is more, we observed that the PG distribution among the OSDs is
not uniform, e.g.:

> ID  CLASS WEIGHT    REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS TYPE NAME         
>  -1       711.29004        -   666T   165T   500T     0    0   - root default     
> -17        44.68457        - 45757G 11266G 34491G 24.62 0.99   -     host rd3-1427
>   9   hdd   3.66309  1.00000  3751G   976G  2774G 26.03 1.05 212         osd.9     
>  30   hdd   3.66309  1.00000  3751G   961G  2789G 25.64 1.03 209         osd.30   
>  46   hdd   3.66309  1.00000  3751G   902G  2848G 24.07 0.97 196         osd.46   
>  61   hdd   3.66309  1.00000  3751G   877G  2873G 23.40 0.94 190         osd.61   
>  76   hdd   3.66309  1.00000  3751G   984G  2766G 26.24 1.05 214         osd.76   
>  92   hdd   3.66309  1.00000  3751G   894G  2856G 23.84 0.96 194         osd.92   
> 107   hdd   3.66309  1.00000  3751G   881G  2869G 23.50 0.94 191         osd.107   
> 123   hdd   3.66309  1.00000  3751G   973G  2777G 25.97 1.04 212         osd.123   
> 138   hdd   3.66309  1.00000  3751G   975G  2775G 26.01 1.05 212         osd.138   
> 156   hdd   3.66309  1.00000  3751G   813G  2937G 21.69 0.87 176         osd.156   
> 172   hdd   3.66309  1.00000  3751G  1016G  2734G 27.09 1.09 221         osd.172   
> 188   hdd   3.66309  1.00000  3751G   998G  2752G 26.62 1.07 217         osd.188

Could the OSDs that hold more than 200 PGs be contributing to the problem?

Is there any way to confirm that we're hitting the "PG overdose
protection"? If that's the case, how can we restore our cluster to normal?
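
We were thinking of querying one of the stuck PGs to see where activation is
blocked; would something like the following (the pgid is just a placeholder)
tell us whether the overdose limit is the culprit?

  ceph pg <pgid> query   # e.g. one of the 170 activating+remapped PGs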

Apart from getting these OSDs back into service, we're also concerned about
our overall choice of PG count (4096) for that (6,3) EC pool.
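
Our rough math, assuming each PG of the (6,3) pool places one shard on 9
different OSDs and only the 16 * 12 = 192 HDD OSDs are involved:

  # 4096 PGs * 9 shards (k+m) spread over 192 HDD OSDs
  echo $(( 4096 * 9 / 192 ))   # = 192 PG shards per OSD on average
  # with one host (12 OSDs) out, the same shards land on 180 OSDs
  echo $(( 4096 * 9 / 180 ))   # = 204, already above mon_max_pg_per_osd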

Any help appreciated,
Alex

[1] http://tracker.ceph.com/issues/23117
[2] https://ceph.com/community/new-luminous-pg-overdose-protection/



--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
