Re: HEALTH_WARN 1 pools have too few placement groups


 



This was a bug in the PG calculation for EC pools in 14.2.7.



It has been fixed in 14.2.8.
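
For what it's worth, here is a rough sketch of what the autoscaler should
come up with for a pool like yours, assuming the default
mon_target_pg_per_osd of 100 and that all 220 OSDs back the pool (this is
only the shape of the math, not the actual mgr code):

osds = 220               # OSDs behind the pool (assumption)
target_pg_per_osd = 100  # mon_target_pg_per_osd default
pool_size = 6 + 3        # k + m for the EC 6+3 pool
capacity_ratio = 1.0     # your target_size_ratio
ideal = capacity_ratio * osds * target_pg_per_osd / pool_size
print(round(ideal))      # ~2444, i.e. a power of two around 2048-4096

The suggested 16384 is well outside anything that arithmetic can produce,
which is why it comes down to the calculation bug rather than expected
behaviour.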





---- On Mon, 16 Mar 2020 16:21:41 +0800 Dietmar Rieder <dietmar.rieder@xxxxxxxxxxx> wrote ----



Hi, 
 
I was planning to activate the pg_autoscaler on an EC (6+3) pool which I 
created two years ago. 
Back then I calculated the total number of PGs for this pool with a target 
of 150 PGs per OSD (that was the recommended per-OSD PG count as far as I 
recall). 
 
I used the Red Hat Ceph PG per pool calculator [1] with the following 
settings: 
Pool Type: EC 
K: 6 
M: 3 
OSD #: 220 
%Data: 100.00 
Target PGs per OSD: 150 
 
This resulted in a suggested PG count of 4096, which I used to create 
the pool. 
(BTW: I get the same result from the ceph.io pgcalc [2] when setting size 
to 9 (= EC 6+3).) 
 
# ceph osd df shows that I now have 162-169 PGs per OSD 
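
For reference, here is my reading of the arithmetic behind those numbers
(assuming the calculators simply round the result up to the next power of
two; this is not their exact code):

import math

osds, target_per_osd, pct_data = 220, 150, 1.00  # %Data 100.00 as a fraction
size = 6 + 3                                     # EC pool size = k + m
raw = osds * target_per_osd * pct_data / size    # ~3666.7 PGs
pg_num = 2 ** math.ceil(math.log2(raw))          # round up to a power of two -> 4096
print(pg_num, pg_num * size / osds)              # 4096, ~167.6 PG instances per OSD

The ~167.6 PG instances per OSD also fits the 162-169 range I see in
ceph osd df.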
 
Now I enabled the pg_autoscaler and set pg_autoscale_mode to "warn", using 
the following commands: 
 
# ceph mgr module enable pg_autoscaler 
# ceph osd pool set hdd-ec-data-pool pg_autoscale_mode warn 
 
When I set the target_size_ratio to 1, using: 
 
# ceph osd pool set hdd-ec-data-pool target_size_ratio 1 
 
I get the warning: 
1 pools have too few placement groups 
 
The autoscale status shows that the PG count of the EC pool would get 
scaled up to 16384 PGs: 
 
# ceph osd pool autoscale-status 
POOL                   SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE 
ssd-rep-metadata-pool  11146M               3.0   29805G        0.0011                1.0   64      4           off 
ssd-rep-data-pool      837.8G               3.0   29805G        0.0843                1.0   1024    64          off 
hdd-ec-data-pool       391.3T               1.5   1614T         0.3636  1.0000        1.0   4096    16384       warn 
 
How does this relate to the calculated number of 4096 PGs for that EC 
pool? I mean, 16384 is quite a jump from the original value. 
Is the autoscaler miscalculating something or is the pg calculator wrong? 
 
Can someone explain it? (My cluster is on ceph version 14.2.7) 
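
(If it helps: as far as I understand the Nautilus autoscaler, it only
raises this warning when the suggested pg_num is more than about 3x away
from the current value, and here the gap is exactly 4x. The threshold below
is my assumption, not taken from the code.)

current_pg, suggested_pg = 4096, 16384
threshold = 3.0                     # assumed autoscaler change threshold
factor = suggested_pg / current_pg  # 4.0
print(factor > threshold)           # True -> pool gets flagged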
 
 
 
Thanks so much 
 Dietmar 
 
[1]: https://access.redhat.com/labs/cephpgc/ 
[2]: https://ceph.io/pgcalc/ 
 
 
-- 
_________________________________________ 
D i e t m a r  R i e d e r, Mag.Dr. 
Innsbruck Medical University 
Biocenter - Institute of Bioinformatics 
Innrain 80, 6020 Innsbruck 
Email: dietmar.rieder@xxxxxxxxxxx 
Web: http://www.icbi.at 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


