Re: Odd auto-scaler warnings about too few/many PGs

If you ask me or Joachim, we'll tell you to disable the autoscaler. ;-) It doesn't seem mature enough yet, especially with many pools. There have been multiple threads discussing this topic in the past; I'd suggest leaving it disabled. Or you could help improve it, maybe by creating a tracker issue if there isn't already an open one.
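
For reference, turning it off is just a per-pool mode change (plus, optionally, a default for newly created pools); a rough sketch, with <pool> as a placeholder:

"
# per existing pool
ceph osd pool set <pool> pg_autoscale_mode off

# default for newly created pools
ceph config set global osd_pool_default_pg_autoscale_mode off
"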

Quoting Torkil Svensgaard <torkil@xxxxxxxx>:

Hi

A few years ago we were really strapped for space, so we tweaked pg_num for some pools to ensure all PGs were as close to the same size as possible while still observing the power-of-2 rule, in order to get the most mileage space-wise. We set the auto-scaler to off for the tweaked pools to get rid of the warnings.
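
For reference, that kind of per-pool tweak is just two settings; a sketch, using the rbd pool and its current pg_num as an example:

"
# pick a power-of-2 pg_num for the pool, then keep the autoscaler from changing it
ceph osd pool set rbd pg_num 512
ceph osd pool set rbd pg_autoscale_mode off
"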

We now have a lot more free space so I flipped the auto-scaler to warn for all pools and set the bulk flag for the pools expected to be data pools, leading to this:

"
[WRN] POOL_TOO_FEW_PGS: 4 pools have too few placement groups
    Pool rbd has 512 placement groups, should have 2048
    Pool rbd_internal has 1024 placement groups, should have 2048
    Pool cephfs.nvme.data has 32 placement groups, should have 4096
    Pool cephfs.ssd.data has 32 placement groups, should have 1024
[WRN] POOL_TOO_MANY_PGS: 4 pools have too many placement groups
    Pool libvirt has 256 placement groups, should have 32
    Pool cephfs.cephfs.data has 512 placement groups, should have 32
    Pool rbd_ec_data has 4096 placement groups, should have 1024
    Pool cephfs.hdd.data has 2048 placement groups, should have 1024
"

That's a lot of warnings *ponder*
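
For reference, the mode and bulk flag changes above are just per-pool settings, roughly:

"
# warn-only autoscaler for every pool
for pool in $(ceph osd pool ls); do ceph osd pool set "$pool" pg_autoscale_mode warn; done

# mark expected data pools as bulk, e.g.
ceph osd pool set rbd_ec_data bulk true
"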

"
# ceph osd pool autoscale-status
POOL                  SIZE   TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
libvirt              2567G                 3.0         3031T  0.0025                                  1.0     256              warn       False
.mgr                807.5M                 2.0         6520G  0.0002                                  1.0       1              warn       False
rbd_ec               9168k                 3.0         6520G  0.0000                                  1.0      32              warn       False
nvme                31708G                 2.0        209.5T  0.2955                                  1.0    2048              warn       False
.nfs                 36864                 3.0         6520G  0.0000                                  1.0      32              warn       False
cephfs.cephfs.meta  24914M                 3.0         6520G  0.0112                                  4.0      32              warn       False
cephfs.cephfs.data   16384                 3.0         6520G  0.0000                                  1.0     512              warn       False
rbd.ssd.data        798.1G                2.25         6520G  0.2754                                  1.0      64              warn       False
rbd_ec_data         609.2T                 1.5         3031T  0.3014                                  1.0    4096              warn       True
rbd                 68170G                 3.0         3031T  0.0659                                  1.0     512              warn       True
rbd_internal        69553G                 3.0         3031T  0.0672                                  1.0    1024              warn       True
cephfs.nvme.data         0                 2.0        209.5T  0.0000                                  1.0      32              warn       True
cephfs.ssd.data     68609M                 2.0         6520G  0.0206                                  1.0      32              warn       True
cephfs.hdd.data     111.0T                2.25         3031T  0.0824                                  1.0    2048              warn       True
"

"
# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    3.0 PiB  1.3 PiB  1.6 PiB   1.6 PiB      54.69
nvme   210 TiB  146 TiB   63 TiB    63 TiB      30.21
ssd    6.4 TiB  4.0 TiB  2.4 TiB   2.4 TiB      37.69
TOTAL  3.2 PiB  1.5 PiB  1.7 PiB   1.7 PiB      53.07

--- POOLS ---
POOL                ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
rbd                  4   512   80 TiB   21.35M  200 TiB  19.31    278 TiB
libvirt              5   256  3.0 TiB  810.89k  7.5 TiB   0.89    278 TiB
rbd_internal         6  1024   86 TiB   28.22M  204 TiB  19.62    278 TiB
.mgr                 8     1  4.3 GiB    1.06k  1.6 GiB   0.07    1.0 TiB
rbd_ec              10    32   55 MiB       25   27 MiB      0    708 GiB
rbd_ec_data         11  4096  683 TiB  180.52M  914 TiB  52.26    556 TiB
nvme                23  2048   46 TiB   25.18M   62 TiB  31.62     67 TiB
.nfs                25    32  4.6 KiB       10  108 KiB      0    708 GiB
cephfs.cephfs.meta  31    32   25 GiB    1.66M   73 GiB   3.32    708 GiB
cephfs.cephfs.data  32   679    489 B   40.41M   48 KiB      0    708 GiB
cephfs.nvme.data    34    32      0 B        0      0 B      0     67 TiB
cephfs.ssd.data     35    32   77 GiB  425.03k  134 GiB   5.94    1.0 TiB
cephfs.hdd.data     37  2048  121 TiB   68.42M  250 TiB  23.03    371 TiB
rbd.ssd.data        38    64  934 GiB  239.94k  1.8 TiB  45.82    944 GiB
"

The weirdest ones:

Pool rbd_ec_data stores 683 TiB in 4096 PGs -> warning says it should have 1024
Pool rbd_internal stores 86 TiB in 1024 PGs -> warning says it should have 2048

That makes no sense to me based on the amount of data stored. Is this a bug or what am I missing? Ceph version is 17.2.7.

Best regards,

Torkil
--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

