Nautilus: pg_autoscaler causes mon slow ops

Hi everyone,

We recently upgraded our production cluster to this version:

ceph version 14.2.3-349-g7b1552ea82 (7b1552ea827cf5167b6edbba96dd1c4a9dc16937) nautilus (stable)

We then activated pg_autoscaler on two pools that had a bad pg_num, and the result was satisfying. However, after the rebalance finished, the cluster became laggy. We noticed that two out of three MONs had a much higher CPU usage than usual; according to `top`, the MON processes were consuming more than 100% CPU. Restarting the MON services and disabling pg_autoscaler resolved the issue. I've read that the balancer module can cause a higher load on the MGR daemon, is this somehow related?
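For completeness, the workaround was roughly along these lines (a sketch; <pool> and the mon ID are placeholders, and the systemd unit name assumes a standard package-based deployment):

# turn the autoscaler off for the affected pools, or disable the mgr module entirely
ceph osd pool set <pool> pg_autoscale_mode off
ceph mgr module disable pg_autoscaler

# restart the monitor on each affected mon host
systemctl restart ceph-mon@$(hostname -s)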


Another thing worth mentioning is the confusing calculation of the autoscaler. After the pg numbers had been corrected, we got warnings about overcommitted pools:

1 subtrees have overcommitted pool target_size_bytes
1 subtrees have overcommitted pool target_size_ratio

The images pool was responsible for that. The confusing part is that autoscale-status sometimes displayed the size of that pool as more than 14 TB:

ceph osd pool autoscale-status
POOL      SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
images  14399G                3.0        33713G  1.2813                 1.0     128              on


And a couple of minutes later the pool suddenly only had around 4 TB of data:

ceph osd pool autoscale-status
POOL      SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
images   4112G                3.0        33713G  0.3659                 1.0     128              on
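Incidentally, the RATIO column appears to be simply SIZE * RATE / RAW CAPACITY: 14399G * 3.0 / 33713G is roughly 1.28 and 4112G * 3.0 / 33713G is roughly 0.37, which matches the two outputs above. So whatever SIZE value the autoscaler happens to pick up directly decides whether the subtree is reported as overcommitted.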


There seems to be some kind of inconsistency here. The actual used storage of this pool according to `ceph df` is:

POOLS:
    POOL     ID   STORED   OBJECTS   USED     %USED   MAX AVAIL
    images    1   4.1 TiB    1.01M   12 TiB   49.73     4.1 TiB
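The 4112G from the second autoscale-status output roughly matches STORED here (4.1 TiB, which at 3x replication gives the ~12 TiB USED), while the 14399G figure doesn't obviously correspond to anything `ceph df` reports. For anyone who wants to compare the two views on their own cluster, something along these lines should be enough (a sketch; "images" is just our pool name):

ceph df detail
rados df | grep images                          # per-pool usage as seen by rados
ceph osd pool autoscale-status | grep images    # size the autoscaler bases its ratio on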

Has anyone experienced something similar? Are these known issues?

Regards,
Eugen



