Re: Cluster Re-balancing

Hi Casper,

Thank you for the response; the problem is solved now. After some searching, it turned out that since Luminous, setting mon_osd_backfillfull_ratio and mon_osd_nearfull_ratio no longer takes effect. These settings are now read from the OSD map, and the commands "ceph osd set-nearfull-ratio" and "ceph osd set-backfillfull-ratio" are used to change them.

This was verified by running "ceph osd dump | head": all the ratios were still 0.92, 0.95, etc. After setting them to 0.85, the flags started to work normally and we were able to control our cluster much better.
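
For reference, the check and fix were along these lines (a sketch; the grep is just a convenient way to pick the ratio lines out of the dump, and 0.85 is simply the value we chose):

    ceph osd dump | grep ratio
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.85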

Moreover, setting the backfillfull ratio lower than the nearfull ratio raised a HEALTH_ERR with an "out of order" flag. Therefore, we set them to the same value for now and started reweighting to rebalance the cluster.

The backfillfull flags did prevent data movement to those OSDs, and data was moved to other OSDs with more free space. Nevertheless, some PGs got stuck and were flagged backfill_toofull; in the end we reweighted those and everything returned to normal. Finally, we set the backfillfull ratio higher than the nearfull ratio. End of story.
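
For anyone hitting the same issue, the final ordering was roughly as follows (0.90 for backfillfull is only illustrative; the point is that it sits above nearfull):

    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.90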


Thanks

On Wed, Apr 18, 2018 at 11:20 AM, Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:
Hi Monis,

The settings you mention do not prevent data movement to overloaded OSDs; they are thresholds at which Ceph warns that an OSD is nearfull or backfillfull.
No expert on this, but setting backfillfull lower than nearfull is not recommended; the nearfull state should be reached before backfillfull.

You can reweight the overloaded OSDs manually by issuing: ceph osd reweight osd.X 0.95 (the last value should be between 0 and 1, where 1 is the default and can be seen as 100%; setting it to 0.95 means only 95% of the OSD is used. To move more PGs off the OSD, you can lower the value further, e.g. to 0.9 or 0.85).
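
For example (osd.12 is purely a placeholder id; substitute the actual overloaded OSD):

    ceph osd reweight osd.12 0.95
    ceph osd reweight osd.12 0.90    # lower it further if the OSD is still too full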

Kind regards,
Caspar


2018-04-18 9:07 GMT+02:00 Monis Monther <mmmm82@xxxxxxxxx>:
Hi,

We are running a cluster with Ceph Luminous 12.2.0. Some of the OSDs are getting full, and we are running ceph osd reweight-by-utilization to rebalance the OSDs. We have also set:

mon_osd_backfillfull_ratio 0.8 (This is to prevent moving data to an overloaded OSD when re-weighting)
mon_osd_nearfull_ratio 0.85
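
These were set roughly as follows in ceph.conf (a sketch, assuming they live under [global]; the exact section may differ in our config):

    [global]
    mon_osd_backfillfull_ratio = 0.80
    mon_osd_nearfull_ratio = 0.85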

However, reweighting is worsening the problem by moving data from an 85% full OSD to an 84.7% full OSD instead of moving it to a half-empty OSD, causing the latter to increase to 85.6%. Some OSDs have now reached 86% and 87%.

Moreover, the cluster does not show any OSD as nearfull even though some OSDs have passed 86%, and it totally ignores the backfillfull setting by moving data to OSDs that are above 80%.

Are the settings above wrong? What can we do to prevent moving data to overloaded OSDs?

--
Best Regards
Monis

--
Best Regards
Monis
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
