Re: OSD_OUT_OF_ORDER_FULL even when the ratios are in order.


 



The warning you are seeing is because those settings really are out of order, and it lists each ratio that is lower than the ratio it is supposed to be above.  backfillfull_ratio is supposed to be higher than nearfull_ratio, and osd_failsafe_full_ratio is supposed to be higher than full_ratio.  nearfull_ratio is only a warning that shows up in your ceph status and doesn't prevent anything from happening; backfillfull_ratio prevents backfilling from happening; and full_ratio prevents any IO from happening at all.
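
If it helps to see it concretely, here is a rough sketch of checking what you currently have and the ordering Ceph expects (Luminous-era commands; osd.0 is just whichever OSD id is local to the node you run it on):

# ratios stored in the OSDMap
ceph osd dump | grep ratio
#   full_ratio 0.95
#   backfillfull_ratio 0.9
#   nearfull_ratio 0.85

# osd_failsafe_full_ratio is a per-OSD config option (default 0.97)
ceph daemon osd.0 config get osd_failsafe_full_ratio

# the health check wants:
#   nearfull_ratio <= backfillfull_ratio <= full_ratio <= osd_failsafe_full_ratio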

That answers your question, but the rest of this is about the ridiculous values you are trying to set those ratios to.

Why are you using such high ratios?  By default 5% of the filesystem is reserved for root and nobody but root.  I think that can be adjusted when you create the filesystem, but I don't know whether ceph-deploy does that or not.  If that reservation is in place and you're running your OSDs as user ceph (Jewel or later), then they will cap out at 95% full and the OS will refuse to write to the OSD disk.
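
If your OSD partitions happen to be on ext4, you can check or change that reservation with tune2fs.  Just a sketch: /dev/sdb1 is a made-up device, and as far as I know XFS doesn't have an equivalent reserved-blocks setting.

# show the current reservation (ext2/3/4 only)
tune2fs -l /dev/sdb1 | grep -i 'reserved block count'

# set it to 1% at mkfs time (destroys data, creation time only)...
mkfs.ext4 -m 1 /dev/sdb1
# ...or change it later without reformatting
tune2fs -m 1 /dev/sdb1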

Assuming you set your ratios in the proper order, you are leaving yourself no room for your cluster to recover from any down or failed OSDs.  I don't know what disks you're using, but I don't know of any that are guaranteed not to fail.  If your OSDs can't perform any backfilling, then you can't recover from anything... including just restarting an OSD daemon or a node.  With 97% as your nearfull setting, you're giving yourself a 2% warning window to add more storage before your cluster can no longer serve reads or writes.  BUT you have also set your cluster so it can't backfill anything once an OSD is over 98% full.  Those settings pretty much guarantee that you will be 100% stuck, unable even to add more storage to your cluster, if you wait until nearfull_ratio is triggered.

I'm just going to say it... DON'T RUN WITH THESE SETTINGS EVER.  DON'T EVEN COME CLOSE TO THESE SETTINGS, THEY ARE TERRIBLE!!!

A 90% full_ratio is good (95% is the default) because it is a setting you can change: if you get into a situation where you need to recover your cluster and it is full because of a failed node or anything else, you can raise the full_ratio and still have a chance to recover your cluster.

An 80% nearfull_ratio is good (85% is the default) because it gives you 10% of usable disk space in which to add more storage or clean up cruft you don't need.  If it takes you a long time to get new hardware or to find things to delete, consider a lower number for this warning.

An 85% backfillfull_ratio is good (90% is the default) for the same reason as full_ratio: you can increase it if you need to for a critical recovery.  But with these settings a backfill operation won't bring you so close to your full_ratio that you are in real danger of blocking all IO to your cluster.
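
If you want to move to those values, on a Luminous cluster it would be something along the lines of the following, and ceph health detail should show the OSD_OUT_OF_ORDER_FULL warning clear once the ordering is sane:

ceph osd set-nearfull-ratio 0.80
ceph osd set-backfillfull-ratio 0.85
ceph osd set-full-ratio 0.90

ceph health detail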

Even if you stick with the defaults, you're in a good enough position that you will most likely be able to recover from most failures in your cluster.  Don't push the ratios up unless you are in the middle of a catastrophic failure and you're raising them specifically to recover, after you already have your game-plan resolution in place.
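
For completeness, the kind of temporary bump I mean during a recovery would look roughly like this (0.92 is just an example), and you put it straight back afterwards:

# emergency only: give recovery some headroom, then revert
ceph osd set-full-ratio 0.92
# ... wait for backfill/recovery to finish ...
ceph osd set-full-ratio 0.90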



On Thu, Sep 14, 2017 at 10:03 AM Ronny Aasen <ronny+ceph-users@xxxxxxxx> wrote:
On 14. sep. 2017 11:58, dE . wrote:
> Hi,
>      I got a ceph cluster where I'm getting a OSD_OUT_OF_ORDER_FULL
> health error, even though it appears that it is in order --
>
> full_ratio 0.99
> backfillfull_ratio 0.97
> nearfull_ratio 0.98
>
> These don't seem like a mistake to me but ceph is complaining --
> OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
>      backfillfull_ratio (0.97) < nearfull_ratio (0.98), increased
>      osd_failsafe_full_ratio (0.97) < full_ratio (0.99), increased
>
>



post output from

ceph osd df
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
