Re: luminous: HEALTH_ERR full ratio(s) out of order

Why would you run with such tight settings? You might well be unable to recover your cluster if something happened while you were at 94% full, without so much as a nearfull warning on anything. Nearfull should at least be brought down: it's just a warning in Ceph's output telling you to get more storage in before it's too late. If you wait until your disks are 95% full before the alert pops up telling you to order new hardware, you'll never get it in time. And if you're already monitoring so you can add hardware at a lower percentage, why not lower nearfull anyway, just for the extra reminder that you're filling up? All nearfull does is raise a HEALTH_WARN state.

And what if you have hardware failures while your cluster is that full? With these settings, the likely outcome is that your OSDs all hit backfillfull and can't shift data to bring new storage into the cluster. Maybe you're just testing these settings or this is a test cluster, but ratios anywhere near these are terrible for production.
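
For what it's worth, the sane ordering is nearfull < backfillfull < full, and on Luminous these ratios can be changed at runtime. A minimal sketch using the stock defaults (0.85/0.90/0.95) as placeholder values, not a recommendation for any particular cluster:

    # keep the ordering nearfull < backfillfull < full
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.90
    ceph osd set-full-ratio 0.95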

On Wed, Jan 10, 2018 at 10:15 AM Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
Good to know. I don't think this should trigger HEALTH_ERR, though; HEALTH_WARN makes sense.
It makes sense to keep backfillfull_ratio greater than nearfull_ratio, as one might need backfilling to keep OSDs from getting full during reweight operations.


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
Belo Horizonte - Brasil
IRC NICK - WebertRLZ

On Wed, Jan 10, 2018 at 12:11 PM, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
Hello,

since upgrading to Luminous I get the following error:

HEALTH_ERR full ratio(s) out of order
OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
    backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased

but ceph.conf has:

        mon_osd_full_ratio = .97
        mon_osd_nearfull_ratio = .95
        mon_osd_backfillfull_ratio = .96
        osd_backfill_full_ratio = .96
        osd_failsafe_full_ratio = .98

Any ideas? I already restarted:
* all osds
* all mons
* all mgrs
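
For reference, the ratios the monitors actually enforce are stored in the OSDMap; they can be read back with something like the following (just the command, not output from this cluster):

    # all three ratios are listed in the OSDMap header
    ceph osd dump | grep full_ratio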

Greets,
Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
