Re: luminous: HEALTH_ERR full ratio(s) out of order

Why would you run with such tight settings? You might well be unable to recover your cluster if something happened while you were at 94% full, without so much as a nearfull warning on anything. Nearfull should at least be brought down: it's just a warning in Ceph's output telling you to get more storage in before it's too late. If you wait until your disks are 95% full before the alert pops up telling you to order new hardware, you'll never get it in time. And if you're already monitoring so you can add hardware at a lower percentage, why not lower nearfull anyway, just for the extra reminder that you're filling up? All nearfull does is raise a HEALTH_WARN state.

And what if you have hardware failures while your cluster is that full? With these settings, the likely outcome is that your OSDs all hit backfillfull and can't shift data to bring new storage into the cluster. Maybe you're just testing these settings or this is a test cluster, but ratios anywhere near these are terrible for production.
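
For what it's worth, the sane ordering is nearfull < backfillfull < full, and on Luminous these ratios can be changed at runtime. A minimal sketch using the stock defaults (0.85/0.90/0.95) as placeholder values, not a recommendation for any particular cluster:

    # keep the ordering nearfull < backfillfull < full
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.90
    ceph osd set-full-ratio 0.95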

On Wed, Jan 10, 2018 at 10:15 AM Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
Good to know. I don't think this should trigger HEALTH_ERR, though; HEALTH_WARN makes sense.
It makes sense to keep backfillfull_ratio greater than nearfull_ratio, as one might need backfilling to keep OSDs from getting full during reweight operations.


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
Belo Horizonte - Brasil
IRC NICK - WebertRLZ

On Wed, Jan 10, 2018 at 12:11 PM, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
Hello,

since upgrading to Luminous I get the following error:

HEALTH_ERR full ratio(s) out of order
OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
    backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased

but ceph.conf has:

        mon_osd_full_ratio = .97
        mon_osd_nearfull_ratio = .95
        mon_osd_backfillfull_ratio = .96
        osd_backfill_full_ratio = .96
        osd_failsafe_full_ratio = .98

Any ideas? I already restarted:
* all osds
* all mons
* all mgrs
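
For reference, the ratios the monitors actually enforce are stored in the OSDMap; they can be read back with something like the following (just the command, not output from this cluster):

    # all three ratios are listed in the OSDMap header
    ceph osd dump | grep full_ratio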

Greets,
Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
