Yes. This parameter is used in the condition described here: http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/#osds-report-their-status
and it works. I think the default timeout of 900s is quite large.
The documentation also describes another function, which checks the health of OSDs and reports them down: http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/#osds-report-down-osds
As far as I can see in the source code, this documentation is no longer valid!
I found this commit: https://github.com/ceph/ceph/commit/bcb8f362ec6ac47c4908118e7860dec7971d001f#diff-0a5db46a44ae9900e226289a810f10e8
"mon_osd_min_down_reporters" is now the threshold for how many distinct subtrees at the "mon_osd_reporter_subtree_level" level have to report an OSD as down. In Hammer it was the number of other OSDs that had to report. Hammer also had the parameter "mon_osd_min_down_reports", which set how often another OSD had to report an OSD as down. In Jewel this parameter no longer exists.
With this knowledge I adjusted my configuration and will now test it.
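For illustration, a ceph.conf fragment touching the two Jewel parameters named above might look like the following. This is only a sketch: the values and the choice of the [mon] section are assumptions for the example, not the settings actually used here and not a recommendation.

```ini
[mon]
# Illustrative values only (assumed for this example).
# Number of distinct subtrees that must report an OSD as down
# before the monitor acts on the reports.
mon_osd_min_down_reporters = 2
# CRUSH bucket type used to group the reporting OSDs into subtrees.
mon_osd_reporter_subtree_level = host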
BTW:
While reading the source code I may have found another bug. Can you confirm this?
In the function "OSDMonitor::check_failure" in src/mon/OSDMonitor.cc, the code that counts the "reporters_by_subtree" is inside the if block "if (g_conf->mon_osd_adjust_heartbeat_grace) {". So if I disable mon_osd_adjust_heartbeat_grace, the reporters_by_subtree functionality will not work at all.
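To make the suspected problem concrete, here is a simplified, hypothetical C++ model of the structure described above. It is not the actual Ceph code: the types FailureReport and Config and both functions are invented for illustration, and the grace-time handling is left out entirely. It only shows why counting subtrees inside the if block would leave the count at zero when the option is disabled.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified types; NOT taken from the Ceph source.
struct FailureReport {
    std::string reporter_osd;      // OSD that sent the failure report
    std::string reporter_subtree;  // subtree (e.g. host) the reporter belongs to
};

struct Config {
    bool adjust_heartbeat_grace;   // stands in for mon_osd_adjust_heartbeat_grace
    unsigned min_down_reporters;   // stands in for mon_osd_min_down_reporters
};

// Structure as described in this mail: the subtree counting lives
// inside the adjust_heartbeat_grace block.
bool enough_reporters_nested(const Config& conf,
                             const std::vector<FailureReport>& reports) {
    std::set<std::string> reporters_by_subtree;
    if (conf.adjust_heartbeat_grace) {
        for (const auto& r : reports) {
            reporters_by_subtree.insert(r.reporter_subtree);
        }
        // (the grace adjustment itself is omitted in this sketch)
    }
    // With the option disabled the set stays empty.
    return reporters_by_subtree.size() >= conf.min_down_reporters;
}

// One possible alternative: count subtrees unconditionally and keep
// only the grace adjustment behind the option.
bool enough_reporters_hoisted(const Config& conf,
                              const std::vector<FailureReport>& reports) {
    std::set<std::string> reporters_by_subtree;
    for (const auto& r : reports) {
        reporters_by_subtree.insert(r.reporter_subtree);
    }
    if (conf.adjust_heartbeat_grace) {
        // (only the grace adjustment would go here; omitted in this sketch)
    }
    return reporters_by_subtree.size() >= conf.min_down_reporters;
}
```

In this model, two reports from two different hosts with min_down_reporters set to 2 satisfy the nested variant only while adjust_heartbeat_grace is enabled, whereas the hoisted variant is satisfied either way. Whether the real function behaves like the nested variant is exactly what I would like to have confirmed.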