Re: osd down detection broken in jewel?

On Wed, Nov 30, 2016 at 8:31 AM, Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

Yes. This parameter is used in the condition described here: http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/#osds-report-their-status and it works. I think the default timeout of 900s is quite large.

The documentation also describes another mechanism, in which OSDs check the health of other OSDs and report them down: http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/#osds-report-down-osds

As far as I can see in the source code, this documentation is no longer valid!
I found this commit -> https://github.com/ceph/ceph/commit/bcb8f362ec6ac47c4908118e7860dec7971d001f#diff-0a5db46a44ae9900e226289a810f10e8

"mon_osd_min_down_reporters" is now the threshold for how many distinct "mon_osd_reporter_subtree_level" subtrees have to report an OSD as down. In Hammer it was the number of other OSDs that had to report. Hammer also had the parameter "mon_osd_min_down_reports", which set how often another OSD had to report an OSD as down. In Jewel that parameter no longer exists.
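
To make the difference concrete, here is a rough Python sketch of the two counting rules as I understand them. It is only a model, not the monitor code: the function names, OSD ids and host names are made up, and treating "mon_osd_min_down_reports" as a minimum on the total number of reports is my assumption.

```python
def hammer_style_enough_reports(reports, min_down_reporters, min_down_reports):
    """Hammer-style rule (simplified model).

    reports: list of reporter OSD ids, one entry per failure report.
    Assumption: min_down_reports is compared against the total report count.
    """
    distinct_reporters = set(reports)
    return (len(distinct_reporters) >= min_down_reporters
            and len(reports) >= min_down_reports)


def jewel_style_enough_reports(reports, subtree_of, min_down_reporters):
    """Jewel-style rule (simplified model).

    subtree_of maps a reporter OSD id to the name of its bucket at
    mon_osd_reporter_subtree_level (for example its host).
    Only distinct subtrees count, not individual OSDs.
    """
    subtrees = {subtree_of[osd] for osd in reports}
    return len(subtrees) >= min_down_reporters


# Hypothetical layout: OSDs 0 and 1 share a host, OSD 2 is on another host.
subtree_of = {0: "host-a", 1: "host-a", 2: "host-b"}
reports = [0, 1, 0]  # two reporters, both on host-a, three reports in total

hammer_ok = hammer_style_enough_reports(reports, 2, 3)
jewel_ok = jewel_style_enough_reports(reports, subtree_of, 2)
print(hammer_ok)  # True: two distinct OSDs, three reports
print(jewel_ok)   # False: only one distinct subtree (host-a)
```

So two OSDs on the same host would satisfy the Hammer-style rule with a threshold of 2, but count as a single reporter under the Jewel-style rule when the subtree level is the host.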

With this knowledge I adjusted my configuration and will now test it.


BTW:
While reading the source code I may have found another bug. Can you confirm this?
In the function "OSDMonitor::check_failure" in src/mon/OSDMonitor.cc, the code which counts the "reporters_by_subtree" is inside the block "if (g_conf->mon_osd_adjust_heartbeat_grace) {". So if I disable adjust_heartbeat_grace, the reporters_by_subtree functionality will not work at all.
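
A small Python model of the control flow I mean. This is not the actual C++ from OSDMonitor.cc; the function name and arguments are hypothetical, the grace-time handling is left out, and I am assuming the final check compares the number of collected subtrees against "mon_osd_min_down_reporters".

```python
def check_failure_model(reporters, subtree_of, min_down_reporters,
                        adjust_heartbeat_grace):
    """Simplified model of the suspected bug (grace-time logic omitted).

    reporters: OSD ids that reported the target OSD as failed.
    subtree_of: maps an OSD id to its bucket name at the reporter subtree level.
    Returns True if the reporter threshold is considered met.
    """
    reporters_by_subtree = set()

    if adjust_heartbeat_grace:
        for osd in reporters:
            # The real code adjusts the grace period in this block; the
            # subtree counting sits in the same block, which is the problem.
            reporters_by_subtree.add(subtree_of[osd])

    # Assumed final check: with the option disabled the set is still empty.
    return len(reporters_by_subtree) >= min_down_reporters


# Hypothetical layout: two reporters on two different hosts.
subtree_of = {0: "host-a", 2: "host-b"}
reporters = [0, 2]

with_adjust = check_failure_model(reporters, subtree_of, 2, True)
without_adjust = check_failure_model(reporters, subtree_of, 2, False)
print(with_adjust)     # True: two distinct subtrees reported
print(without_adjust)  # False: the set was never filled
```

In this model, with the option disabled, the threshold can never be reached for any min_down_reporters of 1 or more, no matter how many OSDs report.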


Yes, I think you're correct and that's a (fairly nasty, to somebody someday) bug. Can you create a ticket at tracker.ceph.com? :)
-Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
