Re: osd down detection broken in jewel?

Gregory Farnum <gfarnum@xxxxxxxxxx> · Mon, 12 Dec 2016 22:24:54 -0800

On Wed, Nov 30, 2016 at 8:31 AM, Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

    Yes. This parameter is used in the condition described there:
      http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/#osds-report-their-status
      and it works. I think the default timeout of 900s is quite
      large.

      The documentation also describes another mechanism, which checks
      the health of OSDs and reports them down:
      http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/#osds-report-down-osds

      As far as I can see in the source code, this documentation is no
      longer valid!

      I found this commit ->
      https://github.com/ceph/ceph/commit/bcb8f362ec6ac47c4908118e7860dec7971d001f#diff-0a5db46a44ae9900e226289a810f10e8

      "mon_osd_min_down_reporters" is now the threshold for how many
        distinct "mon_osd_reporter_subtree_level" subtrees have to
        report an OSD as down. In Hammer this was the number of other
        OSDs that had to report. Hammer also had the parameter
        "mon_osd_min_down_reports", which set how many times another
        OSD had to report an OSD as down. In Jewel that parameter no
        longer exists.
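
To make the difference concrete, here is a minimal sketch of the two counting rules as described above. It is not the actual monitor code: the `ReporterMap` type and both function names are hypothetical, and the Hammer-side "mon_osd_min_down_reports" check is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical model: reporting OSD id -> name of its ancestor bucket at
// "mon_osd_reporter_subtree_level" (for example, its host).
using ReporterMap = std::map<int, std::string>;

// Hammer-style rule as described: count the reporting OSDs themselves.
bool enough_reporters_hammer(const ReporterMap& reporters,
                             std::size_t min_down_reporters) {
  return reporters.size() >= min_down_reporters;
}

// Jewel-style rule as described: count the distinct subtrees the
// reporting OSDs belong to.
bool enough_reporters_jewel(const ReporterMap& reporters,
                            std::size_t min_down_reporters) {
  std::set<std::string> subtrees;
  for (const auto& kv : reporters) {
    subtrees.insert(kv.second);
  }
  return subtrees.size() >= min_down_reporters;
}
```

Under these assumptions, three reporting OSDs that all sit on the same host satisfy a threshold of 2 under the Hammer-style rule but not under the Jewel-style rule, because they collapse into a single subtree.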

        With this knowledge I adjusted my configuration and will now
        test it.

        BTW:

        While reading the source code I may have found another bug.
        Can you confirm this?

        In the function "OSDMonitor::check_failure" in
        src/mon/OSDMonitor.cc, the code which counts the
        "reporters_by_subtree" is inside the if block "if
        (g_conf->mon_osd_adjust_heartbeat_grace) {". So if I disable
        adjust_heartbeat_grace, the reporters_by_subtree functionality
        will not work at all.
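
The shape of the suspected problem can be sketched as follows. This is a simplified, self-contained model, not the real OSDMonitor::check_failure: the `Conf` struct and both function names are hypothetical, and the grace-period arithmetic and the failed-for-long-enough check are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the two settings that matter here.
struct Conf {
  bool adjust_heartbeat_grace;     // models mon_osd_adjust_heartbeat_grace
  std::size_t min_down_reporters;  // models mon_osd_min_down_reporters
};

// Reporting OSD id -> name of its subtree at the reporter subtree level.
using ReporterMap = std::map<int, std::string>;

// Suspected shape: the subtree counting lives inside the
// grace-adjustment branch, so it is skipped when that option is off.
bool can_mark_down_suspected(const Conf& conf, const ReporterMap& reporters) {
  std::set<std::string> reporters_by_subtree;
  if (conf.adjust_heartbeat_grace) {
    for (const auto& kv : reporters) {
      reporters_by_subtree.insert(kv.second);
      // (grace adjustment based on the reporter would happen here)
    }
  }
  return reporters_by_subtree.size() >= conf.min_down_reporters;
}

// One possible fix: always count subtrees, adjust grace only if enabled.
bool can_mark_down_fixed(const Conf& conf, const ReporterMap& reporters) {
  std::set<std::string> reporters_by_subtree;
  for (const auto& kv : reporters) {
    reporters_by_subtree.insert(kv.second);
    if (conf.adjust_heartbeat_grace) {
      // (grace adjustment based on the reporter would happen here)
    }
  }
  return reporters_by_subtree.size() >= conf.min_down_reporters;
}
```

In this model, with adjust_heartbeat_grace disabled and a threshold of at least 1, the set stays empty in the first variant, so the reporter threshold is never reached no matter how many OSDs report the failure.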

Yes, I think you're correct and that's a (fairly nasty, to somebody someday) bug. Can you create a ticket at tracker.ceph.com? :)
-Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com