Thanks JC & Greg, I've changed "mon osd min down reporters" to 1. According to the docs at http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/ the default is already 1, though. I don't remember the value before I changed it everywhere, so I can't say for sure now, but I think it was 2 despite what the docs say. Whatever. It's now 1 everywhere.

Another somewhat weird thing I found: when I check the values of an OSD(!) with "ceph daemon osd.0 config show | sort | grep mon_osd", I see an entry "mon osd min down reporters". I can even change it. But according to the docs, this is a setting for monitors only. Why does it appear there? Does it influence anything? If not: is there a way to show only the relevant config entries for a daemon?

Then, when checking the doc page mentioned above and reading the descriptions of the multitude of config settings, I wonder: how can I properly estimate the time until my cluster works again? Since I get hung requests until the failed node is finally declared *down*, this time is obviously quite important for me. What exactly is the sequence of events when a node fails (e.g. someone accidentally hits the power-off button)?

My (possibly totally wrong & dumb) idea:

1) osd0 fails/doesn't answer.
2) osd1 pings osd0 every 6 seconds (osd heartbeat interval). Thus, after 6 seconds max., osd1 notices that osd0 *could be* down.
3) After another 20 seconds (osd heartbeat grace), osd1 decides osd0 is definitely down.
4) Another 120 seconds might elapse (osd mon report interval max) until osd1 reports the bad news to the monitor.
5) The monitor gets the information about the failed osd0, and since "mon osd min down reporters" is 1, this single OSD is sufficient for the monitor to believe the bad news that osd0 is unresponsive.
6) But since "mon osd min down reports" is 3, everything up to this point has to happen 3 times in a row until the monitor finally realizes osd0 is *really* unresponsive.
7) After another 900 seconds (mon osd report timeout) of waiting in the hope of news that osd0 is still/back alive, the monitor marks osd0 as down.
8) After another 300 seconds (mon osd down out interval), the monitor marks osd0 as down+out.

So, according to my possibly very naive understanding, it takes 3*(6+20+120) + 900 + 300 = 1638 seconds from the event "someone accidentally hit the power-off switch" to "osd0 is marked down+out". Correct? I expect not. Which config variables did I misunderstand?
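
To make the arithmetic easier to check, here is a small Python sketch of my naive estimate. The numbers are just the values I quoted above. The constant and function names are my own labels (not Ceph identifiers), and the model itself is only my guess at the sequence, so it is probably wrong in exactly the places I am asking about.

```python
# Naive model of the time from "power off" to "down+out", using the values
# quoted in this mail. All names are hypothetical labels, not Ceph APIs.

OSD_HEARTBEAT_INTERVAL = 6         # seconds, "osd heartbeat interval"
OSD_HEARTBEAT_GRACE = 20           # seconds, "osd heartbeat grace"
OSD_MON_REPORT_INTERVAL_MAX = 120  # seconds, "osd mon report interval max"
MON_OSD_MIN_DOWN_REPORTS = 3       # count,   "mon osd min down reports"
MON_OSD_REPORT_TIMEOUT = 900       # seconds, "mon osd report timeout"
MON_OSD_DOWN_OUT_INTERVAL = 300    # seconds, "mon osd down out interval"


def naive_seconds_per_report():
    """Steps 2-4: notice the missed ping, wait out the grace, report to the mon."""
    return (OSD_HEARTBEAT_INTERVAL
            + OSD_HEARTBEAT_GRACE
            + OSD_MON_REPORT_INTERVAL_MAX)


def naive_seconds_until_down():
    """Steps 2-7: the assumed number of reports, then the report timeout."""
    return (MON_OSD_MIN_DOWN_REPORTS * naive_seconds_per_report()
            + MON_OSD_REPORT_TIMEOUT)


def naive_seconds_until_down_out():
    """Step 8: add the down -> out interval on top."""
    return naive_seconds_until_down() + MON_OSD_DOWN_OUT_INTERVAL


if __name__ == "__main__":
    print("per report:", naive_seconds_per_report(), "s")
    print("until down:", naive_seconds_until_down(), "s")
    print("until down+out:", naive_seconds_until_down_out(), "s")
```

With the values above this gives 146 seconds per report, 1338 seconds until "down" and 1638 seconds until "down+out", which is the number I would like someone to correct.
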
Thank you,
Ranjan
On 29.09.2016 at 20:48, LOPEZ Jean-Charles wrote:
mon_osd_min_down_reporters by default set to 2
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com