Yes, your understanding is correct.

But the main mechanism by which OSDs are reported as down is that other
OSDs report them as down with a much stricter timeout (20 seconds? 30
seconds? something like that).

It's quite rare to hit the "mon osd report timeout" (the usual scenario
here is a network partition).

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Jan 14, 2019 at 10:17 AM Eugen Block <eblock@xxxxxx> wrote:
>
> Hello list,
>
> I noticed my last post was displayed as a reply to a different thread,
> so I'm re-sending my question; please excuse the noise.
>
> There are two config options of mon/osd interaction that I don't fully
> understand. Maybe one of you could clarify them for me.
>
> > mon osd report timeout
> > - The grace period in seconds before declaring unresponsive Ceph OSD
> > Daemons down. Default 900
>
> > mon osd down out interval
> > - The number of seconds Ceph waits before marking a Ceph OSD Daemon
> > down and out if it doesn’t respond. Default 600
>
> I've seen the mon_osd_down_out_interval being hit plenty of times,
> e.g. if I manually take down an OSD it will be marked out after 10
> minutes. But I can't remember ever seeing the 900-second timeout
> happen. When exactly does the mon_osd_report_timeout kick in? Does
> this mean that if for some reason one OSD is unresponsive, the MON will
> mark it down after 15 minutes, then wait another 10 minutes until it
> is marked out so that recovery can start?
>
> I'd appreciate any insight!
>
> Regards,
> Eugen
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
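The two failure-detection paths discussed in this thread can be sketched as a simple timeline. This is a minimal Python model under stated assumptions, not Ceph's implementation. The option names are real Ceph settings (osd_heartbeat_grace, the peer timeout Paul refers to, defaults to 20 seconds), but the model ignores details such as mon_osd_min_down_reporters and automatic heartbeat-grace adjustment. It also measures mon_osd_report_timeout from the moment the OSD stops responding rather than from the last report the monitor received, so the 900-second figure is roughly an upper bound. The class and function names are hypothetical.

```python
"""Simplified, hypothetical model of when a failed OSD is marked down and out.

Assumptions (illustrative only):
- Peer path: other OSDs report the failure after osd_heartbeat_grace seconds
  of missed heartbeats, and the monitor acts on those reports immediately.
- Monitor path: with no usable peer reports (e.g. a network partition), the
  monitor marks the OSD down after mon_osd_report_timeout seconds.
- The OSD is marked out mon_osd_down_out_interval seconds after it was
  marked down.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class MonOsdTimeouts:
    # Defaults mirror the values quoted in the thread (plus the peer grace).
    osd_heartbeat_grace: int = 20
    mon_osd_report_timeout: int = 900
    mon_osd_down_out_interval: int = 600


def down_and_out_times(t: MonOsdTimeouts, peers_can_report: bool) -> tuple:
    """Return (seconds until down, seconds until out) after the OSD stops responding."""
    down = t.osd_heartbeat_grace if peers_can_report else t.mon_osd_report_timeout
    # The down-out countdown starts only once the OSD has been marked down.
    return down, down + t.mon_osd_down_out_interval


if __name__ == "__main__":
    timeouts = MonOsdTimeouts()
    for label, peers in (("peer reports", True), ("report timeout only", False)):
        down, out = down_and_out_times(timeouts, peers)
        print(f"{label}: down after {down}s, out after {out}s ({out / 60:.1f} min)")
```

Running it prints "down after 20s, out after 620s" for the peer path and "down after 900s, out after 1500s (25.0 min)" for the report-timeout path. The second line matches Eugen's reading of 15 minutes plus 10 minutes, while the first shows why, as Paul says, the common case is dominated by the down-out interval.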