On Wed, Mar 11, 2015 at 8:40 AM, Artem Savinov <asavinov@xxxxxxxx> wrote:
> Hello.
> By default, Ceph marks an OSD node down after receiving 3 reports
> that the node has failed. Reports are sent every "osd heartbeat grace"
> seconds, but with the settings "mon_osd_adjust_heartbeat_grace = true,
> mon_osd_adjust_down_out_interval = true" the timeout for marking nodes
> down may vary. Could you tell me what algorithm changes the timeout
> for marking nodes down/out, and which parameters affect it?
> Thanks.

The monitors keep track of which detected failures were incorrect
(based on reports from the marked-down/out OSDs) and build up an
expectation about how often the failures are correct, based on an
exponential backoff of the data points. You can look at the code in
OSDMonitor.cc if you're interested, but basically they apply that
expectation to extend the down interval and the down-out interval to a
value large enough that they believe the OSD is really down (assuming
these config options are set). It's not terribly interesting. :)
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
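
For readers who want a concrete picture of the idea in the reply above, here is a rough Python sketch. It is not the code from OSDMonitor.cc. It assumes that "exponential backoff of the data points" means older observations count for less (a weighted moving average plus a half-life decay), and every function and parameter name in it (update_expectation, adjusted_grace, weight, halflife and so on) is hypothetical and chosen only for illustration.

```python
import math


def update_expectation(old_value, observed, weight):
    """Blend a new observation into a running expectation.

    Assumption: an exponentially weighted moving average, so that
    older data points gradually lose influence.
    """
    return (1.0 - weight) * old_value + weight * observed


def adjusted_grace(base_grace, false_failure_probability,
                   typical_false_outage, seconds_since_failure, halflife):
    """Stretch a base timeout for an OSD that was often wrongly marked down.

    base_grace: configured timeout in seconds (e.g. the heartbeat grace).
    false_failure_probability: running estimate (0..1) that a reported
        failure of this OSD turns out to be wrong.
    typical_false_outage: running estimate, in seconds, of how long such
        false outages last.
    seconds_since_failure / halflife: the bonus shrinks by half every
        `halflife` seconds (hypothetical decay model).
    """
    decay = math.exp(seconds_since_failure * math.log(0.5) / halflife)
    bonus = decay * false_failure_probability * typical_false_outage
    return base_grace + bonus


if __name__ == "__main__":
    # An OSD with no history of false failures keeps the base timeout.
    print(adjusted_grace(20.0, 0.0, 0.0, 10.0, 3600.0))
    # An OSD that is wrong half the time gets a longer timeout.
    print(adjusted_grace(20.0, 0.5, 40.0, 0.0, 3600.0))
```

The same shape of calculation could be applied to either interval: the base value would be the down timeout in one case and the down-out timeout in the other. For the real formula and the real option names, OSDMonitor.cc remains the authority.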