On Thu, 16 Aug 2018, Aleksei Gutikov wrote:
> Hi
>
> I know two possible events triggering OSD down:
> - other OSDs reported failed peering
> - MON has not received a heartbeat or report
>
> All these health checks seem to evaluate only the networking
> capabilities of the OSD.
> Are there any implemented ways to trigger OSD down if the object store
> gets stuck?
> Is an OSD allowed to mark itself down?

The OSD has various internal checks that will cause it to exit (and thus
be marked down) if there are problems.  Those include checks for EIO and
internal heartbeats that will trigger an OSD suicide if critical threads
get stuck without making progress.

The relevant timeouts, in seconds:

 filestore_op_thread_suicide_timeout = 180
 filestore_op_thread_timeout = 60
 osd_command_thread_suicide_timeout = 900
 osd_command_thread_timeout = 600
 osd_op_thread_suicide_timeout = 150
 osd_op_thread_timeout = 15

sage
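
For illustration only, here is a rough sketch of how a two-level internal heartbeat of this kind could work. It is not Ceph's actual code: the class `ThreadHeartbeat` and its methods are hypothetical names, and the assumption is that a stalled thread is first considered unhealthy after the plain timeout and that the daemon exits only after the longer suicide timeout.

```python
import time


class ThreadHeartbeat:
    """Tracks the last sign of progress from one worker thread.

    Hypothetical helper for illustration; not part of Ceph.
    """

    def __init__(self, name, timeout, suicide_timeout, now=time.monotonic):
        self.name = name
        self.timeout = timeout                  # seconds until "unhealthy"
        self.suicide_timeout = suicide_timeout  # seconds until the daemon gives up
        self._now = now                         # injectable clock, eases testing
        self.last_progress = now()

    def touch(self):
        """Called by the worker thread each time it makes progress."""
        self.last_progress = self._now()

    def check(self):
        """Return "ok", "unhealthy", or "suicide" based on the stall duration."""
        stalled = self._now() - self.last_progress
        if stalled >= self.suicide_timeout:
            return "suicide"
        if stalled >= self.timeout:
            return "unhealthy"
        return "ok"
```

With the osd_op_thread values above (15 and 150), a thread that has made no progress for 20 seconds would be reported as unhealthy, and after 150 seconds the check would return "suicide". In a real daemon that result would mean aborting the process, after which the OSD gets marked down.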