How does monitor know OSD is dead?

What does it take for a monitor to mark down an OSD that has been dead as a
doornail since the cluster started?

A couple of times, I have seen 'ceph status' report an OSD as up when it was
quite dead.  Recently, a couple of OSDs were on machines that failed to boot
after a power failure.  The rest of the Ceph cluster came up, though, and
reported all OSDs up and in.  I/Os stalled, probably because they were waiting
for the dead OSDs to come back.

I waited 15 minutes, because the manual says that if the monitor doesn't hear
a heartbeat from an OSD for that long (the default value of
mon_osd_report_timeout), it marks the OSD down.  But it didn't.  I ran
"osd down" commands for the dead OSDs, and the status changed to down and I/O
started working.
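
The timeout rule as described above can be sketched as follows.  This is only
a toy model of the rule as the manual is read here, not Ceph's monitor code:
the function name overdue_osds, the last_report table, and the separate
"never reported" bucket are all hypothetical, and the 900-second value is
simply the 15-minute default mentioned above.  Whether a real monitor starts
the timer for an OSD it has not heard from since startup is exactly the open
question, so the sketch reports such OSDs separately instead of guessing.

```python
# Hypothetical model of the report-timeout rule; not taken from Ceph's source.
REPORT_TIMEOUT_S = 900  # 15 minutes, the default of mon_osd_report_timeout cited above


def overdue_osds(last_report, now, timeout=REPORT_TIMEOUT_S):
    """Return (overdue, never_reported) lists of OSD ids.

    last_report maps an OSD id to the time (in seconds) of its last report,
    or to None if no report has been seen since the monitor started.
    """
    overdue = []
    never_reported = []
    for osd_id, seen in sorted(last_report.items()):
        if seen is None:
            # No report at all: the rule as stated does not say what happens.
            never_reported.append(osd_id)
        elif now - seen > timeout:
            # Silent for longer than the timeout: expected to be marked down.
            overdue.append(osd_id)
    return overdue, never_reported


if __name__ == "__main__":
    reports = {0: 1000.0, 1: 50.0, 2: None}
    print(overdue_osds(reports, now=1000.0))
```

In this model, osd.1 has been silent for 950 seconds and would be expected to
go down, while osd.2 never reported and falls outside the rule as written,
which matches the situation described in this message.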

And wouldn't even 15 minutes of grace be unacceptable if it means I/Os have to
wait that long before falling back to a redundant OSD?

-- 
Bryan Henderson                                   San Jose, California
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


