Re: How does monitor know OSD is dead?

> Normally in the case of a restart then somebody who used to have a
> connection to the OSD would still be running and flag it as dead. But
> if *all* the daemons in the cluster lose their soft state, that can't
> happen.

OK, thanks.  I guess that explains it.  But that's a pretty serious design
flaw, isn't it?  What I experienced is a pretty common failure mode: a power
outage caused the entire cluster to die simultaneously, then when power came
back, some OSDs didn't (the most common time for a server to fail is at
startup).

I wonder if I could close this gap with additional monitoring of my own.  I
could have a cluster bringup protocol that detects OSD processes that still
aren't running after a while and marks those OSDs down.  It would be cleaner,
though, if I could just find out from the monitor which OSDs are in the map but
not connected to the monitor cluster.  Is that possible?
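
To make the bringup idea concrete, here is a rough sketch of what I have in
mind, not a tested tool.  It assumes "ceph osd dump --format json" returns an
"osds" list whose entries carry an "osd" id and an "up" flag of 0 or 1, and
that my own monitoring can supply the set of OSD ids whose processes really
are running.  The function names and the running_ids input are hypothetical.

```python
import json
import subprocess


def stale_up_osds(osd_dump, running_ids):
    """Return sorted ids of OSDs the map shows as up but that are not running.

    osd_dump is the parsed JSON of `ceph osd dump --format json` (assumed
    layout: {"osds": [{"osd": <id>, "up": 0 or 1, ...}, ...]}).
    running_ids is a set of OSD ids confirmed alive by site-specific checks.
    """
    stale = []
    for entry in osd_dump.get("osds", []):
        if entry.get("up") == 1 and entry["osd"] not in running_ids:
            stale.append(entry["osd"])
    return sorted(stale)


def fetch_osd_dump():
    """Ask the cluster for the current OSD map as parsed JSON."""
    result = subprocess.run(
        ["ceph", "osd", "dump", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def mark_down(osd_ids):
    """Administratively mark each given OSD down."""
    for osd_id in osd_ids:
        subprocess.run(["ceph", "osd", "down", str(osd_id)], check=True)


# Intended use after waiting out a grace period following power-on
# (running_ids would come from my own process checks on each host):
#   mark_down(stale_up_osds(fetch_osd_dump(), running_ids))
```

The comparison is kept separate from the commands that touch the cluster so it
can be checked against a saved dump before anything gets marked down.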

A related question: If I mark an OSD down administratively, does it stay down
until I give a command to mark it back up, or will the monitor detect signs of
life and declare it up again on its own?

-- 
Bryan Henderson                                   San Jose, California