The OSDs heartbeat each other and report back to the monitors if any other OSD fails to respond. An OSD that fails to respond is effectively down, since it's not doing the things that it's supposed to do.

It is possible for this process to cause problems. For example, I've had some OSDs on an overloaded node mark all of the other OSDs in the cluster down, because the overloaded node wasn't processing the heartbeat responses quickly enough. The solution there was to adjust "mon osd min down reporters" and "mon osd min down reports" so that a single node can't do that.

You might see something like "epoch XXX wrongly marked me down", followed by the OSD rejoining the cluster. That's a sign that the OSD was overloaded, but not down. Once it was kicked out of the cluster, it caught up with the backlog and was able to rejoin. This shouldn't cause a chain reaction, though.

If you're not seeing that, then the OSD really is unresponsive and needs to be restarted. The other OSDs will start replicating its data automatically to make the cluster healthy again. This should not cause a chain reaction either.

If your cluster is overloaded (very close to running out of CPU, RAM, or disk I/O), then a failed OSD can cause a chain reaction as the other OSDs pick up the failed OSD's workload.

On Mon, Nov 3, 2014 at 11:12 PM, 飞 <duron800@xxxxxx> wrote:
> Hello, I have been running Ceph v0.87 for one week. This week many OSDs
> have been marked down, but when I run "ps -ef | grep osd" I can see the
> OSD processes, so the OSDs are not really down. When I check the OSD
> log, I see many lines like "osd.XX from dead osd.YY,marking down".
> Does 0.87 check the other OSD processes? If some OSD is down, will the
> mon mark the current OSD down as well?
> This will cause a chain reaction, leading to failure of the entire
> cluster. Is it a bug?
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
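To make the "mon osd min down reporters" / "mon osd min down reports" advice concrete, here is a toy Python model of how such thresholds gate a mark-down decision. It is a sketch under stated assumptions, not Ceph's monitor code: it assumes a target OSD is marked down only once failure reports have come from at least `min_reporters` distinct OSDs *and* the total number of reports reaches `min_reports`, and it ignores any timing or grace-period logic the real monitor may apply. The class and method names (`FailureTracker`, `report`, `should_mark_down`) and the OSD layout in the example are hypothetical.

```python
from collections import defaultdict


class FailureTracker:
    """Hypothetical model of monitor-side failure-report thresholds.

    Not Ceph's implementation: it only counts reports, and assumes a
    target is marked down once enough distinct reporters AND enough
    total reports have accumulated (no heartbeat grace period).
    """

    def __init__(self, min_reporters: int, min_reports: int) -> None:
        self.min_reporters = min_reporters   # analogous to "mon osd min down reporters"
        self.min_reports = min_reports       # analogous to "mon osd min down reports"
        self.reporters = defaultdict(set)    # target osd -> distinct reporting osds
        self.num_reports = defaultdict(int)  # target osd -> total reports received

    def report(self, reporter: int, target: int) -> None:
        """Record that `reporter` got no heartbeat reply from `target`."""
        self.reporters[target].add(reporter)
        self.num_reports[target] += 1

    def should_mark_down(self, target: int) -> bool:
        """True once both thresholds are met for `target`."""
        return (len(self.reporters[target]) >= self.min_reporters
                and self.num_reports[target] >= self.min_reports)


# Assumed layout: osd.0, osd.1 and osd.2 live on one overloaded host,
# and osd.10 is a healthy OSD elsewhere that they all report as failed.
lenient = FailureTracker(min_reporters=1, min_reports=3)
strict = FailureTracker(min_reporters=4, min_reports=3)
for tracker in (lenient, strict):
    for reporter in (0, 1, 2):
        tracker.report(reporter, target=10)

print(lenient.should_mark_down(10))  # True: one host's OSDs are enough
print(strict.should_mark_down(10))   # False: needs a 4th distinct reporter
```

In this model, setting `min_reporters` higher than the number of OSDs on any single host means that host's reports alone can never mark a peer down, which is the effect the reply describes.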
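The reply's point that only an already overloaded cluster tends to cascade can be illustrated with a deliberately crude model: every OSD carries some load against a fixed capacity, a failed OSD's load is split evenly among the survivors, and any survivor pushed past capacity fails in turn. This is a hypothetical sketch, not how Ceph behaves in detail: real data movement follows CRUSH placement rather than an even split, and an overloaded OSD degrades rather than failing at a hard line. The function `simulate_cascade` and the load figures are illustrative assumptions.

```python
from __future__ import annotations


def simulate_cascade(loads: list[float], capacity: float, first_failure: int) -> int:
    """Toy cascade model: return how many OSDs end up failed.

    Assumes a failed OSD's load is spread evenly over all surviving OSDs
    and that an OSD fails as soon as its load exceeds `capacity`.
    """
    loads = list(loads)          # copy so the caller's list is untouched
    alive = set(range(len(loads)))
    pending = [first_failure]    # OSDs that have crossed capacity
    failed = 0
    while pending:
        osd = pending.pop()
        if osd not in alive:
            continue
        alive.remove(osd)
        failed += 1
        if not alive:
            break                # nothing left to absorb the load
        share = loads[osd] / len(alive)
        for other in alive:
            loads[other] += share
            if loads[other] > capacity and other not in pending:
                pending.append(other)
    return failed


# Ten equally loaded OSDs, one of which fails.
print(simulate_cascade([0.7] * 10, capacity=1.0, first_failure=0))   # 1
print(simulate_cascade([0.95] * 10, capacity=1.0, first_failure=0))  # 10
```

With n equally loaded OSDs at load L, one failure pushes each survivor to L·n/(n−1), so in this model a 10-OSD cluster absorbs a single failure below 90% utilisation and collapses entirely above it.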