I also raised a bug report: http://tracker.ceph.com/issues/17719

2016-10-27 10:36 GMT+08:00 Ridge Chen <ridge.chen@xxxxxxxxx>:
> Hi Experts,
>
> Recently we found an issue with our ceph cluster; the version is 0.94.6.
>
> We wanted to add additional RAM to the ceph nodes, so we needed to stop
> the ceph service on the nodes first. When we did that on the first
> node, we found the OSDs on that node marked OUT and backfill started
> (DOWN is expected in this case). The first node is somewhat special
> in that it also hosts the leader monitor.
>
> We then checked the monitor log and found the following:
>
> cluster [INF] osd.0 out (down for 3375169.141844)
>
> It looks like the monitor (which had just become leader) had stale
> "down_pending_out" records, computed a very long DOWN time from them,
> and finally decided to mark those OSDs OUT.
>
> After researching the related code, the reason could be that:
>
> 1. "down_pending_out" was set a month ago for those OSDs because of a
> network issue.
> 2. The down OSDs came up and joined the cluster again. "down_pending_out"
> is cleared in the "OSDMonitor::tick()" method, but that only happens on
> the leader monitor.
> 3. When we stopped the ceph service on the first node, the monitor quorum
> failed over. The new leader monitor saw those OSDs as having been DOWN
> for a very long time and wrongly marked them OUT.
>
> What do you think of this?
>
> Regards
> Ridge
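
To make the sequence in points 1-3 concrete, here is a minimal, self-contained C++ sketch of the failure mode as I understand it. It is not the real OSDMonitor code; the map name, tick() signature and grace value are just modelled on the report, with the toy assumption that only the leader's tick() ever erases recovered OSDs from down_pending_out.

#include <chrono>
#include <iostream>
#include <map>
#include <set>

using Clock = std::chrono::system_clock;

struct MonSketch {
    bool is_leader = false;
    // osd id -> time the OSD was observed DOWN (kept across elections)
    std::map<int, Clock::time_point> down_pending_out;
    std::set<int> up_osds;

    // Roughly mirrors the behaviour the report describes: only the leader
    // walks down_pending_out, so only the leader erases entries for OSDs
    // that came back up. A peon keeps stale timestamps.
    void tick(Clock::time_point now, std::chrono::seconds grace) {
        if (!is_leader)
            return;
        for (auto it = down_pending_out.begin(); it != down_pending_out.end();) {
            int osd = it->first;
            if (up_osds.count(osd)) {
                // OSD recovered: forget about it (leader only!)
                it = down_pending_out.erase(it);
                continue;
            }
            auto down_for =
                std::chrono::duration_cast<std::chrono::seconds>(now - it->second);
            if (down_for >= grace) {
                std::cout << "osd." << osd << " out (down for "
                          << down_for.count() << ")\n";
            }
            ++it;
        }
    }
};

int main() {
    using namespace std::chrono_literals;
    auto a_month_ago = Clock::now() - 24h * 30;

    // A peon monitor recorded osd.0 as down a month ago (network issue);
    // the OSD later came back up, but since tick() is a no-op on peons the
    // stale entry was never erased.
    MonSketch peon;
    peon.down_pending_out[0] = a_month_ago;
    peon.up_osds.insert(0);

    // The old leader goes away; the peon wins the election ...
    peon.is_leader = true;

    // ... and the OSD host is shut down for the RAM upgrade.
    peon.up_osds.erase(0);

    // First tick as leader: down_for is roughly a month, far beyond the
    // grace period (mon_osd_down_out_interval, 300s by default), so the
    // OSD is wrongly marked OUT even though it only just went down.
    peon.tick(Clock::now(), 300s);
    return 0;
}

Running the sketch prints an "osd.0 out (down for ...)" line with a huge down time, analogous to the "down for 3375169" entry in the monitor log above.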