I also raised a bug report: http://tracker.ceph.com/issues/17719

2016-10-27 10:36 GMT+08:00 Ridge Chen <ridge.chen@xxxxxxxxx>:
> Hi Experts,
>
> Recently we found an issue with our ceph cluster; the version is 0.94.6.
>
> We wanted to add additional RAM to the ceph nodes, so we needed to stop
> the ceph service on the nodes first. When we did that on the first
> node, we found the OSDs on that node marked OUT and backfill started
> (DOWN is expected in this case). The first node is somewhat special
> in that it also hosts the leader monitor.
>
> We then checked the monitor log and found the following:
>
> cluster [INF] osd.0 out (down for 3375169.141844)
>
> It looks like the monitor (which had just become leader) had stale
> "down_pending_out" records, computed a very long DOWN time from them,
> and finally decided to mark those OSDs OUT.
>
> After researching the related code, the reason could be that:
>
> 1. "down_pending_out" was set a month ago for those OSDs because of a
> network issue.
> 2. The down OSDs came up and joined the cluster again. "down_pending_out"
> is cleared in the "OSDMonitor::tick()" method, but that only happens on
> the leader monitor.
> 3. When we stopped the ceph service on the first node, the monitor quorum
> failed over. The new leader monitor saw those OSDs as having been DOWN
> for a very long time and wrongly marked them OUT.
>
> What do you think of this?
>
> Regards
> Ridge
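
To make the sequence in points 1-3 concrete, here is a minimal, self-contained C++ sketch of the failure mode as I understand it. It is not the real OSDMonitor code; the map name, tick() signature and grace value are just modelled on the report, with the toy assumption that only the leader's tick() ever erases recovered OSDs from down_pending_out.

#include <chrono>
#include <iostream>
#include <map>
#include <set>

using Clock = std::chrono::system_clock;

struct MonSketch {
    bool is_leader = false;
    // osd id -> time the OSD was observed DOWN (kept across elections)
    std::map<int, Clock::time_point> down_pending_out;
    std::set<int> up_osds;

    // Roughly mirrors the behaviour the report describes: only the leader
    // walks down_pending_out, so only the leader erases entries for OSDs
    // that came back up. A peon keeps stale timestamps.
    void tick(Clock::time_point now, std::chrono::seconds grace) {
        if (!is_leader)
            return;
        for (auto it = down_pending_out.begin(); it != down_pending_out.end();) {
            int osd = it->first;
            if (up_osds.count(osd)) {
                // OSD recovered: forget about it (leader only!)
                it = down_pending_out.erase(it);
                continue;
            }
            auto down_for =
                std::chrono::duration_cast<std::chrono::seconds>(now - it->second);
            if (down_for >= grace) {
                std::cout << "osd." << osd << " out (down for "
                          << down_for.count() << ")\n";
            }
            ++it;
        }
    }
};

int main() {
    using namespace std::chrono_literals;
    auto a_month_ago = Clock::now() - 24h * 30;

    // A peon monitor recorded osd.0 as down a month ago (network issue);
    // the OSD later came back up, but since tick() is a no-op on peons the
    // stale entry was never erased.
    MonSketch peon;
    peon.down_pending_out[0] = a_month_ago;
    peon.up_osds.insert(0);

    // The old leader goes away; the peon wins the election ...
    peon.is_leader = true;

    // ... and the OSD host is shut down for the RAM upgrade.
    peon.up_osds.erase(0);

    // First tick as leader: down_for is roughly a month, far beyond the
    // grace period (mon_osd_down_out_interval, 300s by default), so the
    // OSD is wrongly marked OUT even though it only just went down.
    peon.tick(Clock::now(), 300s);
    return 0;
}

Running the sketch prints an "osd.0 out (down for ...)" line with a huge down time, analogous to the "down for 3375169" entry in the monitor log above.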