Re: handling fs errors

Gregory Farnum <greg@xxxxxxxxxxx> · Tue, 22 Jan 2013 09:59:01 -0800



On Tuesday, January 22, 2013 at 5:12 AM, Wido den Hollander wrote:
> On 01/22/2013 07:12 AM, Yehuda Sadeh wrote:
> > On Mon, Jan 21, 2013 at 10:05 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> > > We observed an interesting situation over the weekend. The XFS volume
> > > under ceph-osd locked up (hung in xfs_ilock) for somewhere between 2 and
> > > 4 minutes. After 3 minutes (180s), ceph-osd gave up waiting and committed
> > > suicide. XFS seemed to unwedge itself a bit after that, as the daemon was
> > > able to restart and continue.
> > > 
> > > The problem is that during that 180s the OSD was claiming to be alive but
> > > not able to do any IO. That heartbeat check is meant as a sanity check
> > > against a wedged kernel, but waiting so long meant that the ceph-osd
> > > wasn't failed by the cluster quickly enough and client IO stalled.
> > > 
> > > We could simply change that timeout to something close to the heartbeat
> > > interval (currently default is 20s). That will make ceph-osd much more
> > > sensitive to fs stalls that may be transient (high load, whatever).
> > > 
> > > Another option would be to make the osd heartbeat replies conditional on
> > > whether the internal heartbeat is healthy. Then the heartbeat warnings
> > > could start at 10-20s, ping replies would pause, but the suicide could
> > > still be 180s out. If the stall is short-lived, pings will continue, the
> > > osd will mark itself back up (if it was marked down) and continue.
> > > 
> > > Having written that out, the last option sounds like the obvious choice.
> > > Any other thoughts?
> >
> > Another option would be to have the osd reply to the ping with some
> > health description.
>
> Looking to the future, with more monitoring, that might be a good idea.
>
> If an OSD simply stops sending heartbeats when its internal conditions
> aren't met, you don't know what's going on.
>
> If the heartbeat carried metadata saying "I'm here, but not in such
> good shape", that could be reported back to the monitors.


I think we want to move towards more comprehensive pinging like this, but it's not something to do in haste. Pausing pings when the internal threads are disappearing sounds like a good simple step to make the reporting better match reality.
-Greg 
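
A minimal sketch of the "pause pings while the internal heartbeat is unhealthy" idea discussed in this thread, written in Python for brevity rather than in Ceph's C++. Every name here (InternalHeartbeat, handle_ping, SuicideTimeout) is hypothetical and not taken from the Ceph source; the 20s grace and 180s suicide values are simply the figures mentioned above, used as assumed defaults.

```python
import time


class SuicideTimeout(Exception):
    """Raised when a worker has stalled past the suicide grace (hypothetical)."""


class InternalHeartbeat:
    """Tracks when each internal worker last reported progress."""

    def __init__(self, grace=20.0, suicide_grace=180.0, clock=time.monotonic):
        self.grace = grace                  # assumed: stall after which pings pause
        self.suicide_grace = suicide_grace  # assumed: stall after which the daemon exits
        self.clock = clock                  # injectable so the logic can be tested
        self.last_touch = {}

    def touch(self, worker):
        # Called by a worker thread each time it completes a unit of work.
        self.last_touch[worker] = self.clock()

    def worst_stall(self):
        # Longest time any registered worker has gone without touching.
        now = self.clock()
        return max((now - t for t in self.last_touch.values()), default=0.0)

    def is_healthy(self):
        return self.worst_stall() <= self.grace

    def should_suicide(self):
        return self.worst_stall() > self.suicide_grace


def handle_ping(hb):
    """Return a reply for a peer ping, or None to stay silent."""
    if hb.should_suicide():
        raise SuicideTimeout("internal heartbeat stalled past suicide grace")
    if not hb.is_healthy():
        # Stay silent so peers can report this daemon as failed quickly.
        return None
    return "ping_reply"
```

With this shape, a short stall only silences ping replies, which resume as soon as the stalled worker touches its heartbeat again; only a stall longer than the suicide grace ends the process. The richer variant suggested by Yehuda and Wido would return a health description in place of None.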
