On Mon, Jan 21, 2013 at 10:05 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> We observed an interesting situation over the weekend. The XFS volume
> ceph-osd locked up (hung in xfs_ilock) for somewhere between 2 and 4
> minutes. After 3 minutes (180s), ceph-osd gave up waiting and committed
> suicide. XFS seemed to unwedge itself a bit after that, as the daemon was
> able to restart and continue.
>
> The problem is that during that 180s the OSD was claiming to be alive but
> not able to do any IO. That heartbeat check is meant as a sanity check
> against a wedged kernel, but waiting so long meant that the ceph-osd
> wasn't failed by the cluster quickly enough and client IO stalled.
>
> We could simply change that timeout to something close to the heartbeat
> interval (currently default is 20s). That will make ceph-osd much more
> sensitive to fs stalls that may be transient (high load, whatever).
>
> Another option would be to make the osd heartbeat replies conditional on
> whether the internal heartbeat is healthy. Then the heartbeat warnings
> could start at 10-20s, ping replies would pause, but the suicide could
> still be 180s out. If the stall is short-lived, pings will continue, the
> osd will mark itself back up (if it was marked down) and continue.
>
> Having written that out, the last option sounds like the obvious choice.
> Any other thoughts?
>

Another option would be to have the osd reply to the ping with some
health description.

Yehuda
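
For illustration, here is a rough sketch of the two reply policies discussed
in this thread: withholding ping replies while the internal heartbeat is
unhealthy, and always replying but attaching a health description. It is
Python, not the actual ceph-osd C++, and every name in it (InternalHeartbeat,
handle_ping, handle_ping_with_health, the reply fields) is hypothetical. The
20s warning grace is an assumed value within the 10-20s range mentioned above;
only the 180s suicide timeout comes from the thread.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    OK = "ok"            # worker made progress recently
    STALLED = "stalled"  # past the warning grace, not yet fatal
    SUICIDE = "suicide"  # past the hard timeout, daemon should give up


@dataclass
class InternalHeartbeat:
    """Hypothetical stand-in for the OSD's internal heartbeat check."""
    grace: float = 20.0           # assumed warning threshold, seconds
    suicide_grace: float = 180.0  # hard timeout from the thread, seconds
    last_progress: float = 0.0    # when the worker last made progress

    def touch(self, now: float) -> None:
        # Called by the worker each time it completes some IO.
        self.last_progress = now

    def health(self, now: float) -> Health:
        stalled_for = now - self.last_progress
        if stalled_for >= self.suicide_grace:
            return Health.SUICIDE
        if stalled_for >= self.grace:
            return Health.STALLED
        return Health.OK


def handle_ping(hb: InternalHeartbeat, now: float):
    """Conditional reply: answer only while the internal heartbeat is healthy.

    Returning None means the reply is withheld, so peers stop hearing from
    this OSD and can report it as failed well before the suicide timeout.
    """
    if hb.health(now) is Health.OK:
        return {"type": "ping_reply", "health": Health.OK.value}
    return None


def handle_ping_with_health(hb: InternalHeartbeat, now: float):
    """Alternative: always reply, but describe the current health."""
    return {
        "type": "ping_reply",
        "health": hb.health(now).value,
        "stalled_for": now - hb.last_progress,
    }
```

With the first policy a short stall only pauses replies, and they resume as
soon as the worker calls touch() again. The second policy leaves the decision
about what to do with a "stalled" peer to whoever receives the reply.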