Re: cosd multi-second stalls cause "wrongly marked me down"

Sage Weil <sage@xxxxxxxxxxxx> · Thu, 31 Mar 2011 11:41:23 -0700 (PDT)

On Thu, 31 Mar 2011, Jim Schutt wrote:
> Sage Weil wrote:
> > On Thu, 31 Mar 2011, Jim Schutt wrote:
> > > Jim Schutt wrote:
> > > > Sage Weil wrote:
> > > > > On Thu, 31 Mar 2011, Jim Schutt wrote:
> > > > > > > I was actually suggesting we try to make it core dump inside the
> > > > > > > "delete this" and watching for a stall in progress and then
> > > > > > > sending SIGABRT to dump core in the act.  That way we verify it
> > > > > > > really is in the allocator (and maybe even see where).  That's a
> > > > > > > bit harder to set up, though!
> > > > > > Right, I couldn't think of how to automate that stall detection
> > > > > > during the stall, rather than after.  At least, I couldn't
> > > > > > think of how to do it without incurring possibly excessive
> > > > > > overhead, say by starting a timer on every "delete this".
> > > > > Yeah.  I wonder if dumping core on a cosd right when it gets marked
> > > > > down would do the trick?  That should catch it ~20 seconds or
> > > > > whatever in the stall.  By watching for the "osdfoo marked down"
> > > > > messages from ceph -w?
> > > > What about making Cond::Wait() use pthread_cond_timedwait()
> > > > with a suitable timeout value, say 10 seconds, and asserting
> > > > on timeout?  Do you think there would be many legitimate 10
> > > > second delays in OSD processing?
> > > > 
> > > Or, I could make a Cond::WaitIntervalOrAbort(), and
> > > use it just on the pipe lock, since that's the source
> > > of the trouble.  Sound useful?
> > 
> > Yeah that sounds like the way to go.. then you can hand pick the site(s)
> > that is/are waiting a long time in this case and switch those to
> > WaitIntervalOrAbort?  Hopefully the cond timer will go off despite whatever
> > badness is going on in delete this...
> 
> Actually, it occurs to me Wait() isn't what I'm after:
> that is used to wait some unknown time for some event.
> 
> I think instead I need to use TryLock() on the pipe_lock
> in submit_message(), in a loop with a suitable sleep,
> say 100us, and assert when it takes too long to acquire
> the lock.
> 
> So, maybe add a Mutex::LockOrAbort(), and use it in
> submit_message()?
> 
> submit_message() is intended to return immediately, no?
> And the issue is caused by heartbeat() being unable to
> queue messages, so this sounds to me to be a useful
> test.
> 
> Does that seem to have low enough overhead to
> be useful?

Yeah, that sounds right!
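
Something along these lines, perhaps.  This is only a rough sketch of the
TryLock()-in-a-loop idea, under a few assumptions: the Mutex class below is
a standalone stand-in that wraps pthread_mutex_t (it is not the class in the
tree), and TryLockFor(), now_us() and the timeout/poll parameters are names
made up for illustration.  Only the pthread, clock_gettime() and usleep()
calls are real.

```cpp
#include <pthread.h>
#include <time.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a Mutex class; it only wraps pthread_mutex_t.
class Mutex {
  pthread_mutex_t m;

  // Monotonic clock in microseconds, so wall-clock jumps don't matter.
  static uint64_t now_us() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000ull + (uint64_t)ts.tv_nsec / 1000;
  }

public:
  Mutex() { pthread_mutex_init(&m, NULL); }
  ~Mutex() { pthread_mutex_destroy(&m); }
  Mutex(const Mutex&) = delete;
  Mutex& operator=(const Mutex&) = delete;

  void Lock() { pthread_mutex_lock(&m); }
  void Unlock() { pthread_mutex_unlock(&m); }
  bool TryLock() { return pthread_mutex_trylock(&m) == 0; }

  // Poll TryLock() every poll_us microseconds.  Returns true once the
  // lock is held, or false if timeout_us elapses first.
  bool TryLockFor(uint64_t timeout_us, unsigned poll_us = 100) {
    assert(poll_us > 0);
    const uint64_t deadline = now_us() + timeout_us;
    for (;;) {
      if (TryLock())
        return true;
      if (now_us() >= deadline)
        return false;
      usleep(poll_us);
    }
  }

  // Acquire the lock, or abort() if it cannot be taken within timeout_us.
  // abort() raises SIGABRT, so the core is dumped while whoever holds
  // the lock is still stalled.
  void LockOrAbort(uint64_t timeout_us) {
    if (!TryLockFor(timeout_us)) {
      fprintf(stderr, "LockOrAbort: lock not acquired within %llu us\n",
              (unsigned long long)timeout_us);
      abort();
    }
  }
};
```

The call site in submit_message() would then swap its Lock() on the pipe
lock for LockOrAbort() with whatever timeout seems safe (the 10 seconds
floated earlier, say).  The uncontended path costs one trylock, so the
overhead should only show up when the lock is already held.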

sage