Re: cosd multi-second stalls cause "wrongly marked me down"

"Jim Schutt" <jaschut@xxxxxxxxxx> · Wed, 23 Feb 2011 10:52:18 -0700

On Thu, 2011-02-17 at 09:11 -0700, Sage Weil wrote:
> On Thu, 17 Feb 2011, Jim Schutt wrote:
> > Hi Sage,
> > 
> > On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> > > On Wed, 16 Feb 2011, Sage Weil wrote:
> > > > shouldn't affect anything.  We may have missed something.. do you have a 
> > > > log showing this in action?
> > > 
> > > Obviously yes, looking at your original email.  :)  At the beginning of 
> > > each log line we include a thread id.  What would be really helpful would 
> > > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
> > > are blocking, either based on the existing output, or by adding additional 
> > > dout lines at interesting points in time.
> > 
> > I'll take a deeper look at my existing logs with
> > that in mind; let me know if you'd like me to
> > send you some.
> > 
> > I have also been looking at map_lock, as it seems
> > to be shared between the heartbeat and map update
> > threads.
> > 
> > Would instrumenting acquiring/releasing that lock
> > be helpful?  Is there some other lock that may
> > be more fruitful to instrument?  I can reproduce 
> > pretty reliably, so adding instrumentation is 
> > no problem.
> 
> The heartbeat thread is doing a map_lock.try_get_read() because it 
> frequently is held by another thread, so that shouldn't ever block. 
> 
> The possibilities I see are:
>  - peer_stat_lock
>  - the monc->sub_want / renew_subs calls (monc has an internal lock), 
> although that code should only trigger with a single osd.  :/
>  - heartbeat_lock itself could be held by another thread; i'd instrument 
> all locks/unlocks there, along with the wakeup in heartbeat().

I think the culprit is osd_lock.

The OSD tick timer needs to acquire the osd_lock mutex to
wake up, right?  So if osd_lock is ever held for an excessive 
time, the tick will be delayed.

I used debug lockdep = 20 on my osds, and saw this via
egrep "10:07:(2[2-9]|3[0-3])" osd.70.log | egrep "tick|osd_lock" | grep -v "tick .*"

2011-02-23 10:07:23.454954 0x7f51a17fa940 lockdep: _will_lock OSD::osd_lock (9)
2011-02-23 10:07:23.455118 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
2011-02-23 10:07:23.455254 0x7f51a37fe940 lockdep: _will_lock OSD::osd_lock (9)
2011-02-23 10:07:23.455262 0x7f51a37fe940 lockdep: _locked OSD::osd_lock
2011-02-23 10:07:33.008092 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
2011-02-23 10:07:33.009135 0x7f51a17fa940 lockdep: _locked OSD::osd_lock
2011-02-23 10:07:33.009217 0x7f51a17fa940 lockdep: _will_unlock OSD::osd_lock
2011-02-23 10:07:33.009307 0x7f51ad89d940 osd70 149 tick

I've attached a full log of the 10 second period that osd_lock
was held.

Is it possible to separate out the parts of tick() that
need to covered by osd_lock and do them somewhere else, 
so that a new lock, say tick_lock, can be used to
protect the tick timer?

-- Jim

> 
> Thanks for looking at this!
> sage
> 
> 
> 
> > 
> > -- Jim
> > 
> > > 
> > > sage
> > > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> 
Attachment:
osd.70.log.stall.bz2

Description: application/bzip