Re: cosd multi-second stalls cause "wrongly marked me down"

Sage Weil <sage@xxxxxxxxxxxx> · Fri, 11 Mar 2011 13:37:20 -0800 (PST)

On Fri, 11 Mar 2011, Jim Schutt wrote:
> 
> On Fri, 2011-03-11 at 11:37 -0700, Sage Weil wrote:
> > On Fri, 11 Mar 2011, Jim Schutt wrote:
> > > So none of those were osd_ping messages.
> > > 
> > > But, I still had lots of delayed acks.  Here's a couple more examples:
> > > 
> > > osd.0.log:960713:2011-03-11 10:55:32.117721 7fc4cb7fe940 -- 172.17.40.21:6802/28363 --> osd74 172.17.40.31:6808/24916 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7fc4cc16f270
> > > osd.0.log:960756:2011-03-11 10:55:32.118395 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer encoding 289 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > osd.0.log:960757:2011-03-11 10:55:32.118409 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer sending 289 0x7fc4cc16f270
> > > osd.0.log:960758:2011-03-11 10:55:32.118422 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).write_message 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > 
> > This bit has me confused:
> > 
> > > osd.0.log:963163:2011-03-11 10:55:32.941413 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).reader got ack seq 289 >= 289 on 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > osd.0.log:964273:2011-03-11 10:55:33.447526 7fc4c61b6940 RefCountedObject::put delete 0x7fc4cc16f270 took 5.06013 secs!
> > > osd.0.log:964274:2011-03-11 10:55:33.447538 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).handle_ack finished put on 0x7fc4cc16f270
> > 
> > It looks like ~ .5 seconds has gone by for that thread, but the ::put 
> > debug says 5 seconds.  It happens between the 'got ack seq' and 'finished 
> > put' lines, though, right?  
> 
> OK, with Greg's utime_t double fix, I git no hits on
> 
> # egrep -Hn --color -e "->put took" osd.*.log
> 
> 
> But, I still get lots of stalled :RefCountedObject::put delete:
> 
> # grep -Hn RefCountedObject::put osd.*.log | egrep "took [1-9]\." | wc -l
> 8911
> 
> # grep -Hn RefCountedObject::put osd.*.log | egrep "took [0-9][0-9]\." | wc -l
> 415

Hmm!  That does seem to point at the allocator, doesn't it.

Other threads are doing work during this long interval?  Including freeing 
memory, presumably, since basically everything uses the heap one way or 
another.  If it's the allocator, it's somehow affecting one thread only, 
which is pretty crazy.

Is it difficult for you to try this with tcmalloc?  That'll tell us 
something.

One other possibility would be to try to catch this "in the act" and send 
it a SIGABRT to get a core dump.  Then we can look in more detail at what 
this (and other) threads are up to.  I'm not sure how easy this is to 
catch on a particular node...

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html