Re: cosd multi-second stalls cause "wrongly marked me down"

On Wed, 2 Mar 2011, Jim Schutt wrote:
> On Tue, 2011-03-01 at 17:53 -0700, Sage Weil wrote:
> > Hi Jim,
> > 
> > We've fixed a few different bugs over the last week that were causing 
> > heartbeat issues. 
> 
> Great!
> 
> >  Nothing that explains why we would see the hang that 
> > you did, but other problems that caused the same 'wrongly marked me down' 
> > issue.  Are you still seeing this problem with the latest 'next' and/or 
> > 'master' branch?
> 
> I've been trying to isolate this on the stable branch
> since my last posting - I can still reproduce at will
> with my 96 osd test, but I haven't made much progress
> at tracking down what is going wrong.
> 
> > 
> > Also, if you don't mind reproducing, can you post a larger segment of the 
> > log? 
> 
> Sure.  I've got some extra debug printing going in
> my tree - the most interesting is a patch to log
> queue, operation, and total elapsed times in
> dispatch_entry() - it makes it really easy to
> find when things go wrong.
>
> I'll try to reproduce with master and post logs.
> Is it OK for me to add my extra debug patches for
> that?  I'll post them with the logs if so.

Absolutely.

> >  The really interesting question is what the heartbeat thread 
> > (heartbeat_entry()) is doing during this period that tick() is blocked up, 
> > since that's the thread that's responsible for sending the ping messages 
> > to peer OSDs.
> 
> One of the things I am seeing is handle_osd_ping()
> getting stalled, but I haven't been able to track
> down why.
> 
> I'll see if I see the same signature with master,
> and post logs.

Thanks!  Keep us posted.
sage
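
For readers following the thread: the debug patch Jim describes (logging queue, operation, and total elapsed times per dispatched message) is not included in this message. The following is only a rough sketch of what such instrumentation might look like, not his patch and not Ceph's actual code. The names DispatchTiming, format_timing, and is_stalled are hypothetical, and the timestamps are assumed to be captured when a message is queued, when the dispatch thread picks it up, and when the handler returns.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical per-message timing record (not a Ceph type).
struct DispatchTiming {
    Clock::time_point queued;    // message placed on the dispatch queue
    Clock::time_point started;   // dispatch thread picked the message up
    Clock::time_point finished;  // message handler returned
};

// Elapsed time from a to b, in seconds.
static double seconds_between(Clock::time_point a, Clock::time_point b) {
    return std::chrono::duration<double>(b - a).count();
}

// Build one log line with queue, operation, and total elapsed times.
std::string format_timing(const std::string& what, const DispatchTiming& t) {
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "dispatch %s queue %.6f op %.6f total %.6f",
                  what.c_str(),
                  seconds_between(t.queued, t.started),
                  seconds_between(t.started, t.finished),
                  seconds_between(t.queued, t.finished));
    return std::string(buf);
}

// True when the total time exceeds a caller-chosen threshold (assumed knob).
bool is_stalled(const DispatchTiming& t, double threshold_sec) {
    return seconds_between(t.queued, t.finished) > threshold_sec;
}
```

Splitting the total into queue time and operation time is what makes a log like this useful for a stall: a large queue time suggests the dispatch thread was busy or blocked elsewhere, while a large operation time points at the handler itself.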