Re: cosd multi-second stalls cause "wrongly marked me down"

On Tue, 2011-03-01 at 17:53 -0700, Sage Weil wrote:
> Hi Jim,
> 
> We've fixed a few different bugs over the last week that were causing 
> heartbeat issues. 

Great!

>  Nothing that explains why we would see the hang that 
> you did, but other problems that caused the same 'wrongly marked me down' 
> issue.  Are you still seeing this problem with the latest 'next' and/or 
> 'master' branch?

I've been trying to isolate this on the stable branch
since my last posting - I can still reproduce at will
with my 96 osd test, but I haven't made much progress
at tracking down what is going wrong.

> 
> Also, if you don't mind reproducing, can you post a larger segment of the 
> log? 

Sure.  I've got some extra debug printing going in
my tree - the most interesting is a patch to log
queue, operation, and total elapsed times in
dispatch_entry() - it makes it really easy to
find when things go wrong.
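
For illustration, the kind of instrumentation described above could be
sketched roughly as follows.  This is not the actual patch and does not
use Ceph's types or logging: TimedMsg, DispatchTimes, timed_dispatch()
and log_if_slow() are hypothetical names, and the sketch assumes each
message is stamped with a monotonic-clock time when it is queued.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a queued message; not a Ceph type.
struct TimedMsg {
  std::string name;
  Clock::time_point enqueued;  // assumed to be stamped when the message is queued
};

// Elapsed times for one dispatch, in seconds.
struct DispatchTimes {
  double queue_s;  // waiting in the queue before the handler started
  double op_s;     // spent inside the handler
  double total_s;  // from enqueue until the handler returned
};

// Run handler on m and measure how long m waited and how long the handler ran.
template <typename Handler>
DispatchTimes timed_dispatch(const TimedMsg& m, Handler&& handler) {
  using secs = std::chrono::duration<double>;
  const auto start = Clock::now();
  handler(m);
  const auto end = Clock::now();

  DispatchTimes t;
  t.queue_s = secs(start - m.enqueued).count();
  t.op_s = secs(end - start).count();
  t.total_s = secs(end - m.enqueued).count();
  return t;
}

// Print a line to stderr only when the total time reaches threshold_s.
// Returns true if a line was printed.
bool log_if_slow(const TimedMsg& m, const DispatchTimes& t, double threshold_s) {
  if (t.total_s < threshold_s) return false;
  std::fprintf(stderr, "slow dispatch %s: queue %.6f op %.6f total %.6f\n",
               m.name.c_str(), t.queue_s, t.op_s, t.total_s);
  return true;
}
```

Splitting the total into queue time and operation time is what separates
"the message sat behind something else" from "the handler itself was
slow"; a threshold keeps the log readable under load.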

I'll try to reproduce with master and post logs.
Is it OK for me to add my extra debug patches for
that?  I'll post them with the logs if so.

>  The really interesting question is what the heartbeat thread 
> (heartbeat_entry()) is doing during this period that tick() is blocked up, 
> since that's the thread that's responsible for sending the ping messages 
> to peer OSDs.

One of the things I am seeing is handle_osd_ping()
getting stalled, but I haven't been able to track
down why.

I'll see if I see the same signature with master,
and post logs.

-- Jim

> 
> Thanks!
> sage
> 
> 

