Re: Client reconnect failing: reader gets bad tag

On Wed, 4 May 2011, Jim Schutt wrote:
> Hi,
> 
> I'm seeing clients having trouble reconnecting after timed-out
> requests.  When they get in this state, sometimes they manage
> to reconnect after several attempts; sometimes they never seem
> to be able to reconnect.

Hmm, the interesting line is

> 2011-05-04 16:00:59.710971 7f15d6948940 -- 172.17.40.30:6806/12583 >>
> 172.17.40.49:0/302440129 pipe(0x213fa000 sd=91 pgs=430 cs=1 l=1).reader bad
> tag 0

That _should_ mean the server side (osd) closes out the connection 
immediately, which should generate a disconnect error on the client and an 
immediate reconnect.  So it's strange that you're also seeing timeouts.
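
As a rough illustration of the expected behavior (a minimal sketch, not the
actual messenger code): the reader pulls one tag byte off the socket,
dispatches on it, and treats anything it doesn't recognize as fatal for that
connection.  The tag values below are believed to match msgr.h but should be
treated as assumptions, and read_one_tag is a hypothetical helper.

```python
import io

# Assumed subset of messenger tag values (illustrative; check msgr.h).
KNOWN_TAGS = {
    6: "CLOSE",
    7: "MSG",
    8: "ACK",
    9: "KEEPALIVE",
}


def read_one_tag(stream):
    """Read one tag byte and report what a reader loop would do with it."""
    raw = stream.read(1)
    if not raw:
        # Peer went away before sending a tag.
        return ("eof", None)
    tag = raw[0]
    if tag in KNOWN_TAGS:
        return ("dispatch", KNOWN_TAGS[tag])
    # Unknown tag: log it and close the connection, so the peer sees a
    # disconnect and can reconnect immediately.
    return ("close", f"bad tag {tag}")


if __name__ == "__main__":
    print(read_one_tag(io.BytesIO(b"\x00")))
    print(read_one_tag(io.BytesIO(b"\x07")))
```

With a zero byte on the wire the sketch takes the close path, which is why a
"bad tag 0" on the osd would normally show up on the client as a disconnect
rather than a timeout.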

Of course, we shouldn't be getting bad tags at all, so something else is 
clearly wrong and may be contributing to both problems.  

How easy is this to reproduce?  It's right after a fresh connection, so 
the number of possibly offending code paths is pretty small, at least!

There is client side debugging to turn on, but it's very chatty.  Maybe 
you can just enable a few key lines, like the connect handshake ones, and 
any point where we queue/send a tag.  It's a bit tedious to enable 
the individual dout lines in messenger.c, sadly, but unless you have a 
very fast netconsole or something that's probably the only way to go...
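
One possible way to do the per-line enabling, assuming the client kernel is
built with CONFIG_DYNAMIC_DEBUG, that dout maps to pr_debug there, and that
debugfs is mounted at /sys/kernel/debug: write "file ... line ... +p" queries
to the dynamic debug control file.  The sketch below only builds the query
strings; the line numbers are placeholders, and build_queries and
apply_queries are hypothetical helpers.

```python
# Assumed location of the dynamic debug control file (needs debugfs mounted).
CONTROL = "/sys/kernel/debug/dynamic_debug/control"


def build_queries(source_file, line_numbers):
    """Build one dynamic-debug query per source line to enable (+p)."""
    return [f"file {source_file} line {n} +p" for n in line_numbers]


def apply_queries(queries, control_path=CONTROL):
    """Write each query to the control file (requires root)."""
    for query in queries:
        with open(control_path, "w") as ctl:
            ctl.write(query + "\n")


if __name__ == "__main__":
    # Placeholder line numbers; substitute the dout lines you care about.
    for q in build_queries("messenger.c", [100, 200]):
        print(q)
```

Swapping "+p" for "-p" in the query turns the same lines back off.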

sage