Re: Client reconnect failing: reader gets bad tag

Hi Sage,

Sage Weil wrote:
On Wed, 4 May 2011, Jim Schutt wrote:
Hi,

I'm seeing clients having trouble reconnecting after timed-out
requests.  When they get in this state, sometimes they manage
to reconnect after several attempts; sometimes they never seem
to be able to reconnect.

Hmm, the interesting line is

2011-05-04 16:00:59.710971 7f15d6948940 -- 172.17.40.30:6806/12583 >>
172.17.40.49:0/302440129 pipe(0x213fa000 sd=91 pgs=430 cs=1 l=1).reader bad
tag 0

That _should_ mean the server side (osd) closes out the connection immediately, which should generate a disconnect error on the client and an immediate reconnect. So it's strange that you're also seeing timeouts.

Of course, we shouldn't be getting bad tags anyway, so something else is clearly wrong and may be contributing to both problems. How easy is this to reproduce? It's right after a fresh connection, so the number of possibly offending code paths is pretty small, at least!

There is client side debugging to turn on, but it's very chatty. Maybe you can just enable a few key lines, like the connect handshake ones, and any point where we queue/send a tag. It's a bit tedious to enable the individual dout lines in messenger.c, sadly, but unless you have a very fast netconsole or something that's probably the only way to go...
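
For readers following along, the "bad tag" path described above can be sketched roughly as below. This is only an illustration, not the actual messenger code: the tag values are recalled from Ceph's msgr.h and should be checked against the source, the function name read_tags is hypothetical, and message payloads are left out so the stream carries nothing but tag bytes.

```python
import io

# Tag values as recalled from Ceph's msgr.h; treat them as assumptions
# and verify against the real header before relying on them.
KNOWN_TAGS = {
    6: "CLOSE",
    7: "MSG",
    8: "ACK",
    9: "KEEPALIVE",
}


def read_tags(stream):
    """Read one tag byte at a time from a hypothetical payload-free stream.

    Returns (names_of_tags_handled, reason_the_reader_stopped).
    """
    handled = []
    while True:
        byte = stream.read(1)
        if not byte:
            # The peer went away without sending CLOSE.
            return handled, "eof"
        tag = byte[0]
        if tag not in KNOWN_TAGS:
            # Unknown tag: stop reading so the caller can drop the
            # connection, which the peer should see as a disconnect.
            return handled, f"bad tag {tag}"
        handled.append(KNOWN_TAGS[tag])
        if KNOWN_TAGS[tag] == "CLOSE":
            return handled, "closed by peer"


if __name__ == "__main__":
    # A keepalive followed by a zero byte, as in the log line above.
    print(read_tags(io.BytesIO(bytes([9, 0, 7]))))
```

The point of the sketch is that a zero byte right after the handshake stops the reader immediately, so nothing sent after it on that socket is ever interpreted.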

Here are some logs of a client-server interaction that hangs.

My dd started on the client at 14:38:22.

The first bad tag can be seen in the osd6 log at 14:39:40.655544.

AFAICS, the client had written a stripe into its socket,
and the OSD got as far as reading the msg tag and header
when the client gave up on the message, closed the socket,
and reconnected.  The OSD got a bad tag on the new pipe.

After that the client continued to retry the send, but
for many retries it always sent a bad tag.  But, it seems
to do this without closing/opening the socket.

Then, the client does close/open the socket, and a valid
msg tag is sent, and things work fine.

FWIW, I think the client-side messenger isn't doing a
good job distinguishing a busy OSD from a dead OSD.
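
One way to picture the busy/dead distinction is the sketch below. It is not how the Ceph messenger works and is not a proposed patch; PeerState, classify, and both timeout values are hypothetical names and numbers chosen for illustration. The assumption is that a peer which still sends anything at all (acks, keepalives, replies) is alive but slow, while one that has gone silent is treated as dead.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    # When the oldest still-unanswered request was sent (seconds).
    oldest_request_sent: float
    # When any bytes at all (ack, keepalive, reply) last arrived (seconds).
    last_heard: float


def classify(peer, now, request_timeout=30.0, silence_timeout=10.0):
    """Return "ok", "busy", or "dead" for a hypothetical peer."""
    if now - peer.oldest_request_sent < request_timeout:
        # Nothing has timed out yet.
        return "ok"
    if now - peer.last_heard < silence_timeout:
        # The request is overdue but the peer is still talking:
        # keep the socket and keep waiting instead of reconnecting.
        return "busy"
    # Overdue and silent: tear down the connection and reconnect.
    return "dead"


if __name__ == "__main__":
    print(classify(PeerState(0.0, 28.0), now=31.0))  # overdue, recently heard
    print(classify(PeerState(0.0, 5.0), now=31.0))   # overdue, long silent
```

Under this scheme only the "dead" case would close the socket, which would avoid abandoning a message the OSD is already partway through reading.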

-- Jim


sage



Attachment: client.full.log.bz2
Description: application/bzip

Attachment: client.log.bz2
Description: application/bzip

Attachment: server.log.bz2
Description: application/bzip

