Re: osd sequence number mismatches and timeouts

On Mon, Nov 01, 2010 at 03:57:55PM -0700, Sage Weil wrote:
> Is there something in dmesg before the osd22 seq number errors pop up?  

Yup, you were quite right.  There was a bad CRC that probably caused
the sequence numbers to get out of sync.

Nov  1 10:12:50 bdio20 kernel: [233439.052725] ceph: osd22 10.138.138.13:6804 bad crc
Nov  1 10:12:51 bdio20 kernel: [233440.672738] ceph: skipping osd22 192.168.168.13:6804 seq 1, expected 2
Nov  1 10:12:51 bdio20 kernel: [233440.672958] ceph: skipping osd22 192.168.168.13:6804 seq 2, expected 3
Nov  1 10:12:51 bdio20 kernel: [233440.675705] ceph: skipping osd22 192.168.168.13:6804 seq 3, expected 4

> Something originally caused the seq's to get out of sync.  I suspect it 
> was a transient network error that made the TCP session drop and 
> reconnect, and it's not skipping already-received messages.  There was a 
> bug in the skip code (so they stayed out of sync and osd22 eventually 
> timed out).  I pushed a fix for that to the ceph-client.git master branch 
> (df9f86fa).
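
To make the skip behavior described above concrete, here is a minimal
sketch of a receiver that drops messages it has already seen after a
reconnect.  It is an illustration only, not the ceph-client code and not
the contents of df9f86fa; the Receiver class, its in_seq field, and
on_message() are hypothetical names chosen for this example.  The point
it shows is that a skipped duplicate must leave the expected sequence
number unchanged, otherwise every later message also looks stale.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiver tracking the last sequence number delivered."""
    in_seq: int = 0                               # highest seq delivered so far
    delivered: list = field(default_factory=list)

    def on_message(self, seq: int, payload: str) -> bool:
        """Deliver a new message; skip one that was already received."""
        expected = self.in_seq + 1
        if seq < expected:
            # Duplicate resent after a reconnect: drop it and leave in_seq
            # untouched so that later messages still line up.
            print(f"skipping seq {seq}, expected {expected}")
            return False
        if seq > expected:
            # A gap means a message was lost; this sketch treats it as an error.
            raise ValueError(f"bad seq {seq}, expected {expected}")
        self.in_seq = seq
        self.delivered.append(payload)
        return True


# Two messages arrive, the session drops, and the peer resends from seq 1.
rx = Receiver()
rx.on_message(1, "a")
rx.on_message(2, "b")
results = [rx.on_message(s, p) for s, p in [(1, "a"), (2, "b"), (3, "c")]]
```

In this sketch the two resent messages are skipped and the third is
delivered normally, so the stream recovers instead of staying out of
sync until a timeout.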

BTW, it looks like something may be unhappy?  I tried doing a clone of
ceph-client.git, and I'm getting a failure:

% git clone git://ceph.newdream.net/git/ceph-client.git ceph-client
Cloning into ceph-client...
fatal: I don't handle protocol '/usr/local/google/git'

I downloaded df9f86fa and will try it out.  Thanks for pushing out the
patch so quickly!

					- Ted

