On Mon, Nov 01, 2010 at 03:57:55PM -0700, Sage Weil wrote:
> Is there something in dmesg before the osd22 seq number errors pop up?

Yup, you were quite right. There was a bad crc that probably caused
the seq's to get out of sync.

Nov 1 10:12:50 bdio20 kernel: [233439.052725] ceph: osd22 10.138.138.13:6804 bad crc
Nov 1 10:12:51 bdio20 kernel: [233440.672738] ceph: skipping osd22 192.168.168.13:6804 seq 1, expected 2
Nov 1 10:12:51 bdio20 kernel: [233440.672958] ceph: skipping osd22 192.168.168.13:6804 seq 2, expected 3
Nov 1 10:12:51 bdio20 kernel: [233440.675705] ceph: skipping osd22 192.168.168.13:6804 seq 3, expected 4

> Something originally caused the seq's to get out of sync. I suspect it
> was a transient network error that made the TCP session drop and
> reconnect, and it's not skipping already-received messages. There was a
> bug in the skip code (so they stayed out of sync and osd22 eventually
> timed out). I pushed a fix for that to the ceph-client.git master branch
> (df9f86fa).

BTW, it looks like something may be unhappy? I tried doing a clone of
ceph-client.git, and I'm getting a failure:

% git clone git://ceph.newdream.net/git/ceph-client.git ceph-client
Cloning into ceph-client...
fatal: I don't handle protocol '/usr/local/google/git'

I downloaded df9f86fa and will try it out. Thanks for pushing out the
patch so quickly!

- Ted