I'm still getting crashes with tapdisk rbd. Most of the time it crashes gdb when I try to debug it. When I do get something, the crashing thread is always segfaulting in pthread_cond_wait and the stack is always corrupt:

(gdb) bt
#0  0x00007faae20c52d7 in pthread_cond_wait@@GLIBC_2.3.2 () from remote:/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00c1c435c10e782c in ?? ()
#2  0xe0bc294e52000010 in ?? ()
#3  0x08481380b00400fa in ?? ()
#4  0x3326aab400000000 in ?? ()
#5  0x0000000008001e00 in ?? ()
#6  0x000004043326aab4 in ?? ()
#7  0x7aef0100040595ef in ?? ()

When I examine the memory on the stack I get something like:

0x7faae3cc7c10: 0x00 0x00 0x00 0x00 0xb4 0xaa 0x26 0x32
0x7faae3cc7c18: 0x00 0x1e 0x00 0x08 0x00 0x00 0x00 0x00
0x7faae3cc7c20: 0xb4 0xaa 0x26 0x32 0x04 0x04 0x00 0x00
0x7faae3cc7c28: 0xef 0x95 0x05 0x04 0x00 0x01 0xef 0x79
0x7faae3cc7c30: 0x06 0x04 0x00 0x00 0x00 0x01 0x2b 0xf8
0x7faae3cc7c38: 0x2c 0x78 0x0e 0xc1 0x35 0xc4 0xc1 0x00
0x7faae3cc7c40: 0x10 0x00 0x00 0x52 0x4e 0x29 0xbc 0xe0
0x7faae3cc7c48: 0xfa 0x00 0x04 0xb0 0x80 0x13 0x48 0x08
0x7faae3cc7c50: 0x00 0x00 0x00 0x00 0xb4 0xaa 0x26 0x33
0x7faae3cc7c58: 0x00 0x1e 0x00 0x08 0x00 0x00 0x00 0x00
0x7faae3cc7c60: 0xb4 0xaa 0x26 0x33 0x04 0x04 0x00 0x00
0x7faae3cc7c68: 0xef 0x95 0x05 0x04 0x00 0x01 0xef 0x7a
0x7faae3cc7c70: 0x06 0x04 0x00 0x00 0x00 0x01 0x2c 0x38
0x7faae3cc7c78: 0x2c 0xb8 0x0e 0xc1 0x35 0xc5 0xc1 0x00
0x7faae3cc7c80: 0x10 0x00 0x00 0x52 0x4e 0x29 0xbc 0xe0
0x7faae3cc7c88: 0xfa 0x00 0x04 0xb0 0x80 0x13 0x5c 0x08

And I see very similar byte patterns in a tcpdump taken at the time of the crash, so I'm wondering if data read from, or about to be written to, the network is overflowing a buffer somewhere and corrupting the stack.

Does ceph use a magic start-of-message number or something similar that I could identify?

Thanks

James
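
One way to check the dump against the backtrace is to decode the dumped bytes as little-endian 64-bit words and see whether the bogus frame addresses are simply that data read back as return addresses. The sketch below is only an illustration: it uses nothing but the bytes and frame values quoted in this message, the helper names (qwords, find_run) are made up for the example, and it says nothing about Ceph's wire format.

```python
import struct

# Start address and bytes copied from the stack dump in this message.
BASE = 0x7faae3cc7c10
DUMP_HEX = """
00 00 00 00 b4 aa 26 32   00 1e 00 08 00 00 00 00
b4 aa 26 32 04 04 00 00   ef 95 05 04 00 01 ef 79
06 04 00 00 00 01 2b f8   2c 78 0e c1 35 c4 c1 00
10 00 00 52 4e 29 bc e0   fa 00 04 b0 80 13 48 08
00 00 00 00 b4 aa 26 33   00 1e 00 08 00 00 00 00
b4 aa 26 33 04 04 00 00   ef 95 05 04 00 01 ef 7a
06 04 00 00 00 01 2c 38   2c b8 0e c1 35 c5 c1 00
10 00 00 52 4e 29 bc e0   fa 00 04 b0 80 13 5c 08
"""

# Frames #1 to #7 from the gdb backtrace in this message.
FRAMES = [
    0x00c1c435c10e782c,
    0xe0bc294e52000010,
    0x08481380b00400fa,
    0x3326aab400000000,
    0x0000000008001e00,
    0x000004043326aab4,
    0x7aef0100040595ef,
]


def qwords(data):
    """Decode data as consecutive little-endian 64-bit words."""
    return [w for (w,) in struct.iter_unpack("<Q", data)]


def find_run(words, run):
    """Return the index where the list run starts inside words, or -1."""
    for i in range(len(words) - len(run) + 1):
        if words[i:i + len(run)] == run:
            return i
    return -1


# Strip all whitespace before decoding the hex text into raw bytes.
data = bytes.fromhex("".join(DUMP_HEX.split()))
words = qwords(data)

# Where do the backtrace "return addresses" appear in the dumped memory?
start = find_run(words, FRAMES)
frames_addr = BASE + 8 * start if start >= 0 else None

# The dump looks like a 64-byte record repeated twice; list the byte
# offsets at which the two halves differ.
first, second = data[:64], data[64:128]
diff_offsets = [i for i in range(64) if first[i] != second[i]]

if __name__ == "__main__":
    if frames_addr is None:
        print("frames not found in dump")
    else:
        print("frames #1-#7 start at", hex(frames_addr))
    print("offsets differing between the two 64-byte halves:", diff_offsets)
```

Run against the values above, frames #1 to #7 are found as seven consecutive words starting at 0x7faae3cc7c38, so the unwinder is walking through the dumped data rather than real return addresses. The two 64-byte halves of the dump differ in only 8 byte positions, which suggests a fixed-size record written repeatedly with a few changing fields. That is consistent with the buffer-overflow suspicion but does not by itself identify where the data comes from.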