I think I tracked this bug down: the Oops is due to 'msg->bio_iter == NULL'.
If a message that carries a bio is requeued for sending (e.g. after a
connection reset), its bio_iter can be left stale from the previous
attempt; the patch below clears it so the messenger reinitializes the
iterator from the start of the chain. A sketch of the iterator helpers
is below the quoted report.

---
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index f0993af..ac16f13 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -549,6 +549,10 @@ static void prepare_write_message(struct ceph_connection *con)
 	}
 
 	m = list_first_entry(&con->out_queue, struct ceph_msg, list_head);
+#ifdef CONFIG_BLOCK
+	if (m->bio && m->bio_iter)
+		m->bio_iter = NULL;
+#endif
 	con->out_msg = m;
 
 	/* put message on sent list */

On Thu, Apr 12, 2012 at 6:30 AM, Danny Kukawka <danny.kukawka@xxxxxxxxx> wrote:
> Hi,
>
> we are currently testing Ceph with RBD on a cluster with 1 Gbit and
> 10 Gbit interfaces. While we see no kernel crashes with RBD when the
> cluster runs on the 1 Gbit interfaces, we see very frequent kernel
> crashes on the 10 Gbit network while running tests with e.g. fio
> against the RBDs.
>
> I've tested this with kernel v3.0 and also with 3.3.0 (plus the
> patches from the 'for-linus' branch of ceph-client.git at
> git.kernel.org).
>
> The more client machines run the tests, the faster the crashes occur.
> The issue is fully reproducible here.
>
> Has anyone seen similar problems? See the backtrace below.
>
> Regards
>
> Danny
>
> PID: 10902  TASK: ffff88032a9a2080  CPU: 0  COMMAND: "kworker/0:0"
>  #0 [ffff8803235fd950] machine_kexec at ffffffff810265ee
>  #1 [ffff8803235fd9a0] crash_kexec at ffffffff810a3bda
>  #2 [ffff8803235fda70] oops_end at ffffffff81444688
>  #3 [ffff8803235fda90] __bad_area_nosemaphore at ffffffff81032a35
>  #4 [ffff8803235fdb50] do_page_fault at ffffffff81446d3e
>  #5 [ffff8803235fdc50] page_fault at ffffffff81443865
>     [exception RIP: read_partial_message+816]
>     RIP: ffffffffa041e500  RSP: ffff8803235fdd00  RFLAGS: 00010246
>     RAX: 0000000000000000  RBX: 00000000000009d7  RCX: 0000000000008000
>     RDX: 0000000000000000  RSI: 00000000000009d7  RDI: ffffffff813c8d78
>     RBP: ffff880328827030  R8:  00000000000009d7  R9:  0000000000004000
>     R10: 0000000000000000  R11: ffffffff81205800  R12: 0000000000000000
>     R13: 0000000000000069  R14: ffff88032a9bc780  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #6 [ffff8803235fdd38] thread_return at ffffffff81440e82
>  #7 [ffff8803235fdd78] try_read at ffffffffa041ed58 [libceph]
>  #8 [ffff8803235fddf8] con_work at ffffffffa041fb2e [libceph]
>  #9 [ffff8803235fde28] process_one_work at ffffffff8107487c
> #10 [ffff8803235fde78] worker_thread at ffffffff8107740a
> #11 [ffff8803235fdee8] kthread at ffffffff8107b736
> #12 [ffff8803235fdf48] kernel_thread_helper at ffffffff8144c144
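
For reference, here is roughly what the bio iterator helpers in
net/ceph/messenger.c look like around v3.3 (a paraphrased sketch of the
code of that era, not a verbatim copy):

#ifdef CONFIG_BLOCK
/*
 * Point the iterator at the start of a bio chain; a NULL bio leaves
 * the iterator NULL.
 */
static void init_bio_iter(struct bio *bio, struct bio **iter, int *seg)
{
	if (!bio) {
		*iter = NULL;
		return;
	}
	*iter = bio;
	*seg = bio->bi_idx;
}

/*
 * Advance to the next segment; stepping past the last segment of the
 * last bio calls init_bio_iter(NULL, ...) and leaves *bio_iter NULL.
 */
static void iter_bio_next(struct bio **bio_iter, int *seg)
{
	if (*bio_iter == NULL)
		return;

	BUG_ON(*seg >= (*bio_iter)->bi_vcnt);

	(*seg)++;
	if (*seg == (*bio_iter)->bi_vcnt)
		init_bio_iter((*bio_iter)->bi_next, bio_iter, seg);
}
#endif	/* CONFIG_BLOCK */

The read and write paths only reinitialize the iterator when it is
still unset ("if (msg->bio && !msg->bio_iter) init_bio_iter(...)"), so
if I'm reading it right, a message that is consumed again after a
connection reset keeps walking with the stale iterator, falls off the
end of the chain (*bio_iter goes NULL), and the next
bio_iovec_idx(*bio_iter, *bio_seg) in read_partial_message_bio()
dereferences NULL. That would match the read_partial_message RIP in
Danny's backtrace.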