"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> writes: > So during sustained testing overnight with the above changes, I still > managed to trigger another OOPs. However this time it appears to be > pointing at virtio-blk.c code.. > > [ 7728.484801] BUG: unable to handle kernel paging request at 0000000200000045 > [ 7728.485713] IP: [<ffffffffa006717b>] blk_done+0x51/0xf1 [virtio_blk] > [ 7728.485713] PGD 0 > [ 7728.485713] Oops: 0000 [#1] SMP > [ 7728.485713] Modules linked in: ib_srpt ib_cm ib_sa ib_mad ib_core tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 e1000 virtio_blk sd_mod sr_mod cdrom virtio_pci virtio_ring virtio ata_piix libata > [ 7728.485713] CPU 1 > [ 7728.485713] Pid: 0, comm: swapper/1 Not tainted 3.6.0-rc6+ #4 Bochs Bochs > [ 7728.485713] RIP: 0010:[<ffffffffa006717b>] [<ffffffffa006717b>] blk_done+0x51/0xf1 [virtio_blk] > [ 7728.485713] RSP: 0018:ffff88007fc83e98 EFLAGS: 00010093 > [ 7728.485713] RAX: 0000000200000001 RBX: ffff88007ad00000 RCX: 0000000000000080 > [ 7728.485713] RDX: 000000000000538a RSI: 0000000000000000 RDI: ffffea0001eddc50 > [ 7728.485713] RBP: 0000000000000092 R08: ffffea0001eddc58 R09: ffff88007d002700 > [ 7728.485713] R10: ffffffffa00410f9 R11: fffffffffffffff0 R12: ffff88007fc83ea4 > [ 7728.485713] R13: ffff88007b771478 R14: ffff88007bf197b0 R15: 0000000000000000 > [ 7728.485713] FS: 0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000 > [ 7728.485713] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 7728.485713] CR2: 0000000200000045 CR3: 00000000014e7000 CR4: 00000000000006e0 > [ 7728.485713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
> [ 7728.485713] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 7728.485713] Process swapper/1 (pid: 0, threadinfo ffff88007d392000, task ffff88007d363380)
> [ 7728.485713] Stack:
> [ 7728.485713] 0000002924276c47 00200001810229fc 0000000100715b34 ffff88007ae12800
> [ 7728.485713] ffff88007bf19700 0000000000000000 0000000000000029 ffffffffa0041286
> [ 7728.485713] ffff880037bb8800 ffffffff8108448c 000010e324804758 0000000000000000
> [ 7728.485713] Call Trace:
> [ 7728.485713] <IRQ>
> [ 7728.485713] [<ffffffffa0041286>] ? vring_interrupt+0x6f/0x76 [virtio_ring]
> [ 7728.485713] [<ffffffff8108448c>] ? handle_irq_event_percpu+0x2d/0x130
> [ 7728.485713] [<ffffffff810845bd>] ? handle_irq_event+0x2e/0x4c
> [ 7728.485713] [<ffffffff8108694f>] ? handle_edge_irq+0x98/0xb9
> [ 7728.485713] [<ffffffff81003aa7>] ? handle_irq+0x17/0x20
> [ 7728.485713] [<ffffffff810032da>] ? do_IRQ+0x45/0xad
> [ 7728.485713] [<ffffffff8137f72a>] ? common_interrupt+0x6a/0x6a
> [ 7728.485713] <EOI>
> [ 7728.485713] [<ffffffff810229fc>] ? native_safe_halt+0x2/0x3
> [ 7728.485713] [<ffffffff8100887e>] ? default_idle+0x23/0x3f
> [ 7728.485713] [<ffffffff81008b02>] ? cpu_idle+0x6b/0xaa
> [ 7728.485713] [<ffffffff81379f33>] ? start_secondary+0x1f5/0x1fa
> [ 7728.485713] Code: 8b b8 b0 03 00 00 e8 55 84 31 e1 48 89 c5 eb 6e 41 8a 45 28 be fb ff ff ff 3c 02 77 0a 0f b6 c0 8b 34 85 70 81 06 a0 49 8b 45 00 <8b> 50 44 83 fa 02 74 07 83 fa 07 75 31 eb 22 41 8b 55 24 89 90
> [ 7728.485713] RIP [<ffffffffa006717b>] blk_done+0x51/0xf1 [virtio_blk]
> [ 7728.485713] RSP <ffff88007fc83e98>
> [ 7728.485713] CR2: 0000000200000045
> [ 7728.485713] ---[ end trace de9d8ade00a76876 ]---
> [ 7728.485713] Kernel panic - not syncing: Fatal exception in interrupt
>
> So looking at the RIP with gdb, it points into the following code that pulls
> a struct virtblk_req *vbr off the virtio_ring with virtqueue_get_buf():
>
> (gdb) list *(blk_done+0x51)
> 0x19f is in blk_done (drivers/block/virtio_blk.c:82).
> 77              default:
> 78                      error = -EIO;
> 79                      break;
> 80              }
> 81
> 82              switch (vbr->req->cmd_type) {
> 83              case REQ_TYPE_BLOCK_PC:
> 84                      vbr->req->resid_len = vbr->in_hdr.residual;
> 85                      vbr->req->sense_len = vbr->in_hdr.sense_len;
> 86                      vbr->req->errors = vbr->in_hdr.errors;
> (gdb)
>
> So it's starting to look pretty clear that the virtio_ring used by
> virtio-blk is somehow getting messed up.. Now enabling DEBUG within
> virtio_ring.ko code to try and get some more details.
>
> virtio_ring folks (Rusty + MST CC'ed), is there any other debug code
> that would be helpful to track this down..?

It looks like either vbr is complete crap, or already freed. Let's make
sure. Assuming this is true:

1) We have a race in the virtio_blk driver, which is corrupting the ring
   (e.g. simultaneous virtqueue_get_buf calls). Locking looks pretty
   trivial here though. DEBUG might help with this.

2) Qemu has a bug and is screwing up the ring, giving us a request twice.

3) The virtio_ring core has a bug. This is least likely, though of
   course not impossible.

Here's a patch to try which should tell us what species of corruption it
is:

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 303779c..3e3081f 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -55,6 +55,7 @@ struct virtio_blk
 
 struct virtblk_req
 {
+	u32 magic;
 	struct list_head list;
 	struct request *req;
 	struct virtio_blk_outhdr out_hdr;
@@ -73,6 +74,11 @@ static void blk_done(struct virtqueue *vq)
 	while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
 		int error;
 
+		if (unlikely(vbr->magic != 0x87654321)) {
+			printk("vbr bad magic: 0x%08x\n", vbr->magic);
+			continue; /* And pray... */
+		}
+
 		switch (vbr->status) {
 		case VIRTIO_BLK_S_OK:
 			error = 0;
@@ -100,6 +106,7 @@ static void blk_done(struct virtqueue *vq)
 
 		__blk_end_request_all(vbr->req, error);
 		list_del(&vbr->list);
+		vbr->magic = 0xfee1dead;
 		mempool_free(vbr, vblk->pool);
 	}
 	/* In case queue is stopped waiting for more buffers. */
@@ -117,6 +124,7 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 	if (!vbr)
 		/* When another request finishes we'll try again. */
 		return false;
+	vbr->magic = 0x11111111;
 
 	vbr->req = req;
 
@@ -179,7 +187,9 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 		}
 	}
 
+	vbr->magic = 0x87654321;
 	if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr, GFP_ATOMIC)<0) {
+		vbr->magic = 0xc0ffee;
 		mempool_free(vbr, vblk->pool);
 		return false;
 	}
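
For reading the resulting "vbr bad magic" lines, here is a rough
user-space sketch of how each value might be interpreted. The magic
constants are the ones from the patch; the enum, the function names and
the interpretations attached to each value are assumptions made for
illustration, not part of the driver. Note also that a freed vbr can be
reallocated from the mempool and have its magic overwritten, so a
"good" value does not by itself prove the ring is healthy.

```c
#include <stdint.h>
#include <stdio.h>

/* Magic values as written by the debug patch. */
#define VBR_MAGIC_ALLOCATED 0x11111111u /* do_req(): vbr allocated, not yet queued */
#define VBR_MAGIC_IN_RING   0x87654321u /* do_req(): about to call virtqueue_add_buf() */
#define VBR_MAGIC_ADD_FAIL  0x00c0ffeeu /* do_req(): virtqueue_add_buf() failed, vbr freed */
#define VBR_MAGIC_COMPLETED 0xfee1deadu /* blk_done(): request completed, vbr freed */

/* Hypothetical classification, not something the kernel defines. */
enum vbr_state {
	VBR_OK,                /* expected value when a buffer comes off the ring */
	VBR_NEVER_QUEUED,      /* ring returned a vbr that was never added */
	VBR_ADD_FAILED,        /* ring returned a vbr whose add had failed */
	VBR_DOUBLE_COMPLETION, /* same vbr came off the ring twice */
	VBR_GARBAGE            /* pointer does not look like a vbr at all */
};

static enum vbr_state classify_magic(uint32_t magic)
{
	switch (magic) {
	case VBR_MAGIC_IN_RING:
		return VBR_OK;
	case VBR_MAGIC_ALLOCATED:
		return VBR_NEVER_QUEUED;
	case VBR_MAGIC_ADD_FAIL:
		return VBR_ADD_FAILED;
	case VBR_MAGIC_COMPLETED:
		return VBR_DOUBLE_COMPLETION;
	default:
		return VBR_GARBAGE;
	}
}

static const char *describe(enum vbr_state s)
{
	switch (s) {
	case VBR_OK:
		return "in flight, as expected";
	case VBR_NEVER_QUEUED:
		return "allocated but never queued";
	case VBR_ADD_FAILED:
		return "add_buf failed, already freed";
	case VBR_DOUBLE_COMPLETION:
		return "already completed and freed (returned twice?)";
	default:
		return "not a vbr (stray or corrupted pointer)";
	}
}

/* Print one diagnosis line for a magic value copied out of dmesg. */
static void report(uint32_t magic)
{
	printf("0x%08x: %s\n", (unsigned int)magic, describe(classify_magic(magic)));
}
```

Calling report() with the value from a "vbr bad magic" line gives a
first guess: 0xfee1dead would be consistent with a request being handed
back twice (cases 1 and 2 above), while an unrecognised value suggests
the pointer itself is bogus.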