On Thu, 2012-11-15 at 23:59 -0800, Nicholas A. Bellinger wrote:
> On Thu, 2012-11-15 at 15:50 -0800, Nicholas A. Bellinger wrote:
> > On Thu, 2012-11-15 at 21:26 +0000, Prantis, Kelsey wrote:
> > > On 11/13/12 2:22 PM, "Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> wrote:
> > >
> > > > <SNIP>
> > >
> > > Hi Nicholas,
> > >
> > > Sorry for the delay. The new debug output with your latest patch (and typo
> > > adjustment) is up at ftp://ftp.whamcloud.com/uploads/lio-debug-4.txt.bz2
> >
> > Hi Kelsey,
> >
> > Thanks a lot for this new dump. It appears you were able to trigger one
> > of the extra BUG_ON(!cmd) checks added in iblock_bio_destructor():
> >
> > <SNIP>
> >
> > So it appears that iblock_bio_destructor()'s bio->bi_private assignment
> > for the saved se_cmd pointer, and likely the leading se_cmd reference
> > from the same ->bi_private in iblock_bio_done(), is being cleared and/or
> > stomped upon by something ahead of the callback into IBLOCK code..
> >
> > Still not exactly sure what's going on here, but I'm starting to lean
> > toward some type of virtio-blk w/ IBLOCK specific bug that overflows
> > something in virtio with large requests. In any event, I'll spend some
> > time with virtio-blk + IBLOCK exports next, and try to reproduce on
> > v3.7-rc code.
> >
>
> So after spending some time with virtio-blk <-> IBLOCK export in a KVM
> guest this evening, I'm now able to reproduce the same OOPS you've been
> observing with iscsi-target LUNs, using tcm_loop LUNs w/ explicit
> max_sectors_kb=8192 settings in order to generate the largish (16384
> sector * 512 byte block = 8M) requests with a virtio-blk backend export.
>
> Thus far the bug is occurring during fio write-verify tests w/
> blocksize=8M with the tcm_loop SCSI LUN (/dev/sdX). For reference,
> below is the kernel log.
>
> So I'll keep poking at this tomorrow, and will (hopefully) get this bug
> identified ASAP.
>

One more update on this bug..

So after increasing max_sectors_kb for the virtio-blk w/ IBLOCK in the KVM
guest from the default 512 to 2048 with:

  echo 2048 > /sys/block/vda/queue/max_sectors_kb

as well as bumping the hw/virtio-blk.c seg_max default in the qemu code from
126 to 256, in order to make the virtio-blk guest vdX struct block_device
automatically register with max_segments=256 by default:

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 6f6d172..c929b6b 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -485,7 +485,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
     bdrv_get_geometry(s->bs, &capacity);
     memset(&blkcfg, 0, sizeof(blkcfg));
     stq_raw(&blkcfg.capacity, capacity);
-    stl_raw(&blkcfg.seg_max, 128 - 2);
+    stl_raw(&blkcfg.seg_max, 258 - 2);
     stw_raw(&blkcfg.cylinders, s->conf->cyls);
     stl_raw(&blkcfg.blk_size, blk_size);
     stw_raw(&blkcfg.min_io_size, s->conf->min_io_size / blk_size);

These two changes seem to provide a working v3.6-rc6 guest virtio-blk +
IBLOCK setup that is (so far) passing fio write-verify against a local
tcm_loop LUN..

After changing max_sectors_kb 512 -> 2048 for virtio-blk, the avgrq-sz
ratio between the virtio-blk + tcm_loop block devices is now 4K vs. 16K
(vs. 1K to 16K), which likely means a stack overflow somewhere in
virtio-blk -> virtio code while processing a large (8 MB) struct request
generated by a SCSI initiator port.

Not sure just yet if the qemu virtio-blk max_segments=126 -> 256 change is
necessary for the work-around, but it might be worthwhile if you have a
qemu build environment set up. Will try a bit more with max_segments=126
later today.
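For anyone else trying this, the guest-side limits can be checked with the
standard block queue sysfs attributes. A rough sketch, assuming the virtio
disk shows up as /dev/vda as above:

  # check the current per-request size limit (KB) and segment limit
  # for the virtio disk, before and after the change
  cat /sys/block/vda/queue/max_sectors_kb
  cat /sys/block/vda/queue/max_segments

  # the setting is not persistent, so re-apply it after a reboot
  echo 2048 > /sys/block/vda/queue/max_sectors_kb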
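The fio write-verify case should be reproducible with an invocation along
these lines. Just a sketch, not the exact job I ran: /dev/sdX stands in for
the tcm_loop LUN, and everything other than bs=8M and the verify options is
an arbitrary choice:

  # WARNING: writes directly to the block device and destroys its contents
  fio --name=write-verify --filename=/dev/sdX --ioengine=libaio --direct=1 \
      --rw=write --bs=8M --size=1G --iodepth=4 \
      --verify=crc32c --do_verify=1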
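avgrq-sz above is the average request size (in 512-byte sectors) from the
extended per-device stats, e.g. with sysstat's iostat (sdX again being a
placeholder for the actual tcm_loop device):

  # watch avgrq-sz on the guest virtio disk and the tcm_loop LUN side by side
  iostat -x vda sdX 1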
Thanks again,

--nab