On Tue, Nov 11, 2014 at 11:42 PM, Dongsu Park <dongsu.park@xxxxxxxxxxxxxxxx> wrote: > Hi Ming, > > On 11.11.2014 08:56, Ming Lei wrote: >> On Tue, Nov 11, 2014 at 7:31 AM, Jens Axboe <axboe@xxxxxxxxx> wrote: >> > Known, I'm afraid, Ming is looking into it. > > Actually I had also tried to reproduce this bug, without success. > But today I happened to know how to trigger the bug, by coincidence, > during testing other things. > > Try to run xfstests/generic/034. You'll see the crash immediately. > Tested on a QEMU VM with kernel 3.18-rc4, virtio-blk, dm-flakey and xfs. > >> There is one obvious bug which should have been fixed by below >> patch(0001-block-blk-merge-fix-blk_recount_segments.patch"): >> >> http://marc.info/?l=linux-virtualization&m=141562191719405&q=p3 > > This patch didn't bring anything to me, as Lukas said. > >> And there might be another one, I appreciate someone can post >> log which is printed by patch("blk-seg.patch") in below link if the >> bug still can be triggered even with above fix: >> >> http://marc.info/?l=linux-virtualization&m=141473040618467&q=p3 > > "blk_recount_segments: 1-1-1 vcnt-128 segs-128" > > As long as I understood so far, the reason is that bi_phys_segments gets > sometimes bigger than queue_max_sectors() after blk_recount_segments(). > That happens no matter whether segments are recalculated or not. I know the problem now, since bi_vcnt can't be used for cloned bio, and the patch I sent last time is wrong too. > I'm not completely sure about what to do, but how about the attached patch? > It seems to work, according to several xfstests runs. > > Cheers, > Dongsu > > ---- > > From 1db98323931eb9ab430116c4d909d22222c16e22 Mon Sep 17 00:00:00 2001 > From: Dongsu Park <dongsu.park@xxxxxxxxxxxxxxxx> > Date: Tue, 11 Nov 2014 13:10:59 +0100 > Subject: [RFC PATCH] blk-merge: make bi_phys_segments consider also > queue_max_segments() > > When recounting the number of physical segments, the number of max > segments of request_queue must be also taken into account. > Otherwise bio->bi_phys_segments could get bigger than > queue_max_segments(). Then this results in virtio_queue_rq() seeing > req->nr_phys_segments that is greater than expected. Although the > initial queue_max_segments was set to (vblk->sg_elems - 2), a request > comes in with a larger value of nr_phys_segments, which triggers the > BUG_ON() condition. > > This commit should fix a kernel crash in virtio_blk, which occurs > especially frequently when it runs with blk-mq, device mapper, and xfs. > The simplest way to reproduce this bug is to run xfstests/generic/034. > Note, test 034 requires dm-flakey to be turned on in the kernel config. > > See the kernel trace below: > ------------[ cut here ]------------ > kernel BUG at drivers/block/virtio_blk.c:172! > invalid opcode: 0000 [#1] SMP > CPU: 1 PID: 3343 Comm: mount Not tainted 3.18.0-rc4+ #55 > RIP: 0010:[<ffffffff81561027>] > [<ffffffff81561027>] virtio_queue_rq+0x277/0x280 > Call Trace: > [<ffffffff8142e908>] __blk_mq_run_hw_queue+0x1a8/0x300 > [<ffffffff8142f00d>] blk_mq_run_hw_queue+0x6d/0x90 > [<ffffffff8143003e>] blk_sq_make_request+0x23e/0x360 > [<ffffffff81422e20>] generic_make_request+0xc0/0x110 > [<ffffffff81422ed9>] submit_bio+0x69/0x130 > [<ffffffff812f013d>] _xfs_buf_ioapply+0x2bd/0x410 > [<ffffffff81315f38>] ? xlog_bread_noalign+0xa8/0xe0 > [<ffffffff812f1bd1>] xfs_buf_submit_wait+0x61/0x1d0 > [<ffffffff81315f38>] xlog_bread_noalign+0xa8/0xe0 > [<ffffffff81316917>] xlog_bread+0x27/0x60 > [<ffffffff8131ad11>] xlog_find_verify_cycle+0xe1/0x190 > [<ffffffff8131b291>] xlog_find_head+0x2d1/0x3c0 > [<ffffffff8131b3ad>] xlog_find_tail+0x2d/0x3f0 > [<ffffffff8131b78e>] xlog_recover+0x1e/0xf0 > [<ffffffff8130fbac>] xfs_log_mount+0x24c/0x2c0 > [<ffffffff813075db>] xfs_mountfs+0x44b/0x7a0 > [<ffffffff8130a98a>] xfs_fs_fill_super+0x2ba/0x330 > [<ffffffff811cea64>] mount_bdev+0x194/0x1d0 > [<ffffffff8130a6d0>] ? xfs_parseargs+0xbe0/0xbe0 > [<ffffffff813089a5>] xfs_fs_mount+0x15/0x20 > [<ffffffff811cf389>] mount_fs+0x39/0x1b0 > [<ffffffff8117bf75>] ? __alloc_percpu+0x15/0x20 > [<ffffffff811e9887>] vfs_kern_mount+0x67/0x110 > [<ffffffff811ec584>] do_mount+0x204/0xad0 > [<ffffffff811ed18b>] SyS_mount+0x8b/0xe0 > [<ffffffff81788e12>] system_call_fastpath+0x12/0x17 > RIP [<ffffffff81561027>] virtio_queue_rq+0x277/0x280 > ---[ end trace ae3ec6426f011b5d ]--- > > Signed-off-by: Dongsu Park <dongsu.park@xxxxxxxxxxxxxxxx> > Tested-by: Dongsu Park <dongsu.park@xxxxxxxxxxxxxxxx> > Cc: Ming Lei <tom.leiming@xxxxxxxxx> > Cc: Jens Axboe <axboe@xxxxxxxxx> > Cc: Rusty Russell <rusty@xxxxxxxxxxxxxxx> > Cc: Jeff Layton <jlayton@xxxxxxxxxxxxxxx> > Cc: Dave Chinner <david@xxxxxxxxxxxxx> > Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx> > Cc: Lukas Czerner <lczerner@xxxxxxxxxx> > Cc: Christoph Hellwig <hch@xxxxxx> > Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx > --- > block/blk-merge.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index b3ac40a..d808601 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -103,13 +103,16 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio) > > if (no_sg_merge && !bio_flagged(bio, BIO_CLONED) && > merge_not_need) > - bio->bi_phys_segments = bio->bi_vcnt; > + bio->bi_phys_segments = min_t(unsigned int, bio->bi_vcnt, > + queue_max_segments(q)); > else { > struct bio *nxt = bio->bi_next; > > bio->bi_next = NULL; > - bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, > - no_sg_merge && merge_not_need); > + bio->bi_phys_segments = min_t(unsigned int, > + __blk_recalc_rq_segments(q, bio, no_sg_merge > + && merge_not_need), > + queue_max_segments(q)); > bio->bi_next = nxt; > } The above change may cause some data not written to/read from device, and we have to merge if segments number will become bigger than the limit. The attached patch should fix the problem, and hope it is the last one, :-) Thanks, -- Ming Lei
From 9cedeb8cfd420ecfcd3e2b2e0bcb699d35ae2a03 Mon Sep 17 00:00:00 2001 From: Ming Lei <tom.leiming@xxxxxxxxx> Date: Wed, 12 Nov 2014 00:15:41 +0800 Subject: [PATCH] block: blk-merge: fix blk_recount_segments() For cloned bio, bio->bi_vcnt can't be used at all, and we have resort to bio_segments() to figure out how many segment there are in the bio. Signed-off-by: Ming Lei <tom.leiming@xxxxxxxxx> --- block/blk-merge.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/block/blk-merge.c b/block/blk-merge.c index b3ac40a..89b97b5 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -97,19 +97,22 @@ void blk_recalc_rq_segments(struct request *rq) void blk_recount_segments(struct request_queue *q, struct bio *bio) { - bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE, - &q->queue_flags); - bool merge_not_need = bio->bi_vcnt < queue_max_segments(q); + unsigned short seg_cnt; + + /* estimate segment number by bi_vcnt for non-cloned bio */ + if (bio_flagged(bio, BIO_CLONED)) + seg_cnt = bio_segments(bio); + else + seg_cnt = bio->bi_vcnt; - if (no_sg_merge && !bio_flagged(bio, BIO_CLONED) && - merge_not_need) - bio->bi_phys_segments = bio->bi_vcnt; + if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) && + (seg_cnt < queue_max_segments(q))) + bio->bi_phys_segments = seg_cnt; else { struct bio *nxt = bio->bi_next; bio->bi_next = NULL; - bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, - no_sg_merge && merge_not_need); + bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false); bio->bi_next = nxt; } -- 1.7.9.5
_______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs