Re: [v2] block: Fix a WRITE SAME BUG_ON

Hi, thanks for your reply.

The BUG_ON fires because 'bio_iovec(bio).bv_len' does not match 'sdp->sector_size'.

As the reproducer below shows, if 'golden' is stacked on the disks 'sda' (logical_block_size=512)
and 'sdb' (logical_block_size=4096), calling 'blkdev_issue_write_same' on it hits the BUG_ON
(see the sketch just below) because
	bio->bi_io_vec->bv_len = bdev_logical_block_size('golden'), and
	bdev_logical_block_size('golden') = max(logical_block_size of sda, sdb) = 4096,
which does not match sda's 512-byte sector size.
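
For reference, a minimal sketch of the two pieces involved (abridged from block/blk-lib.c
and drivers/scsi/sd.c around v5.0, not the full functions):

	/* __blkdev_issue_write_same() builds the single payload bvec from the
	 * logical block size of the device it was called on, i.e. the stacked
	 * volume, not the component disk: */
	bio->bi_io_vec->bv_page = page;
	bio->bi_io_vec->bv_offset = 0;
	bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);	/* 4096 for 'golden' */

	/* sd_setup_write_same_cmnd() then checks that length against the sector
	 * size of the disk that actually receives the command: */
	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);	/* 512 on sda */

So when the bio reaches sda, bv_len is 4096 but sdp->sector_size is 512, and the BUG_ON fires.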

There are two possible solutions:
1. Disable WRITE SAME in this situation, which is what this patch does.
2. Teach WRITE SAME to handle this situation:
	reorganize the WRITE SAME bio for disks whose logical_block_size is smaller than that of the logical volume.


For now, I think we should disable WRITE SAME in this situation.


Reproducer:
1. Start qemu with parameter:
    -drive file=/home/qemu/disk/512k.img,if=none,format=raw,cache=none,id=data.1,discard=on \
    -device scsi-hd,drive=data.1,id=vdata.1 \
    -drive file=/home/qemu/disk/4096k.img,if=none,format=raw,cache=none,id=data.2,discard=on \
    -device scsi-hd,drive=data.2,id=vdata.2,logical_block_size=4096,physical_block_size=4096 \

2. Create LV:
    vgremove golden -y
    vgcreate golden /dev/sda /dev/sdb
    lvcreate -L1G -n testvol -i 2 golden -y

3. Call 'blkdev_issue_write_same' on the device:
	#include <linux/module.h>
	#include <linux/kernel.h>
	#include <linux/init.h>
	#include <linux/fs.h>
	#include <linux/slab.h>
	#include <linux/blkdev.h>

	static int __init hello_module_init(void)
	{
		struct block_device *dev;
		int err;

		printk(KERN_ERR "/dev/golden/testvol\n");

		dev = lookup_bdev("/dev/golden/testvol");
		if (IS_ERR(dev)) {
			pr_warn("lookup blk error!\n");
			return 0;
		}

		err = blkdev_get(dev, FMODE_WRITE | FMODE_READ | FMODE_EXCL, (void *)&err);

		if (err) {
			pr_warn("get blk error!\n");
			return 0;
		}

		/* blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, page):
		 * replicate the first logical block of the zero page across
		 * 0xf8 sectors starting at sector 8 of the device. */
		err = blkdev_issue_write_same(dev, 8, 0xf8, GFP_NOIO, ZERO_PAGE(0));

		printk(KERN_ERR "blkdev_issue_write_same return %d\n", err);

		return 0;
	}

	static void __exit hello_module_exit(void)
	{
		pr_warn("\n");
	}

	module_init(hello_module_init);
	module_exit(hello_module_exit);

	MODULE_DESCRIPTION("hello module");
	MODULE_LICENSE("GPL");

On 1/26/2019 7:17 PM, John Dorminy wrote:
Hi. I have read a bit of DM code and spent an hour reviewing this... I
didn't get to the point of knowing what the right fix for the problem
is, and I may have a wrong understanding, but I have two thoughts
about the patch:

I don't think this is the right solution for two reasons:

In the first place, if it's an LVM-only issue, we should fix it only
for device-mapper devices. If this is the right way to fix it,
possibly the way to do that would be to change the DM calls to
blk_queue_max_write_same_sectors() to set the max sectors to a
non-zero value only if the logical block sizes match.
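
(To make that concrete, a rough sketch of what such a check might look like in
drivers/md/dm-table.c. The callback name, the wiring, and the exact hook point are
assumptions, not existing DM APIs; it only follows the iterate_devices pattern that
dm_table_supports_write_same() already uses:)

	/* Hypothetical iterate_devices callback: returns non-zero when a
	 * component device's logical block size differs from the value the
	 * stacked table ended up with. */
	static int device_lbs_mismatch(struct dm_target *ti, struct dm_dev *dev,
				       sector_t start, sector_t len, void *data)
	{
		unsigned int *stacked_lbs = data;

		return queue_logical_block_size(bdev_get_queue(dev->bdev)) !=
		       *stacked_lbs;
	}

	/* Sketch of the wiring, somewhere DM finalizes the queue limits,
	 * walking the targets the way dm_table_supports_write_same() does: */
	unsigned int stacked_lbs = limits->logical_block_size;
	unsigned int i;

	for (i = 0; i < dm_table_get_num_targets(t); i++) {
		struct dm_target *ti = dm_table_get_target(t, i);

		if (!ti->type->iterate_devices ||
		    ti->type->iterate_devices(ti, device_lbs_mismatch, &stacked_lbs))
			q->limits.max_write_same_sectors = 0;
	}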

In the second place, I don't think this is the problem. The line of
code that's calling BUG_ON is, I think,
BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);

This is because write_same bios are supposed to have a single sector
of data at the beginning of a page in their bio_iovec(bio) [I think,
based on a commit message I've read] -- in other words, bio_iovec(bio)
is supposed to say something like { .page = something, .offset = 0,
.len = 1 native sector }. But clearly, since the BUG_ON is being
called, one of these is not true. Have you added a log statement right
before the BUG_ON() to print out bio_offset(bio) and
bio_iovec(bio).bv_len to see which value defies expectations?
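
(For example, a sketch of such a debug line just above the BUG_ON in
sd_setup_write_same_cmnd(), assuming the locals that function already has
(sdkp, sdp, bio):)

	/* Sketch: dump the values the BUG_ON checks so the offending one
	 * shows up in dmesg before the machine dies. */
	sd_printk(KERN_ERR, sdkp,
		  "write_same: bio_offset=%u bv_len=%u sector_size=%u\n",
		  bio_offset(bio), bio_iovec(bio).bv_len, sdp->sector_size);
	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);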

I would be happy to help you trace through this or attempt to
reproduce it myself -- what stack of devices can you reproduce this
on? Is this a dm-linear device on top of a disk? Does it have a
filesystem on top, and if so, what filesystem?

Thank you!

John Dorminy


On Fri, Jan 25, 2019 at 9:53 AM Zhang Xiaoxu <zhangxiaoxu5@xxxxxxxxxx> wrote:

If the LVM volume is stacked on disks with different logical_block_size
values, a WRITE SAME issued to it triggers a BUG_ON:

kernel BUG at drivers/scsi/sd.c:968!
invalid opcode: 0000 [#1] SMP PTI
CPU: 11 PID: 525 Comm: kworker/11:1H Tainted: G           O      5.0.0-rc3+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:sd_init_command+0x7aa/0xdb0
Code: 30 75 00 00 c7 85 44 01 00 00 18 00 00 00 0f 85 fa 04 00 00 48 83 c4 40 48
       89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 b6 ca fe ff <0f> 0b 41 bc 09
RSP: 0018:ffffb55f80ddbca0 EFLAGS: 00010206
RAX: 0000000000001000 RBX: ffff9ed23fb927a0 RCX: 000000000000f000
RDX: ffff9ed23f0a8400 RSI: ffff9ed27bc79800 RDI: 0000000000000000
RBP: ffff9ed23fb92680 R08: ffff9ed27c8c0000 R09: ffff9ed23fb927d8
R10: 0000000000000000 R11: fefefefefefefeff R12: ffff9ed27bc79800
R13: ffff9ed2787a0000 R14: ffff9ed27bdf3400 R15: ffff9ed23fb927a0
FS:  0000000000000000(0000) GS:ffff9ed27c8c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b14cf9341 CR3: 0000000069058000 CR4: 00000000000006e0
Call Trace:
  ? vp_notify+0x12/0x20
  scsi_queue_rq+0x525/0xa30
  blk_mq_dispatch_rq_list+0x8d/0x580
  ? syscall_return_via_sysret+0x10/0x7f
  ? elv_rb_del+0x1f/0x30
  ? deadline_remove_request+0x55/0xc0
  blk_mq_do_dispatch_sched+0x76/0x110
  blk_mq_sched_dispatch_requests+0xf9/0x170
  __blk_mq_run_hw_queue+0x51/0xd0
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
Modules linked in: alloc(O+)
---[ end trace dc92ddeb2e6d1fe5 ]---

The logical_block_size of the LVM volume is the maximum across its
component disks, so it may differ from that of an individual disk.
When a WRITE SAME reaches such a disk, the BUG_ON fires while the
WRITE SAME command is being set up.

Disable the WRITE SAME feature on the LVM volume if it is stacked on
disks with different logical_block_size values.

Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@xxxxxxxxxx>
---
  block/blk-settings.c | 10 ++++++++--
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 3e7038e475ee..e4664280edb4 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -497,8 +497,6 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
         t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
         t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
         t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);
-       t->max_write_same_sectors = min(t->max_write_same_sectors,
-                                       b->max_write_same_sectors);
         t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
                                         b->max_write_zeroes_sectors);
         t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
@@ -537,6 +535,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
                 }
         }

+       /* If the logical block size is different, forbid write same */
+       if (t->logical_block_size != b->logical_block_size &&
+           t->max_write_same_sectors != UINT_MAX)
+               t->max_write_same_sectors = 0;
+       else
+               t->max_write_same_sectors = min(t->max_write_same_sectors,
+                                               b->max_write_same_sectors);
+
         t->logical_block_size = max(t->logical_block_size,
                                     b->logical_block_size);

--
2.14.4



--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel



