On 2020/8/4 12:15, Ming Lei wrote: > In case of block device backend, if the backend supports discard, the > loop device will set queue flag of QUEUE_FLAG_DISCARD. > > However, limits.discard_granularity isn't setup, and this way is wrong, > see the following description in Documentation/ABI/testing/sysfs-block: > > A discard_granularity of 0 means that the device does not support > discard functionality. > > Especially 9b15d109a6b2 ("block: improve discard bio alignment in > __blkdev_issue_discard()") starts to take q->limits.discard_granularity > for computing max discard sectors. And zero discard granularity causes > kernel oops[1]. > > Fix the issue by set up discard granularity and alignment. > > [1] kernel oops when use block disk as loop backend: > > [ 33.405947] ------------[ cut here ]------------ > [ 33.405948] kernel BUG at block/blk-mq.c:563! > [ 33.407504] invalid opcode: 0000 [#1] SMP PTI > [ 33.408245] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.8.0-rc2_update_rq+ #17 > [ 33.409466] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx2.fedor4 > [ 33.411546] RIP: 0010:blk_mq_end_request+0x1c/0x2a > [ 33.412354] Code: 48 89 df 5b 5d 41 5c 41 5d e9 2d fe ff ff 0f 1f 44 00 00 55 48 89 fd 53 40 0f b6 de 8b 57 5 > [ 33.415724] RSP: 0018:ffffc900019ccf48 EFLAGS: 00010202 > [ 33.416688] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff88826f2600c0 > [ 33.417990] RDX: 0000000000200042 RSI: 0000000000000001 RDI: ffff888270c13100 > [ 33.419286] RBP: ffff888270c13100 R08: 0000000000000001 R09: ffffffff81345144 > [ 33.420584] R10: ffff88826f260600 R11: 0000000000000000 R12: ffff888270c13100 > [ 33.421881] R13: ffffffff820050c0 R14: 0000000000000004 R15: 0000000000000004 > [ 33.423053] FS: 0000000000000000(0000) GS:ffff888277d80000(0000) knlGS:0000000000000000 > [ 33.424360] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 33.425416] CR2: 00005581d7903f08 CR3: 00000001e9a16002 CR4: 0000000000760ee0 > [ 33.426580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 33.427706] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 33.428910] PKRU: 55555554 > [ 33.429412] Call Trace: > [ 33.429858] <IRQ> > [ 33.430189] blk_done_softirq+0x84/0xa4 > [ 33.430884] __do_softirq+0x11a/0x278 > [ 33.431567] asm_call_on_stack+0x12/0x20 > [ 33.432283] </IRQ> > [ 33.432673] do_softirq_own_stack+0x31/0x40 > [ 33.433448] __irq_exit_rcu+0x46/0xae > [ 33.434132] sysvec_call_function_single+0x7d/0x8b > [ 33.435021] asm_sysvec_call_function_single+0x12/0x20 > [ 33.435965] RIP: 0010:native_safe_halt+0x7/0x8 > [ 33.436795] Code: 25 c0 7b 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b 00 a8 08 74 0b 65 81 25 db 38 95 7e b > [ 33.440179] RSP: 0018:ffffc9000191fee0 EFLAGS: 00000246 > [ 33.441143] RAX: ffffffff816c4118 RBX: 0000000000000000 RCX: ffff888276631a00 > [ 33.442442] RDX: 000000000000ddfe RSI: 0000000000000003 RDI: 0000000000000001 > [ 33.443742] RBP: 0000000000000000 R08: 00000000f461e6ac R09: 0000000000000004 > [ 33.445046] R10: 8080808080808080 R11: fefefefefefefeff R12: 0000000000000000 > [ 33.446352] R13: 0000000000000003 R14: ffffffffffffffff R15: 0000000000000000 > [ 33.447450] ? ldsem_down_write+0x1f5/0x1f5 > [ 33.448158] default_idle+0x1b/0x2c > [ 33.448809] do_idle+0xf9/0x248 > [ 33.449392] cpu_startup_entry+0x1d/0x1f > [ 33.450118] start_secondary+0x168/0x185 > [ 33.450843] secondary_startup_64+0xa4/0xb0 > [ 33.451612] Modules linked in: scsi_debug iTCO_wdt iTCO_vendor_support nvme nvme_core i2c_i801 lpc_ich virtis > [ 33.454292] Dumping ftrace buffer: > [ 33.454951] (ftrace buffer empty) > [ 33.455626] ---[ end trace beece202d663d38f ]--- > > Fixes: 9b15d109a6b2 ("block: improve discard bio alignment in __blkdev_issue_discard()") > Cc: Coly Li <colyli@xxxxxxx> > Cc: Hannes Reinecke <hare@xxxxxxxx> > Cc: Xiao Ni <xni@xxxxxxxxxx> > Cc: Martin K. Petersen <martin.petersen@xxxxxxxxxx> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- > drivers/block/loop.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index d18160146226..6102370bee35 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -890,6 +890,11 @@ static void loop_config_discard(struct loop_device *lo) > struct request_queue *backingq; > > backingq = bdev_get_queue(inode->i_bdev); > + > + q->limits.discard_granularity = > + queue_physical_block_size(backingq); > + q->limits.discard_alignment = 0; > + > blk_queue_max_discard_sectors(q, > backingq->limits.max_write_zeroes_sectors); > > Hi Ming, I did similar change, it can avoid the panic or 0 length discard bio. But yesterday I realize the discard request to loop device should not go into __blkdev_issue_discard(). As commit c52abf563049 ("loop: Better discard support for block devices") mentioned it should go into blkdev_issue_zeroout(), this is why in loop_config_discard() the max_discard_sectors is set to backingq->limits.max_write_zeroes_sectors. Now I am looking at the problem why discard request on loop device doesn't go into blkdev_issue_zeroout(). With the above change, the discard is very slow on loop device with backing device. In my testing, mkfs.xfs on /dev/loop0 does not complete in 20 minutes, each discard request is only 4097 sectors. Coly Li