On 2020/8/4 16:14, Ming Lei wrote:
> On Tue, Aug 04, 2020 at 03:50:32PM +0800, Coly Li wrote:
>> On 2020/8/4 14:01, Ming Lei wrote:
>>> On Tue, Aug 04, 2020 at 12:29:07PM +0800, Coly Li wrote:
>>>> On 2020/8/4 12:15, Ming Lei wrote:
>>>>> In case of a block device backend, if the backend supports discard,
>>>>> the loop device will set the queue flag QUEUE_FLAG_DISCARD.
>>>>>
>>>>> However, limits.discard_granularity isn't set up, which is wrong; see
>>>>> the following description in Documentation/ABI/testing/sysfs-block:
>>>>>
>>>>> A discard_granularity of 0 means that the device does not support
>>>>> discard functionality.
>>>>>
>>>>> In particular, 9b15d109a6b2 ("block: improve discard bio alignment in
>>>>> __blkdev_issue_discard()") started to use q->limits.discard_granularity
>>>>> for computing max discard sectors, and a zero discard granularity
>>>>> causes a kernel oops[1].
>>>>>
>>>>> Fix the issue by setting up discard granularity and alignment.
>>>>> [snipped]
>>>>
>>>> Hi Ming,
>>>>
>>>> I did a similar change; it can avoid the panic and the 0-length
>>>> discard bio. But yesterday I realized the discard request to the loop
>>>> device should not go into __blkdev_issue_discard(). As commit
>>>> c52abf563049 ("loop: Better discard support for block devices")
>>>> mentioned, it should go into blkdev_issue_zeroout(); this is why
>>>> loop_config_discard() sets max_discard_sectors to
>>>> backingq->limits.max_write_zeroes_sectors.
>>>
>>> That commit meant REQ_OP_DISCARD on a loop device is translated into
>>> blkdev_issue_zeroout(), because REQ_OP_DISCARD is handled as
>>> file->f_op->fallocate(FALLOC_FL_PUNCH_HOLE), which will cause
>>> "subsequent reads from this range will return zeroes".
>>>
>>>> Now I am looking at the problem of why a discard request on the loop
>>>> device doesn't go into blkdev_issue_zeroout().
>>>
>>> No, that is correct behavior, since loop can support discard or zeroout.
>>>
>>> If QUEUE_FLAG_DISCARD is set, neither discard_granularity nor max
>>> discard sectors should be zero.
>>>
>>> This patch shouldn't set discard_granularity if
>>> limits.max_write_zeroes_sectors is zero; I will fix it in V2.
>>>
>>>> With the above change, discard is very slow on a loop device with a
>>>> block device backend. In my testing, mkfs.xfs on /dev/loop0 did not
>>>> complete in 20 minutes, and each discard request was only 4097 sectors.
>>>
>>> I'd suggest you check the discard related queue limits, and see why
>>> each discard request sends only 4097 sectors.
>>>
>>> Or do we need to mirror the underlying queue's discard_granularity too?
>>> Can you try the following patch?
>>
>> Can I know the exact command line to reproduce the panic?
>
> modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
>
> losetup -f /dev/sdc --direct-io=on  # suppose /dev/sdc is the scsi_debug LUN
>
> mkfs.ext4 /dev/loop0  # suppose loop0 is the loop device backed by /dev/sdc
>
> mount /dev/loop0 /mnt
>
> cd /mnt
> dbench -t 20 -s 64

Thanks for the information. With the above command lines, I also find a
special case where a 0-byte request triggers a similar BUG() panic ---
when the discard LBA is 0 and the loop device driver's
queue->limits.discard_granularity is 0.

>> I tried blkdiscard, and dbench -D /mount_point_to_loop0, and saw all
>> the discard requests go into blkdev_fallocate()=>blkdev_issue_zeroout().
>> No request went into __blkdev_issue_discard().
>
> The issue isn't related to the backend queue; it is triggered on the
> loop device's own request queue.

Yes, it is clear to me now.
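For my own understanding, I expect the V2 change to loop_config_discard()
to look roughly like the untested sketch below. I dropped the
lo_encrypt_key_size handling for brevity, and the fallback to the backing
queue's physical block size is my assumption, not something you stated:

	struct file *file = lo->lo_backing_file;
	struct inode *inode = file->f_mapping->host;
	struct request_queue *q = lo->lo_queue;
	u32 granularity = 0, max_discard_sectors = 0;

	if (S_ISBLK(inode->i_mode)) {
		struct request_queue *backingq = bdev_get_queue(inode->i_bdev);

		/*
		 * Discard on a block device backed loop is translated to
		 * zeroout, so advertise the backing queue's write_zeroes
		 * limit, and mirror its discard_granularity, falling back
		 * to its physical block size when it reports 0.
		 */
		max_discard_sectors = backingq->limits.max_write_zeroes_sectors;
		granularity = backingq->limits.discard_granularity ?:
			queue_physical_block_size(backingq);
	} else if (file->f_op->fallocate) {
		/* file backed loop: punch hole, as before */
		max_discard_sectors = UINT_MAX >> 9;
		granularity = inode->i_sb->s_blocksize;
	}

	if (max_discard_sectors) {
		q->limits.discard_granularity = granularity;
		blk_queue_max_discard_sectors(q, max_discard_sectors);
		blk_queue_max_write_zeroes_sectors(q, max_discard_sectors);
		blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
	} else {
		q->limits.discard_granularity = 0;
		blk_queue_max_discard_sectors(q, 0);
		blk_queue_max_write_zeroes_sectors(q, 0);
		blk_queue_flag_clear(QUEUE_FLAG_DISCARD, q);
	}
	q->limits.discard_alignment = 0;

This way QUEUE_FLAG_DISCARD is only set together with a non-zero
discard_granularity, which is what the sysfs-block ABI document requires.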
Besides the loop device driver, __blkdev_issue_discard() should also be
fixed to tolerate a buggy queue->limits.discard_granularity of 0, in case
some other driver makes a similar mistake, or an unlucky downstream kernel
maintainer misses your loop device driver fix. I have a defensive check in
mind; see the sketch below.
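Something like this (completely untested; the WARN_ON_ONCE and the
-EOPNOTSUPP return value are just my first idea of the right policy):

	/*
	 * In __blkdev_issue_discard(), right after the existing
	 * blk_queue_discard(q) check: a driver that sets
	 * QUEUE_FLAG_DISCARD must also set a non-zero
	 * discard_granularity, so catch violations here instead of
	 * crashing later in the granularity based alignment math.
	 */
	if (WARN_ON_ONCE(q->limits.discard_granularity == 0))
		return -EOPNOTSUPP;

Yes, please post the v2 patch; now I can help to review it. Thanks for
your hint :-)

Coly Li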