Re: [Bug Report] Discard bios cannot be correctly merged in blk-mq

Wang Shanker <shankerwangmiao@xxxxxxxxx> · Mon, 21 Jun 2021 15:49:33 +0800

Hi, Xiao

Many thanks for your reply. I realized that this problem is not limited
to discard requests. Normal read/write requests are also first split
into 4k-sized ones and then merged into larger ones. The merging of bios
is limited by queue_max_segments, which leads to small chunks of I/O
being issued to the physical devices. Such behavior does not seem
optimal and should be improved.

I'm not so familiar with raid456. Could you have a look at its code when you
are free? It seems that improving this may require big changes.
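
To make the alignment rule from the quoted discussion below concrete, here
is a rough user-space sketch of how I understand it. It is not the kernel
code: align_discard_to_stripes() and struct discard_range are names I made
up for illustration, and I assume sectors are 512-byte units and that only
whole stripes (chunk_sectors * nr_data_disks) may be discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type, not from the kernel: a half-open sector range. */
struct discard_range {
    uint64_t start;  /* first sector that can be discarded */
    uint64_t end;    /* one past the last sector that can be discarded */
};

/*
 * Shrink [start, start + nr_sectors) to whole stripes: round the start up
 * and the end down to a multiple of chunk_sectors * nr_data_disks.
 * Returns false when no full stripe fits inside the request.
 */
bool align_discard_to_stripes(uint64_t start, uint64_t nr_sectors,
                              uint32_t chunk_sectors, uint32_t nr_data_disks,
                              struct discard_range *out)
{
    uint64_t stripe_sectors = (uint64_t)chunk_sectors * nr_data_disks;
    assert(stripe_sectors > 0);

    uint64_t end = start + nr_sectors;
    /* Round the start up to the next stripe boundary. */
    uint64_t first = (start + stripe_sectors - 1) / stripe_sectors * stripe_sectors;
    /* Round the end down to the previous stripe boundary. */
    uint64_t last = end / stripe_sectors * stripe_sectors;

    if (first >= last)
        return false;

    out->start = first;
    out->end = last;
    return true;
}
```

With the 3-disk example below (2 data disks, 512KB chunks, so 1024 sectors
per chunk and 2048 sectors per stripe), a request for sectors 100..5099
would shrink to 2048..4095, and the unaligned head and tail would have to
be handled separately.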

Cheers,

Miao Wang

> On 18 Jun 2021, at 20:49, Xiao Ni <xni@xxxxxxxxxx> wrote:
> 
> Hi Miao
> 
> So you plan to fix this problem now? The plan is to submit the discard
> bio directly to the disks, similar to raid0/raid10.
> As we discussed, it needs to consider the discard region. It should be
> larger than chunk_sectors * nr_data_disks. It needs
> to split the bio when its size is not aligned with chunk_sectors *
> nr_data_disks. And it needs to consider the start address
> of the bio too. If it's not aligned with a start address of
> chunk_sectors, it's better to split this part too.
> 
> I'm working on another job. So I don't have time to do this now. If
> you submit the patches, I can help to review :)
> 
> Regards
> Xiao
> 
> On Fri, Jun 18, 2021 at 2:28 PM Wang Shanker <shankerwangmiao@xxxxxxxxx> wrote:
>> 
>> Hi, Xiao
>> 
>> Any ideas on this issue?
>> 
>> Cheers,
>> 
>> Miao Wang
>> 
>>> On 9 Jun 2021, at 17:03, Wang Shanker <shankerwangmiao@xxxxxxxxx> wrote:
>>> 
>>>> 
>>>> On 9 Jun 2021, at 16:44, Xiao Ni <xni@xxxxxxxxxx> wrote:
>>>> 
>>>> Hi all
>>>> 
>>>> Thanks for reporting this. I did a test in my environment.
>>>> time blkdiscard /dev/nvme5n1  (477GB)
>>>> real    0m0.398s
>>>> time blkdiscard /dev/md0
>>>> real    9m16.569s
>>>> 
>>>> I'm not familiar with the block layer code. I'll try to understand
>>>> the code related to discard requests and
>>>> try to fix this problem.
>>>> 
>>>> I have a question about raid5 discard. It needs to consider more than
>>>> raid0 and raid10. For example, there is a raid5 with 3 disks.
>>>> D11 D21 P1 (stripe size is 4KB)
>>>> D12 D22 P2
>>>> D13 D23 P3
>>>> D14 D24 P4
>>>> ...  (chunk size is 512KB)
>>>> If there is a discard request on D13 and D14, and there is no discard
>>>> request on D23 and D24, it can't send the
>>>> discard request to D13 and D14, right? P3 = D23 xor D13. If we discard
>>>> D13 and disk2 is broken, it can't
>>>> rebuild the right data from D13 and P3. The discard request on D13 can
>>>> write 0 to the discard region, right?
>>> 
>>> Yes. It can be seen at the beginning of make_discard_request(), where
>>> the requested range being discarded is aligned to "stripe_sectors",
>>> which should be chunk_sectors * nr_data_disks.
>>> 
>>> Cheers,
>>> 
>>> Miao Wang
>> 
>