Hi

For normal read/write requests, raid5 needs to do parity calculation,
and it uses the page size as the unit for that. The whole design is
based on this, so it's very hard to change. But there have been many
efforts to improve the performance. Batching requests can help:
https://www.spinics.net/lists/raid/msg47207.html. It avoids sending
many small bios to the disks.

As for the discard part, I'll try to take on this job.

Regards
Xiao

On Mon, Jun 21, 2021 at 3:49 PM Wang Shanker <shankerwangmiao@xxxxxxxxx> wrote:
>
> Hi, Xiao
>
> Many thanks for your reply. I realized that this problem is not limited
> to discard requests. Normal read/write requests are also first split
> into 4k-sized bios and then merged into larger ones. The merging of
> bios is limited by queue_max_segments, which leads to small chunks of
> I/O being issued to the physical devices. Such behavior seems
> suboptimal and should be improved.
>
> I'm not so familiar with raid456. Could you have a look at its code
> when you are free? It seems that improving this may require big changes.
>
> Cheers,
>
> Miao Wang
>
> > On Jun 18, 2021, at 20:49, Xiao Ni <xni@xxxxxxxxxx> wrote:
> >
> > Hi Miao
> >
> > So you plan to fix this problem now? The plan is to submit the
> > discard bio directly to the disks, similar to raid0/raid10.
> > As we discussed, it needs to consider the discard region: it should
> > be larger than chunk_sectors * nr_data_disks, and the bio needs to
> > be split when its size is not aligned with chunk_sectors *
> > nr_data_disks. The start address of the bio needs to be considered
> > too; if it's not aligned to a chunk_sectors boundary, it's better
> > to split that part as well.
> >
> > I'm working on another job, so I don't have time to do this now. If
> > you submit the patches, I can help review them :)
> >
> > Regards
> > Xiao
> >
> > On Fri, Jun 18, 2021 at 2:28 PM Wang Shanker <shankerwangmiao@xxxxxxxxx> wrote:
> >>
> >> Hi, Xiao
> >>
> >> Any ideas on this issue?
> >>
> >> Cheers,
> >>
> >> Miao Wang
> >>
> >>> On Jun 9, 2021, at 17:03, Wang Shanker <shankerwangmiao@xxxxxxxxx> wrote:
> >>>
> >>>> On Jun 9, 2021, at 16:44, Xiao Ni <xni@xxxxxxxxxx> wrote:
> >>>>
> >>>> Hi all
> >>>>
> >>>> Thanks for reporting this. I did a test in my environment:
> >>>> time blkdiscard /dev/nvme5n1 (477GB)
> >>>> real 0m0.398s
> >>>> time blkdiscard /dev/md0
> >>>> real 9m16.569s
> >>>>
> >>>> I'm not familiar with the block layer code. I'll try to understand
> >>>> the code related to discard requests and try to fix this problem.
> >>>>
> >>>> I have a question about raid5 discard; it needs to consider more
> >>>> than raid0 and raid10 do. For example, take a raid5 with 3 disks:
> >>>> D11 D21 P1 (stripe size is 4KB)
> >>>> D12 D22 P2
> >>>> D13 D23 P3
> >>>> D14 D24 P4
> >>>> ... (chunk size is 512KB)
> >>>> If there is a discard request on D13 and D14, but no discard
> >>>> request on D23 and D24, we can't send the discard request to D13
> >>>> and D14, right? P3 = D23 xor D13. If we discard D13 and disk2 is
> >>>> broken, we can't get the right data back from D13 and P3. A
> >>>> discard request on D13 may write zeroes to the discarded region,
> >>>> right?
> >>>
> >>> Yes. This can be seen at the beginning of make_discard_request(),
> >>> where the requested discard range is aligned to "stripe_sectors",
> >>> which is chunk_sectors * nr_data_disks.
> >>>
> >>> Cheers,
> >>>
> >>> Miao Wang
> >>
> >
>
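
To make the split rule above concrete, here is a minimal standalone
sketch of the stripe-alignment computation. The function name
split_discard, the standalone form, and the example numbers are
illustrative assumptions, not the actual md/raid456 code:

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the split rule described above: a discard bio can only be
 * passed straight down to the member disks for the part that covers
 * whole stripes (chunk_sectors * nr_data_disks). The misaligned head
 * and tail must be split off and handled separately.
 */
typedef uint64_t sector_t;

struct aligned_range {
	sector_t head_end;   /* [start, head_end)      misaligned head */
	sector_t tail_start; /* [head_end, tail_start) whole stripes   */
};                           /* [tail_start, end)      misaligned tail */

static struct aligned_range split_discard(sector_t start, sector_t end,
					  sector_t chunk_sectors,
					  unsigned int nr_data_disks)
{
	sector_t stripe_sectors = chunk_sectors * nr_data_disks;
	struct aligned_range r;

	/* Round start up, and end down, to a stripe boundary. */
	r.head_end = ((start + stripe_sectors - 1) / stripe_sectors)
			* stripe_sectors;
	r.tail_start = (end / stripe_sectors) * stripe_sectors;

	/* Range covers no whole stripe: everything becomes "tail". */
	if (r.tail_start < r.head_end)
		r.head_end = r.tail_start = start;
	return r;
}

int main(void)
{
	/* 512KB chunks (1024 sectors), 2 data disks as in the example. */
	struct aligned_range r = split_discard(100, 10000, 1024, 2);

	printf("head: 100..%llu, aligned: %llu..%llu, tail: %llu..10000\n",
	       (unsigned long long)r.head_end,
	       (unsigned long long)r.head_end,
	       (unsigned long long)r.tail_start,
	       (unsigned long long)r.tail_start);
	return 0;
}

With 512KB chunks (1024 sectors) and 2 data disks, a discard of sectors
100..10000 splits into a misaligned head 100..2048, a full-stripe middle
2048..8192 that could go straight to the disks, and a misaligned tail
8192..10000.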
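The parity hazard in the quoted example (P3 = D13 xor D23) can also be
shown directly. This toy demo only illustrates the xor arithmetic; the
values are arbitrary assumptions and it is not md code:

#include <stdint.h>
#include <stdio.h>

/*
 * If D13 is discarded (the device may then return zeroes) without P3
 * being updated, and disk2 later fails, rebuilding D23 as D13 ^ P3
 * yields the wrong data.
 */
int main(void)
{
	uint8_t d13 = 0xa5, d23 = 0x3c;
	uint8_t p3 = d13 ^ d23;		/* parity from the full stripe */

	d13 = 0x00;			/* discard: D13 now reads as zero */

	uint8_t rebuilt_d23 = d13 ^ p3;	/* disk2 lost: rebuild from D13, P3 */
	printf("original D23 = 0x%02x, rebuilt D23 = 0x%02x (%s)\n",
	       d23, rebuilt_d23,
	       rebuilt_d23 == d23 ? "ok" : "corrupted");
	return 0;
}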