On 2023-04-20 14:07, Song Liu wrote:
> On Thu, Apr 20, 2023 at 8:54 AM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>>
>> On 2023-04-20 05:26, Jan Kara wrote:
>>> On Wed 19-04-23 22:26:07, Song Liu wrote:
>>>> On Wed, Apr 19, 2023 at 12:04 PM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>>>>> On 2023-04-17 11:15, Jan Kara wrote:
>>>>>> Commit 7e55c60acfbb ("md/raid5: Pivot raid5_make_request()") changed the
>>>>>> order in which requests for the underlying disks are created. For large
>>>>>> sequential IO, request creation frequently races with the md_raid5
>>>>>> thread submitting bios to the underlying disks, so the intermediate
>>>>>> states of the new creation order change the IO pattern and produce
>>>>>> more, smaller, discontiguous requests. For RAID5 on top of three
>>>>>> rotational disks, our performance testing revealed this causes a
>>>>>> regression in write throughput:
>>>>>>
>>>>>> iozone -a -s 131072000 -y 4 -q 8 -i 0 -i 1 -R
>>>>>>
>>>>>> before 7e55c60acfbb:
>>>>>>            KB  reclen   write  rewrite    read   reread
>>>>>>     131072000       4  493670   525964  524575   513384
>>>>>>     131072000       8  540467   532880  512028   513703
>>>>>>
>>>>>> after 7e55c60acfbb:
>>>>>>            KB  reclen   write  rewrite    read   reread
>>>>>>     131072000       4  421785   456184  531278   509248
>>>>>>     131072000       8  459283   456354  528449   543834
>>>>>>
>>>>>> To reduce the number of discontiguous requests we can start generating
>>>>>> requests with the stripe that has the lowest chunk offset, as that has
>>>>>> the best chance of being adjacent to IO queued previously. This improves
>>>>>> the performance to:
>>>>>>
>>>>>>            KB  reclen   write  rewrite    read   reread
>>>>>>     131072000       4  497682   506317  518043   514559
>>>>>>     131072000       8  514048   501886  506453   504319
>>>>>>
>>>>>> restoring a big part of the regression.
>>>>>>
>>>>>> Fixes: 7e55c60acfbb ("md/raid5: Pivot raid5_make_request()")
>>>>>> Signed-off-by: Jan Kara <jack@xxxxxxx>
>>>>>
>>>>> Looks good to me. I ran it through some of the functional tests I used
>>>>> to develop the patch in question and found no issues.
>>>>>
>>>>> Reviewed-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
>>>>
>>>> Thanks Jan and Logan! I will apply this to md-next. But it may not make
>>>> the 6.4 release, as we are already at rc7.
>>>
>>> OK, sure, this is not a critical issue.
>>>
>>>>>> ---
>>>>>>  drivers/md/raid5.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
>>>>>>  1 file changed, 44 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> I'm by no means a raid5 expert, but this is what I was able to come up
>>>>>> with. Any opinions or ideas on how to fix the problem in a better way?
>>>>>
>>>>> The other option would be to revert to the old method for spinning disks
>>>>> and use the pivot option only on SSDs. The pivot optimization really
>>>>> only applies at IO speeds that can be achieved by fast solid state disks
>>>>> anyway, and spinning disks are unlikely to sustain enough IOPS to create
>>>>> the lock contention the optimization was meant to relieve.
>>>>>
>>>>> So it could make sense to just fall back to the old method of preparing
>>>>> the stripes in logical block order if we are running on spinning disks.
>>>>> Though, that might be a bit more involved than what this patch does. So
>>>>> I think this is probably a good approach, unless we want to recover more
>>>>> of the lost throughput.
>>>>
>>>> How about we only do the optimization in this patch for spinning disks?
>>>> Something like:
>>>>
>>>>     if (likely(conf->reshape_progress == MaxSector) &&
>>>>         !blk_queue_nonrot(mddev->queue))
>>>>             logical_sector = raid5_bio_lowest_chunk_sector(conf, bi);
>>>
>>> Yeah, makes sense. On SSDs I was not able to observe any adverse
>>> effects of the different stripe order. Will you update the patch or
>>> should I respin it?
>>
>> Does it make sense? If SSDs work fine with the new stripe order, why do
>> things differently for them? So I don't see a benefit to restricting the
>> fix to rotational devices. It's my original change which could be made
>> non-rotational-only, but that's much more involved.
>
> I am hoping to make raid5_make_request() a little faster for non-rotational
> devices. We may not easily observe a difference in performance, but things
> add up. Does this make sense?

I guess. But without a performance test that shows an improvement, I'm
hesitant about that. It could be that it helps a tiny bit on non-rotational
disks, but we just don't know, and unfortunately I don't have the time right
now to run those performance tests.

Logan