Re: [PATCH] md/raid5: Improve performance for sequential IO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2023-04-20 05:26, Jan Kara wrote:
> On Wed 19-04-23 22:26:07, Song Liu wrote:
>> On Wed, Apr 19, 2023 at 12:04 PM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>>> On 2023-04-17 11:15, Jan Kara wrote:
>>>> Commit 7e55c60acfbb ("md/raid5: Pivot raid5_make_request()") changed the
>>>> order in which requests for underlying disks are created. Since for
>>>> large sequential IO adding of requests frequently races with md_raid5
>>>> thread submitting bios to underlying disks, this results in a change in
>>>> IO pattern because intermediate states of new order of request creation
>>>> result in more smaller discontiguous requests. For RAID5 on top of three
>>>> rotational disks our performance testing revealed this results in
>>>> regression in write throughput:
>>>>
>>>> iozone -a -s 131072000 -y 4 -q 8 -i 0 -i 1 -R
>>>>
>>>> before 7e55c60acfbb:
>>>>               KB  reclen   write rewrite    read    reread
>>>>        131072000       4  493670  525964   524575   513384
>>>>        131072000       8  540467  532880   512028   513703
>>>>
>>>> after 7e55c60acfbb:
>>>>               KB  reclen   write rewrite    read    reread
>>>>        131072000       4  421785  456184   531278   509248
>>>>        131072000       8  459283  456354   528449   543834
>>>>
>>>> To reduce the amount of discontiguous requests we can start generating
>>>> requests with the stripe with the lowest chunk offset as that has the
>>>> best chance of being adjacent to IO queued previously. This improves the
>>>> performance to:
>>>>               KB  reclen   write rewrite    read    reread
>>>>        131072000       4  497682  506317   518043   514559
>>>>        131072000       8  514048  501886   506453   504319
>>>>
>>>> restoring big part of the regression.
>>>>
>>>> Fixes: 7e55c60acfbb ("md/raid5: Pivot raid5_make_request()")
>>>> Signed-off-by: Jan Kara <jack@xxxxxxx>
>>>
>>> Looks good to me. I ran it through some of the functional tests I used
>>> to develop the patch in question and found no issues.
>>>
>>> Reviewed-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
>>
>> Thanks Jan and Logan! I will apply this to md-next. But it may not make
>> 6.4 release, as we are already at rc7.
> 
> OK, sure, this is not a critical issue.
> 
>>>> ---
>>>>  drivers/md/raid5.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
>>>>  1 file changed, 44 insertions(+), 1 deletion(-)
>>>>
>>>> I'm by no means raid5 expert but this is what I was able to come up with. Any
>>>> opinion or ideas how to fix the problem in a better way?
>>>
>>> The other option would be to revert to the old method for spinning disks
>>> and use the pivot option only on SSDs. The pivot optimization really
>>> only applies at IO speeds that can be achieved by fast solid state disks
>>> anyway, and there isn't likely enough IOPS possible on spinning disks to
>>> get enough lock contention that the optimization would provide any benefit.
>>>
>>> So it could make sense to just fall back to the old method of preparing
>>> the stripes in logical block order if we are running on spinning disks.
>>> Though, that might be a bit more involved than what this patch does. So
>>> I think this is probably a good approach, unless we want to recover more
>>> of the lost throughput.
>>
>> How about we only do the optimization in this patch for spinning disks?
>> Something like:
>>
>>         if (likely(conf->reshape_progress == MaxSector) &&
>>             !blk_queue_nonrot(mddev->queue))
>>                 logical_sector = raid5_bio_lowest_chunk_sector(conf, bi);
> 
> Yeah, makes sense. On SSD disks I was not able to observe any adverse
> effects of the different stripe order. Will you update the patch or should
> I respin it?

Does it make sense? If SSDs work fine with the new stripe order, why do
things different for them? So I don't see a  benefit of making the fix
only work with non-rotational devices. It's my original change which
could be made for non-rotationatial only, but that's much more involved.

Logan



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux