On Wed 19-04-23 22:26:07, Song Liu wrote:
> On Wed, Apr 19, 2023 at 12:04 PM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
> > On 2023-04-17 11:15, Jan Kara wrote:
> > > Commit 7e55c60acfbb ("md/raid5: Pivot raid5_make_request()") changed the
> > > order in which requests for underlying disks are created. Since for
> > > large sequential IO the adding of requests frequently races with the
> > > md_raid5 thread submitting bios to the underlying disks, this results in
> > > a change in IO pattern: intermediate states of the new request creation
> > > order produce more, smaller, discontiguous requests. For RAID5 on top of
> > > three rotational disks our performance testing revealed that this causes
> > > a regression in write throughput:
> > >
> > > iozone -a -s 131072000 -y 4 -q 8 -i 0 -i 1 -R
> > >
> > > before 7e55c60acfbb:
> > >              KB  reclen    write  rewrite     read   reread
> > >      131072000       4   493670   525964   524575   513384
> > >      131072000       8   540467   532880   512028   513703
> > >
> > > after 7e55c60acfbb:
> > >              KB  reclen    write  rewrite     read   reread
> > >      131072000       4   421785   456184   531278   509248
> > >      131072000       8   459283   456354   528449   543834
> > >
> > > To reduce the number of discontiguous requests we can start generating
> > > requests with the stripe that has the lowest chunk offset, as that
> > > stripe has the best chance of being adjacent to IO queued previously.
> > > This improves the performance to:
> > >
> > >              KB  reclen    write  rewrite     read   reread
> > >      131072000       4   497682   506317   518043   514559
> > >      131072000       8   514048   501886   506453   504319
> > >
> > > restoring a big part of the regression.
> > >
> > > Fixes: 7e55c60acfbb ("md/raid5: Pivot raid5_make_request()")
> > > Signed-off-by: Jan Kara <jack@xxxxxxx>
> >
> > Looks good to me. I ran it through some of the functional tests I used
> > to develop the patch in question and found no issues.
> >
> > Reviewed-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
>
> Thanks Jan and Logan! I will apply this to md-next. But it may not make
> the 6.4 release, as we are already at rc7.

OK, sure, this is not a critical issue.

> > > ---
> > >  drivers/md/raid5.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 44 insertions(+), 1 deletion(-)
> > >
> > > I'm by no means a raid5 expert but this is what I was able to come up
> > > with. Any opinions or ideas on how to fix the problem in a better way?
> >
> > The other option would be to revert to the old method for spinning disks
> > and use the pivot option only on SSDs. The pivot optimization really
> > only applies at IO speeds that can be achieved by fast solid state disks
> > anyway, and there likely aren't enough IOPS possible on spinning disks
> > to create enough lock contention for the optimization to provide any
> > benefit.
> >
> > So it could make sense to just fall back to the old method of preparing
> > the stripes in logical block order if we are running on spinning disks.
> > Though, that might be a bit more involved than what this patch does. So
> > I think this is probably a good approach, unless we want to recover more
> > of the lost throughput.
>
> How about we only do the optimization in this patch for spinning disks?
> Something like:
>
>     if (likely(conf->reshape_progress == MaxSector) &&
>         !blk_queue_nonrot(mddev->queue))
>             logical_sector = raid5_bio_lowest_chunk_sector(conf, bi);

Yeah, makes sense. On SSDs I was not able to observe any adverse effects
of the different stripe order. Will you update the patch or should I
respin it?

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
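
For readers following the thread, the chunk arithmetic behind a helper like
raid5_bio_lowest_chunk_sector() can be sketched compactly. The following is
a simplified, self-contained user-space illustration, not the kernel code:
it only does logical-sector arithmetic, and it ignores parity rotation and
the check the real helper would need to confirm the next chunk still lies in
the same stripe row. All names in it are made up for the example.

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t sector_t;

    /*
     * Return the logical sector, within the range covered by a bio, of
     * the stripe with the lowest offset inside its data chunk.
     * Generating requests starting from that stripe gives the best
     * chance of being contiguous with IO previously queued for the
     * underlying disks.
     */
    static sector_t lowest_chunk_sector(sector_t bio_sector,
                                        sector_t bio_sectors,
                                        sector_t chunk_sectors)
    {
            sector_t chunk_offset = bio_sector % chunk_sectors;

            /* Bio does not cross a chunk boundary: keep its first stripe. */
            if (chunk_sectors - chunk_offset >= bio_sectors)
                    return bio_sector;
            /* Otherwise start at the next chunk boundary (offset 0). */
            return bio_sector + chunk_sectors - chunk_offset;
    }

    int main(void)
    {
            /* 512KiB chunks = 1024 sectors; a 1MiB bio starting mid-chunk. */
            sector_t start = 1536, len = 2048, chunk = 1024;

            printf("start generating requests at sector %llu\n",
                   (unsigned long long)lowest_chunk_sector(start, len, chunk));
            return 0;
    }

The reason for gating this behind !blk_queue_nonrot() in the snippet quoted
above is that only rotational media benefit from the resulting request
contiguity; on SSDs the stripe order showed no observable difference, as
noted in the replies.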