Re: [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon 14-08-23 20:15:58, Yu Kuai wrote:
> Hi,
> 
> 在 2023/08/14 17:27, Jan Kara 写道:
> > Commit f00d7c85be9e ("md/raid0: fix up bio splitting.") among other
> > things changed how bio that needs to be split is submitted. Before this
> > commit, we have split the bio, mapped and submitted each part. After
> > this commit, we map only the first part of the split bio and submit the
> > second part unmapped. Due to bio sorting in __submit_bio_noacct() this
> > results in the following request ordering:
> > 
> >    9,0   18     1181     0.525037895 15995  Q  WS 1479315464 + 63392
> > 
> >    Split off chunk-sized (1024 sectors) request:
> > 
> >    9,0   18     1182     0.629019647 15995  X  WS 1479315464 / 1479316488
> > 
> >    Request is unaligned to the chunk so it's split in
> >    raid0_make_request().  This is the first part mapped and punted to
> >    bio_list:
> > 
> >    8,0   18     7053     0.629020455 15995  A  WS 739921928 + 1016 <- (9,0) 1479315464
> > 
> >    Now raid0_make_request() returns, second part is postponed on
> >    bio_list. __submit_bio_noacct() resorts the bio_list, mapped request
> >    is submitted to the underlying device:
> > 
> >    8,0   18     7054     0.629022782 15995  G  WS 739921928 + 1016
> > 
> >    Now we take another request from the bio_list which is the remainder
> >    of the original huge request. Split off another chunk-sized bit from
> >    it and the situation repeats:
> > 
> >    9,0   18     1183     0.629024499 15995  X  WS 1479316488 / 1479317512
> >    8,16  18     6998     0.629025110 15995  A  WS 739921928 + 1016 <- (9,0) 1479316488
> >    8,16  18     6999     0.629026728 15995  G  WS 739921928 + 1016
> >    ...
> >    9,0   18     1184     0.629032940 15995  X  WS 1479317512 / 1479318536 [libnetacq-write]
> >    8,0   18     7059     0.629033294 15995  A  WS 739922952 + 1016 <- (9,0) 1479317512
> >    8,0   18     7060     0.629033902 15995  G  WS 739922952 + 1016
> >    ...
> > 
> >    This repeats until we consume the whole original huge request. Now we
> >    finally get to processing the second parts of the split off requests
> >    (in reverse order):
> > 
> >    8,16  18     7181     0.629161384 15995  A  WS 739952640 + 8 <- (9,0) 1479377920
> >    8,0   18     7239     0.629162140 15995  A  WS 739952640 + 8 <- (9,0) 1479376896
> >    8,16  18     7186     0.629163881 15995  A  WS 739951616 + 8 <- (9,0) 1479375872
> >    8,0   18     7242     0.629164421 15995  A  WS 739951616 + 8 <- (9,0) 1479374848
> >    ...
> > 
> > I guess it is obvious that this IO pattern is extremely inefficient way
> > to perform sequential IO. It also makes bio_list to grow to rather long
> > lengths.
> > 
> > Change raid0_make_request() to map both parts of the split bio. Since we
> > know we are provided with at most chunk-sized bios, we will always need
> > to split the incoming bio at most once.
> 
> I understand the problem now, but I'm lost here, can you explain why "at
> most once" more ? Do you mean that the original bio won't be splited
> again?

Yes. md_submit_bio() splits the incoming bio by bio_split_to_limits() so we
are guaranteed raid0_make_request() gets bio at most chunk_sectors long.
If this bio is misaligned, it will be split in raid0_make_request(). But
after that we are sure both parts of the bio are meeting requirements of
the raid0 code so we can just directly map them to the underlying device and
submit them.

								Honza

> > Fixes: f00d7c85be9e ("md/raid0: fix up bio splitting.")
> > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > ---
> >   drivers/md/raid0.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> > index d3c55f2e9b18..595856948dff 100644
> > --- a/drivers/md/raid0.c
> > +++ b/drivers/md/raid0.c
> > @@ -626,7 +626,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
> >   		struct bio *split = bio_split(bio, sectors, GFP_NOIO,
> >   					      &mddev->bio_set);
> >   		bio_chain(split, bio);
> > -		submit_bio_noacct(bio);
> > +		raid0_map_submit_bio(mddev, bio);
> >   		bio = split;
> >   	}
> > 
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux