On Mon, Feb 26, 2024 at 5:12 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > Hi, > > 在 2024/02/26 16:55, Xiao Ni 写道: > > Hi Kuai > > > > Thanks for the effort! > > > > On Thu, Feb 22, 2024 at 4:04 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > >> > >> From: Yu Kuai <yukuai3@xxxxxxxxxx> > >> > >> Commit 12cee5a8a29e ("md/raid1: prevent merging too large request") add > >> the case choose next idle in read_balance(): > >> > >> read_balance: > >> for_each_rdev > >> if(next_seq_sect == this_sector || disk == 0) > > > > typo error: s/disk/dist/g > > > >> -> sequential reads > >> best_disk = disk; > >> if (...) > >> choose_next_idle = 1 > >> continue; > >> > >> for_each_rdev > >> -> iterate next rdev > >> if (pending == 0) > >> best_disk = disk; > >> -> choose the next idle disk > >> break; > >> > >> if (choose_next_idle) > >> -> keep using this rdev if there are no other idle disk > >> continue > >> > >> However, commit 2e52d449bcec ("md/raid1: add failfast handling for reads.") > >> remove the code: > >> > >> - /* If device is idle, use it */ > >> - if (pending == 0) { > >> - best_disk = disk; > >> - break; > >> - } > >> > >> Hence choose next idle will never work now, fix this problem by > >> following: > >> > >> 1) don't set best_disk in this case, read_balance() will choose the best > >> disk after iterating all the disks; > >> 2) add 'pending' so that other idle disk will be chosen; > >> 3) set 'dist' to 0 so that if there is no other idle disk, and all disks > >> are rotational, this disk will still be chosen; > >> > >> Fixes: 2e52d449bcec ("md/raid1: add failfast handling for reads.") > >> Co-developed-by: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx> > >> Signed-off-by: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx> > >> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> > >> --- > >> drivers/md/raid1.c | 21 ++++++++++++--------- > >> 1 file changed, 12 insertions(+), 9 deletions(-) > >> > >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > >> index c60ea58ae8c5..d0bc67e6d068 100644 > >> --- a/drivers/md/raid1.c > >> +++ b/drivers/md/raid1.c > >> @@ -604,7 +604,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> unsigned int min_pending; > >> struct md_rdev *rdev; > >> int choose_first; > >> - int choose_next_idle; > >> > >> /* > >> * Check if we can balance. We can balance on the whole > >> @@ -619,7 +618,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> best_pending_disk = -1; > >> min_pending = UINT_MAX; > >> best_good_sectors = 0; > >> - choose_next_idle = 0; > >> clear_bit(R1BIO_FailFast, &r1_bio->state); > >> > >> if ((conf->mddev->recovery_cp < this_sector + sectors) || > >> @@ -712,7 +710,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> int opt_iosize = bdev_io_opt(rdev->bdev) >> 9; > >> struct raid1_info *mirror = &conf->mirrors[disk]; > >> > >> - best_disk = disk; > >> /* > >> * If buffered sequential IO size exceeds optimal > >> * iosize, check if there is idle disk. If yes, choose > >> @@ -731,15 +728,21 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> mirror->next_seq_sect > opt_iosize && > >> mirror->next_seq_sect - opt_iosize >= > >> mirror->seq_start) { > >> - choose_next_idle = 1; > >> - continue; > >> + /* > >> + * Add 'pending' to avoid choosing this disk if > >> + * there is other idle disk. > >> + * Set 'dist' to 0, so that if there is no other > >> + * idle disk and all disks are rotational, this > >> + * disk will still be chosen. > >> + */ > >> + pending++; > >> + dist = 0; > > > > There is a problem. If all disks are not idle and there is a disk with > > dist=0 before the seq disk, it can't read from the seq disk. It will > > read from the first disk with dist=0. Maybe we can only add the codes > > which are removed from 2e52d449bcec? > > If there is a disk with disk=0, then best_dist_disk will be updated to > the disk, and best_dist will be updated to 0 already: > > // in each iteration > if (dist < best_dist) { > best_dist = dist; > btest_disk_disk = disk; > } > > In this case, best_dist will be set to the first disk with dist=0, and > at last, the disk will be chosen: > > if (best_disk == -1) { > if (has_nonrot_disk || min_pending == 0) > best_disk = best_pending_disk; > else > best_disk = best_dist_disk; > -> the first disk with dist=0; > } > > So, the problem that you concerned should not exist. Hi Kuai Thanks for the explanation. You're right. It chooses the first disk which has dist==0. In the above, you made the same typo error disk=0 as the comment. I guess you want to use dist=0, right? Beside this, this patch is good to me. Best Regards Xiao > > Thanks, > Kuai > > > > Best Regards > > Xiao > > > >> + } else { > >> + best_disk = disk; > >> + break; > >> } > >> - break; > >> } > >> > >> - if (choose_next_idle) > >> - continue; > >> - > >> if (min_pending > pending) { > >> min_pending = pending; > >> best_pending_disk = disk; > >> -- > >> 2.39.2 > >> > >> > > > > . > > >