Hi Kuai Thanks for the effort! On Thu, Feb 22, 2024 at 4:04 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > From: Yu Kuai <yukuai3@xxxxxxxxxx> > > Commit 12cee5a8a29e ("md/raid1: prevent merging too large request") add > the case choose next idle in read_balance(): > > read_balance: > for_each_rdev > if(next_seq_sect == this_sector || disk == 0) typo error: s/disk/dist/g > -> sequential reads > best_disk = disk; > if (...) > choose_next_idle = 1 > continue; > > for_each_rdev > -> iterate next rdev > if (pending == 0) > best_disk = disk; > -> choose the next idle disk > break; > > if (choose_next_idle) > -> keep using this rdev if there are no other idle disk > continue > > However, commit 2e52d449bcec ("md/raid1: add failfast handling for reads.") > remove the code: > > - /* If device is idle, use it */ > - if (pending == 0) { > - best_disk = disk; > - break; > - } > > Hence choose next idle will never work now, fix this problem by > following: > > 1) don't set best_disk in this case, read_balance() will choose the best > disk after iterating all the disks; > 2) add 'pending' so that other idle disk will be chosen; > 3) set 'dist' to 0 so that if there is no other idle disk, and all disks > are rotational, this disk will still be chosen; > > Fixes: 2e52d449bcec ("md/raid1: add failfast handling for reads.") > Co-developed-by: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx> > Signed-off-by: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx> > Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> > --- > drivers/md/raid1.c | 21 ++++++++++++--------- > 1 file changed, 12 insertions(+), 9 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index c60ea58ae8c5..d0bc67e6d068 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -604,7 +604,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > unsigned int min_pending; > struct md_rdev *rdev; > int choose_first; > - int choose_next_idle; > > /* > * Check if we can balance. We can balance on the whole > @@ -619,7 +618,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > best_pending_disk = -1; > min_pending = UINT_MAX; > best_good_sectors = 0; > - choose_next_idle = 0; > clear_bit(R1BIO_FailFast, &r1_bio->state); > > if ((conf->mddev->recovery_cp < this_sector + sectors) || > @@ -712,7 +710,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > int opt_iosize = bdev_io_opt(rdev->bdev) >> 9; > struct raid1_info *mirror = &conf->mirrors[disk]; > > - best_disk = disk; > /* > * If buffered sequential IO size exceeds optimal > * iosize, check if there is idle disk. If yes, choose > @@ -731,15 +728,21 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > mirror->next_seq_sect > opt_iosize && > mirror->next_seq_sect - opt_iosize >= > mirror->seq_start) { > - choose_next_idle = 1; > - continue; > + /* > + * Add 'pending' to avoid choosing this disk if > + * there is other idle disk. > + * Set 'dist' to 0, so that if there is no other > + * idle disk and all disks are rotational, this > + * disk will still be chosen. > + */ > + pending++; > + dist = 0; There is a problem. If all disks are not idle and there is a disk with dist=0 before the seq disk, it can't read from the seq disk. It will read from the first disk with dist=0. Maybe we can only add the codes which are removed from 2e52d449bcec? Best Regards Xiao > + } else { > + best_disk = disk; > + break; > } > - break; > } > > - if (choose_next_idle) > - continue; > - > if (min_pending > pending) { > min_pending = pending; > best_pending_disk = disk; > -- > 2.39.2 > >