On Tue, Feb 27, 2024 at 8:09 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > From: Yu Kuai <yukuai3@xxxxxxxxxx> > > Commit 12cee5a8a29e ("md/raid1: prevent merging too large request") add > the case choose next idle in read_balance(): > > read_balance: > for_each_rdev > if(next_seq_sect == this_sector || dist == 0) > -> sequential reads > best_disk = disk; > if (...) > choose_next_idle = 1 > continue; > > for_each_rdev > -> iterate next rdev > if (pending == 0) > best_disk = disk; > -> choose the next idle disk > break; > > if (choose_next_idle) > -> keep using this rdev if there are no other idle disk > contine > > However, commit 2e52d449bcec ("md/raid1: add failfast handling for reads.") > remove the code: > > - /* If device is idle, use it */ > - if (pending == 0) { > - best_disk = disk; > - break; > - } > > Hence choose next idle will never work now, fix this problem by > following: > > 1) don't set best_disk in this case, read_balance() will choose the best > disk after iterating all the disks; > 2) add 'pending' so that other idle disk will be chosen; > 3) add a new local variable 'sequential_disk' to record the disk, and if > there is no other idle disk, 'sequential_disk' will be chosen; > > Fixes: 2e52d449bcec ("md/raid1: add failfast handling for reads.") > Co-developed-by: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx> > Signed-off-by: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx> > Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> > --- > drivers/md/raid1.c | 32 ++++++++++++++++++++++---------- > 1 file changed, 22 insertions(+), 10 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 0fed01b06de9..fc5899fb08c1 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -598,13 +598,12 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > const sector_t this_sector = r1_bio->sector; > int sectors; > int best_good_sectors; > - int best_disk, best_dist_disk, best_pending_disk; > + int best_disk, best_dist_disk, best_pending_disk, sequential_disk; > int disk; > sector_t best_dist; > unsigned int min_pending; > struct md_rdev *rdev; > int choose_first; > - int choose_next_idle; > > /* > * Check if we can balance. We can balance on the whole > @@ -615,11 +614,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > sectors = r1_bio->sectors; > best_disk = -1; > best_dist_disk = -1; > + sequential_disk = -1; > best_dist = MaxSector; > best_pending_disk = -1; > min_pending = UINT_MAX; > best_good_sectors = 0; > - choose_next_idle = 0; > clear_bit(R1BIO_FailFast, &r1_bio->state); > > if ((conf->mddev->recovery_cp < this_sector + sectors) || > @@ -712,7 +711,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > int opt_iosize = bdev_io_opt(rdev->bdev) >> 9; > struct raid1_info *mirror = &conf->mirrors[disk]; > > - best_disk = disk; > /* > * If buffered sequential IO size exceeds optimal > * iosize, check if there is idle disk. If yes, choose > @@ -731,15 +729,22 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > mirror->next_seq_sect > opt_iosize && > mirror->next_seq_sect - opt_iosize >= > mirror->seq_start) { > - choose_next_idle = 1; > - continue; > + /* > + * Add 'pending' to avoid choosing this disk if > + * there is other idle disk. > + */ > + pending++; > + /* > + * If there is no other idle disk, this disk > + * will be chosen. > + */ > + sequential_disk = disk; > + } else { > + best_disk = disk; > + break; > } > - break; > } > > - if (choose_next_idle) > - continue; > - > if (min_pending > pending) { > min_pending = pending; > best_pending_disk = disk; > @@ -751,6 +756,13 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > } > } > > + /* > + * sequential IO size exceeds optimal iosize, however, there is no other > + * idle disk, so choose the sequential disk. > + */ > + if (best_disk == -1 && min_pending != 0) > + best_disk = sequential_disk; > + > /* > * If all disks are rotational, choose the closest disk. If any disk is > * non-rotational, choose the disk with less pending request even the > -- > 2.39.2 > Hi all This patch looks good to me. Reviewed-by: Xiao Ni <xni@xxxxxxxxxx>