On Tue, Feb 27, 2024 at 8:09 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > From: Yu Kuai <yukuai3@xxxxxxxxxx> > > For raid1, each read will iterate all the rdevs from conf and check if > any rdev is non-rotational, then choose rdev with minimal IO inflight > if so, or rdev with closest distance otherwise. > > Disk nonrot info can be changed through sysfs entry: > > /sys/block/[disk_name]/queue/rotational > > However, consider that this should only be used for testing, and user > really shouldn't do this in real life. Record the number of non-rotational > disks in conf, to avoid checking each rdev in IO fast path and simplify > read_balance() a little bit. > > Co-developed-by: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx> > Signed-off-by: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx> > Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> > --- > drivers/md/md.h | 1 + > drivers/md/raid1.c | 17 ++++++++++------- > drivers/md/raid1.h | 1 + > 3 files changed, 12 insertions(+), 7 deletions(-) > > diff --git a/drivers/md/md.h b/drivers/md/md.h > index a49ab04ab707..b2076a165c10 100644 > --- a/drivers/md/md.h > +++ b/drivers/md/md.h > @@ -207,6 +207,7 @@ enum flag_bits { > * check if there is collision between raid1 > * serial bios. > */ > + Nonrot, /* non-rotational device (SSD) */ > }; > > static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors, > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index a145fe48b9ce..0fed01b06de9 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -599,7 +599,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > int sectors; > int best_good_sectors; > int best_disk, best_dist_disk, best_pending_disk; > - int has_nonrot_disk; > int disk; > sector_t best_dist; > unsigned int min_pending; > @@ -620,7 +619,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > best_pending_disk = -1; > min_pending = UINT_MAX; > best_good_sectors = 0; > - has_nonrot_disk = 0; > choose_next_idle = 0; > clear_bit(R1BIO_FailFast, &r1_bio->state); > > @@ -637,7 +635,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > sector_t first_bad; > int bad_sectors; > unsigned int pending; > - bool nonrot; > > rdev = conf->mirrors[disk].rdev; > if (r1_bio->bios[disk] == IO_BLOCKED > @@ -703,8 +700,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > /* At least two disks to choose from so failfast is OK */ > set_bit(R1BIO_FailFast, &r1_bio->state); > > - nonrot = bdev_nonrot(rdev->bdev); > - has_nonrot_disk |= nonrot; > pending = atomic_read(&rdev->nr_pending); > dist = abs(this_sector - conf->mirrors[disk].head_position); > if (choose_first) { > @@ -731,7 +726,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > * small, but not a big deal since when the second disk > * starts IO, the first disk is likely still busy. > */ > - if (nonrot && opt_iosize > 0 && > + if (test_bit(Nonrot, &rdev->flags) && opt_iosize > 0 && > mirror->seq_start != MaxSector && > mirror->next_seq_sect > opt_iosize && > mirror->next_seq_sect - opt_iosize >= > @@ -763,7 +758,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > * mixed ratation/non-rotational disks depending on workload. > */ > if (best_disk == -1) { > - if (has_nonrot_disk || min_pending == 0) > + if (READ_ONCE(conf->nonrot_disks) || min_pending == 0) > best_disk = best_pending_disk; > else > best_disk = best_dist_disk; > @@ -1819,6 +1814,11 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev) > WRITE_ONCE(p[conf->raid_disks].rdev, rdev); > } > > + if (!err && bdev_nonrot(rdev->bdev)) { > + set_bit(Nonrot, &rdev->flags); > + WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks + 1); > + } > + Hi Kuai I noticed raid1_run->setup_conf is used to add rdev to conf when creating raid1. raid1_add_disk is only used for --add/--re-add after creating array. So we need to add the same logic in setup_conf? Regards Xiao > print_conf(conf); > return err; > } > @@ -1883,6 +1883,9 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev) > } > abort: > > + if (test_and_clear_bit(Nonrot, &rdev->flags)) > + WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks - 1); > + > print_conf(conf); > return err; > } > diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h > index 14d4211a123a..5300cbaa58a4 100644 > --- a/drivers/md/raid1.h > +++ b/drivers/md/raid1.h > @@ -71,6 +71,7 @@ struct r1conf { > * allow for replacements. > */ > int raid_disks; > + int nonrot_disks; > > spinlock_t device_lock; > > -- > 2.39.2 >