> > > > > Hi,
> > > > >
> > > > > On 2024/04/18 13:44, tada keisuke wrote:
> > > > > > This patch depends on patch 07.
> > > > > >
> > > > > > All rdevs running in RAID 1/10 switch nr_pending to atomic
> > > > > > mode. The value of nr_pending is read in normal operation
> > > > > > (choose_best_rdev()). Therefore, nr_pending must always be
> > > > > > consistent.
> > > > > >
> > > > > > Signed-off-by: Keisuke TADA <keisuke1.tada@xxxxxxxxxx>
> > > > > > Signed-off-by: Toshifumi OHTAKE <toshifumi.ootake@xxxxxxxxxx>
> > > > > > ---
> > > > > >  drivers/md/md.h     | 14 ++++++++++++++
> > > > > >  drivers/md/raid1.c  |  7 +++++++
> > > > > >  drivers/md/raid10.c |  4 ++++
> > > > > >  3 files changed, 25 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/md/md.h b/drivers/md/md.h
> > > > > > index ab09e312c9bb..57b09b567ffa 100644
> > > > > > --- a/drivers/md/md.h
> > > > > > +++ b/drivers/md/md.h
> > > > > > @@ -236,6 +236,20 @@ static inline unsigned long nr_pending_read(struct md_rdev *rdev)
> > > > > >  	return atomic_long_read(&rdev->nr_pending.data->count);
> > > > > >  }
> > > > > >
> > > > > > +static inline bool nr_pending_is_percpu_mode(struct md_rdev *rdev)
> > > > > > +{
> > > > > > +	unsigned long __percpu *percpu_count;
> > > > > > +
> > > > > > +	return __ref_is_percpu(&rdev->nr_pending, &percpu_count);
> > > > > > +}
> > > > > > +
> > > > > > +static inline bool nr_pending_is_atomic_mode(struct md_rdev *rdev)
> > > > > > +{
> > > > > > +	unsigned long __percpu *percpu_count;
> > > > > > +
> > > > > > +	return !__ref_is_percpu(&rdev->nr_pending, &percpu_count);
> > > > > > +}
> > > > > > +
> > > > > >  static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
> > > > > >  			      sector_t *first_bad, int *bad_sectors)
> > > > > >  {
> > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > > > > index 12318fb15a88..c38ae13aadab 100644
> > > > > > --- a/drivers/md/raid1.c
> > > > > > +++ b/drivers/md/raid1.c
> > > > > > @@ -784,6 +784,7 @@ static int choose_best_rdev(struct r1conf *conf, struct r1bio *r1_bio)
> > > > > >  		if (ctl.readable_disks++ == 1)
> > > > > >  			set_bit(R1BIO_FailFast, &r1_bio->state);
> > > > > >
> > > > > > +		WARN_ON_ONCE(nr_pending_is_percpu_mode(rdev));
> > > > > >  		pending = nr_pending_read(rdev);
> > > > > >  		dist = abs(r1_bio->sector - conf->mirrors[disk].head_position);
> > > > > > @@ -1930,6 +1931,7 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
> > > > > >  			if (err)
> > > > > >  				return err;
> > > > > >
> > > > > > +			percpu_ref_switch_to_atomic_sync(&rdev->nr_pending);
> > > > > >  			raid1_add_conf(conf, rdev, mirror, false);
> > > > > >  			/* As all devices are equivalent, we don't need a full recovery
> > > > > >  			 * if this was recently any drive of the array
> > > > > > @@ -1949,6 +1951,7 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
> > > > > >  		set_bit(Replacement, &rdev->flags);
> > > > > >  		raid1_add_conf(conf, rdev, repl_slot, true);
> > > > > >  		err = 0;
> > > > > > +		percpu_ref_switch_to_atomic_sync(&rdev->nr_pending);
> > > > >
> > > > > I don't understand the point here: 'nr_pending' will be used
> > > > > when the rdev is issuing IO, and it's always used in atomic
> > > > > mode, so there is no difference.
> > > > >
> > > > > Consider that 'nr_pending' must be read from the IO fast path;
> > > > > using it as an atomic is something we must accept.
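Agreed that the fast-path read is unavoidable; the patchset accepts it
by pinning each rdev to atomic mode up front. To make the invariant
concrete, here is a minimal sketch (illustration only; the
example_read_pending() helper below is not part of the patch) of what
the fast path relies on, using the md.h helpers above and the standard
percpu_ref API:

/* Illustration only: not part of the patch. */
static unsigned long example_read_pending(struct md_rdev *rdev)
{
	/*
	 * The rdev was switched to atomic mode once, in the slow path
	 * (raid1_add_disk() above), so the fast path never has to
	 * switch; it only asserts the mode before reading.
	 */
	WARN_ON_ONCE(nr_pending_is_percpu_mode(rdev));

	/*
	 * Safe only in atomic mode: in percpu mode the grabs/puts are
	 * scattered across per-CPU counters that this read would miss.
	 */
	return nr_pending_read(rdev);
}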
> > > > > Unless someone comes up with a plan to avoid reading the
> > > > > 'inflight' counter from the fast path, like the generic block
> > > > > layer does, it's not ok to me to switch to percpu_ref for now.
> > > >
> > > > The main purpose of this patchset is to improve RAID5 performance.
> >
> > In the current RAID 1/10 design, the value of nr_pending is
> > intentionally always in atomic mode because it must be read in the
> > IO fast path. Unless the design of reading the value of nr_pending
> > changes, I believe this patchset is a reasonable design, and RAID1
> > performance is about the same as with atomic_t before this patchset
> > was applied. Paul's results also show that.
> >
> > Best Regards,
> > Keisuke
>
> I only tested RAID1, and I do believe that simpler is better, so I
> would prefer not to change the RAID1 code. I can run some RAID5 tests
> on this as well, unless you have some wide-sweeping results? I would
> love to see more RAID5 performance improvements. Shushu has another
> RAID5 perf patch out there that I think has some very good potential;
> it would be good if you could take a look at that one.
>
> -Paul

We are planning to measure the performance of RAID5 using SSDs; an
example of the kind of run we have in mind is in the P.S. below.

Keisuke
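P.S. Something along these lines, where the device names, array
geometry, and fio parameters are placeholders rather than a final test
plan:

# Hypothetical 4-disk RAID5 array on NVMe SSDs; devices are placeholders.
mdadm --create /dev/md0 --level=5 --raid-devices=4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# Example random-write load against the assembled array.
fio --name=raid5-randwrite --filename=/dev/md0 --direct=1 \
    --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 \
    --numjobs=4 --runtime=60 --time_based --group_reporting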