> 2025年2月7日 17:50,Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> 写道: > > Hi, > > 在 2025/02/07 10:35, colyli@xxxxxxxxxx 写道: >> From: Coly Li <colyli@xxxxxxxxxx> >> which_dev() is a top hot function in md-linear.c, every I/O request will >> call it to find out which component disk the bio should be issued to. >> Current witch_dev() performs a standard binary search, indeed this is >> not the fastest algorithm in practice. When the whole conf->disks array >> can be stored within a single cache line, simple linear search is faster >> than binary search for a small disks number. >>> From micro benchmark, around 20%~30% latency reduction can be observed. >> Of course such huge optimization cannot be achieved in real workload, in >> my benchmark with, >> 1) One md linear device assembled by 2 or 4 Intel Optane memory block >> device on Lenovo ThinkSystem SR650 server. >> 2) Random write I/O issued by fio, with I/O depth 1 and 512 bytes block >> size. > >> The percentage of I/O latencies completed with 750 nsec increases from >> 97.186% to 99.324% in average, in a rough estimation the write latency >> improves (reduces) around 2.138%. >> This is quite ideal result, I believe on slow hard drives such small >> code-running optimization will be overwhelmed by hardware latency and >> hard to be recognized. >> This patch will go back to binary search when the linear device grows >> and conf->disks array cannot be placed within a single cache line. >> Although the optimization result is tiny in most of cases, it is good to >> have it since we don't pay any other cost. >> Signed-off-by: Coly Li <colyli@xxxxxxxxxx> >> Cc: Song Liu <song@xxxxxxxxxx> >> Cc: Yu Kuai <yukuai3@xxxxxxxxxx> >> --- >> Changelog, >> v3: fix typo and email address which are reported by raid kernel ci. >> v2: return last item of conf->disks[] if fast search missed. >> v1: initial version. >> drivers/md/md-linear.c | 45 +++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 44 insertions(+), 1 deletion(-) >> diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c >> index a382929ce7ba..cdb59f2b2a1c 100644 >> --- a/drivers/md/md-linear.c >> +++ b/drivers/md/md-linear.c >> @@ -25,10 +25,12 @@ struct linear_conf { >> struct dev_info disks[] __counted_by(raid_disks); >> }; >> +static int prefer_linear_dev_search; > > Why using global variable instead of per linear? There are still hole in > the linear_conf() that you can use. OK, I will use space inside linear_conf. >> + >> /* >> * find which device holds a particular offset >> */ >> -static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector) >> +static inline struct dev_info *__which_dev(struct mddev *mddev, sector_t sector) >> { >> int lo, mid, hi; >> struct linear_conf *conf; >> @@ -53,6 +55,34 @@ static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector) >> return conf->disks + lo; >> } >> +/* >> + * If conf->disk[] can be hold within a L1 cache line, >> + * linear search is fater than binary search. >> + */ >> +static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector) >> +{ >> + int i; >> + >> + if (prefer_linear_dev_search) { >> + struct linear_conf *conf; >> + struct dev_info *dev; >> + int max; >> + >> + conf = mddev->private; >> + dev = conf->disks; >> + max = conf->raid_disks; >> + for (i = 0; i < max; i++, dev++) { >> + if (sector < dev->end_sector) >> + return dev; >> + } >> + return &conf->disks[max-1]; >> + } >> + >> + /* slow path */ >> + return __which_dev(mddev, sector); >> +} >> + >> + >> static sector_t linear_size(struct mddev *mddev, sector_t sectors, int raid_disks) >> { >> struct linear_conf *conf; >> @@ -222,6 +252,18 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev) >> md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); >> set_capacity_and_notify(mddev->gendisk, mddev->array_sectors); >> kfree_rcu(oldconf, rcu); >> + >> + /* >> + * When elements in linear_conf->disks[] becomes large enough, >> + * set prefer_linear_dev_search as 0 to indicate linear search >> + * in which_dev() is not optimized. Slow path in __which_dev() >> + * might be faster. >> + */ >> + if ((mddev->raid_disks * sizeof(struct dev_info)) > >> + cache_line_size() && > > BTW, you said in the case *a single cache line*, however, the field > disks is 32 bytes offset in linear_conf, you might want to align it to > cacheline, or add sizeof(struct linear_conf) on the left and align > linear_conf to the cacheline. > Yes, thanks for the hint. Let me do it in next version. > Thanks, > Kuai >> + prefer_linear_dev_search == 1) >> + prefer_linear_dev_search = 0; >> + >> return 0; >> } >> @@ -337,6 +379,7 @@ static struct md_personality linear_personality = { >> static int __init linear_init(void) >> { >> + prefer_linear_dev_search = 1; >> return register_md_personality(&linear_personality); >> }