Re: [PATCH V3] md-linear: optimize which_dev() for small disks number

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

在 2025/02/07 10:35, colyli@xxxxxxxxxx 写道:
From: Coly Li <colyli@xxxxxxxxxx>

which_dev() is a top hot function in md-linear.c, every I/O request will
call it to find out which component disk the bio should be issued to.

Current witch_dev() performs a standard binary search, indeed this is
not the fastest algorithm in practice. When the whole conf->disks array
can be stored within a single cache line, simple linear search is faster
than binary search for a small disks number.

From micro benchmark, around 20%~30% latency reduction can be observed.
Of course such huge optimization cannot be achieved in real workload, in
my benchmark with,
1) One md linear device assembled by 2 or 4 Intel Optane memory block
    device on Lenovo ThinkSystem SR650 server.
2) Random write I/O issued by fio, with I/O depth 1 and 512 bytes block
    size.


The percentage of I/O latencies completed with 750 nsec increases from
97.186% to 99.324% in average, in a rough estimation the write latency
improves (reduces) around 2.138%.

This is quite ideal result, I believe on slow hard drives such small
code-running optimization will be overwhelmed by hardware latency and
hard to be recognized.

This patch will go back to binary search when the linear device grows
and conf->disks array cannot be placed within a single cache line.

Although the optimization result is tiny in most of cases, it is good to
have it since we don't pay any other cost.

Signed-off-by: Coly Li <colyli@xxxxxxxxxx>
Cc: Song Liu <song@xxxxxxxxxx>
Cc: Yu Kuai <yukuai3@xxxxxxxxxx>
---
Changelog,
v3: fix typo and email address which are reported by raid kernel ci.
v2: return last item of conf->disks[] if fast search missed.
v1: initial version.


  drivers/md/md-linear.c | 45 +++++++++++++++++++++++++++++++++++++++++-
  1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index a382929ce7ba..cdb59f2b2a1c 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -25,10 +25,12 @@ struct linear_conf {
  	struct dev_info         disks[] __counted_by(raid_disks);
  };
+static int prefer_linear_dev_search;

Why using global variable instead of per linear? There are still hole in
the linear_conf() that you can use.
+
  /*
   * find which device holds a particular offset
   */
-static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
+static inline struct dev_info *__which_dev(struct mddev *mddev, sector_t sector)
  {
  	int lo, mid, hi;
  	struct linear_conf *conf;
@@ -53,6 +55,34 @@ static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
  	return conf->disks + lo;
  }
+/*
+ * If conf->disk[] can be hold within a L1 cache line,
+ * linear search is fater than binary search.
+ */
+static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
+{
+	int i;
+
+	if (prefer_linear_dev_search) {
+		struct linear_conf *conf;
+		struct dev_info *dev;
+		int max;
+
+		conf = mddev->private;
+		dev = conf->disks;
+		max = conf->raid_disks;
+		for (i = 0; i < max; i++, dev++) {
+			if (sector < dev->end_sector)
+				return dev;
+		}
+		return &conf->disks[max-1];
+	}
+
+	/* slow path */
+	return __which_dev(mddev, sector);
+}
+
+
  static sector_t linear_size(struct mddev *mddev, sector_t sectors, int raid_disks)
  {
  	struct linear_conf *conf;
@@ -222,6 +252,18 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
  	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
  	set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
  	kfree_rcu(oldconf, rcu);
+
+	/*
+	 * When elements in linear_conf->disks[] becomes large enough,
+	 * set prefer_linear_dev_search as 0 to indicate linear search
+	 * in which_dev() is not optimized. Slow path in __which_dev()
+	 * might be faster.
+	 */
+	if ((mddev->raid_disks * sizeof(struct dev_info)) >
+	     cache_line_size() &&

BTW, you said in the case *a single cache line*, however, the field
disks is 32 bytes offset in linear_conf, you might want to align it to
cacheline, or add sizeof(struct linear_conf) on the left and align
linear_conf to the cacheline.

Thanks,
Kuai
+	    prefer_linear_dev_search == 1)
+		prefer_linear_dev_search = 0;
+
  	return 0;
  }
@@ -337,6 +379,7 @@ static struct md_personality linear_personality = { static int __init linear_init(void)
  {
+	prefer_linear_dev_search = 1;
  	return register_md_personality(&linear_personality);
  }





[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux