Re: [RFC V5] md/raid5: optimize Scattered Address Space and Thread-Level Parallelism to improve RAID5 performance

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Tue, 11 Jun 2024 19:59:20 +0800

Hi,

I just take a quick look, a design problem below. There are some style
problems, however, we can work on that later.

在 2024/06/06 0:53, Shushu Yi 写道:
Optimize scattered address space. Achieves significant improvements in
both throughput and latency.

Maximize thread-level parallelism and reduce CPU suspension time caused
by lock contention. Achieve increased overall storage throughput by
89.4% and decrease the 99.99th percentile I/O latency by 85.4% on a
system with four PCIe 4.0 SSDs. (Set the iodepth to 32 and employ
libaio. Configure the I/O size as 4 KB with sequential write and 16
threads. In RAID5 consist of 2+1 1TB Samsung 980Pro SSDs, throughput
went from 5218MB/s to 9884MB/s.)

Note: Publish this work as a paper and provide the URL
(https://www.hotstorage.org/2022/camera-ready/hotstorage22-5/pdf/
hotstorage22-5.pdf).

Co-developed-by: Yiming Xu <teddyxym@xxxxxxxxxxx>
Signed-off-by: Yiming Xu <teddyxym@xxxxxxxxxxx>
Signed-off-by: Shushu Yi <firnyee@xxxxxxxxx>
Tested-by: Paul Luse <paul.e.luse@xxxxxxxxx>
---
V1 -> V2: Cleaned up coding style and divided into 2 patches (HemiRAID
and ScalaRAID corresponding to the paper mentioned above). ScalaRAID
equipped every counter with a counter lock and employ our D-Block.
HemiRAID increased the number of stripe locks to 128

This is still just one patch.
V2 -> V3: Adjusted the language used in the subject and changelog.
Since patch 1/2 in V2 cannot be used independently and does not
encompass all of our work, it has been merged into a single patch.
V3 -> V4: Fixed incorrect sending address and changelog format.
V4 -> V5: Resolved a adress conflict on main (commit
f0e729af2eb6bee9eb58c4df1087f14ebaefe26b (HEAD -> md-6.10, tag:
md-6.10-20240502, origin/md-6.10)).

  drivers/md/md-bitmap.c | 197 ++++++++++++++++++++++++++++++-----------
  drivers/md/md-bitmap.h |  12 ++-
  drivers/md/raid5.h     |   7 +-

So, you should split changes to md-bitmap and send a patch seperately,
and probably tests raid1/raid10 as well.
  3 files changed, 155 insertions(+), 61 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
...

+	/* initialize bmc locks */
+	num_bmclocks = DIV_ROUND_UP(chunks, BITMAP_COUNTER_LOCK_RATIO);
+
+	new_bmclocks = kvcalloc(num_bmclocks, sizeof(*new_bmclocks), GFP_KERNEL);

Can you give a calculation result for additional memory overhead here,
especially when CONFIG_DEBUG_LOCK_ALLOC and CONFIG_DEBUG_SPINLOCK are
enabled/disabled, and mention that in commit message. The
BITMAP_COUNTER_LOCK_RATIO is set to 1, so I suspect this can be
acceptable. You probably must choose an acceptable value based on
chunks.

And please notice that if the above configs are disabled, spinlock
is 4 bytes, and multiple locks will be put in the same cacheline,
and this is meaningless because lock contention for these locks are
the same, due to false sharing.

Thanks,
Kuai