Hi,

I just took a quick look; there is a design problem noted below. There are
also some style problems, but we can work on those later.
On 2024/06/06 0:53, Shushu Yi wrote:
Optimize scattered address space handling, achieving significant
improvements in both throughput and latency.

Maximize thread-level parallelism and reduce the CPU suspension time caused
by lock contention. On a system with four PCIe 4.0 SSDs, this increases
overall storage throughput by 89.4% and decreases the 99.99th percentile
I/O latency by 85.4% (iodepth 32, libaio, 4 KB sequential writes, 16
threads). On a RAID5 array consisting of 2+1 1TB Samsung 980Pro SSDs,
throughput went from 5218 MB/s to 9884 MB/s.
Note: this work has been published as a paper; see
https://www.hotstorage.org/2022/camera-ready/hotstorage22-5/pdf/hotstorage22-5.pdf
Co-developed-by: Yiming Xu <teddyxym@xxxxxxxxxxx>
Signed-off-by: Yiming Xu <teddyxym@xxxxxxxxxxx>
Signed-off-by: Shushu Yi <firnyee@xxxxxxxxx>
Tested-by: Paul Luse <paul.e.luse@xxxxxxxxx>
---
V1 -> V2: Cleaned up coding style and divided the work into 2 patches
(HemiRAID and ScalaRAID, corresponding to the paper mentioned above).
ScalaRAID equips every counter with its own lock and employs our D-Block;
HemiRAID increases the number of stripe locks to 128.
This is still just one patch.
V2 -> V3: Adjusted the language used in the subject and changelog.
Since patch 1/2 in V2 cannot be used independently and does not
encompass all of our work, it has been merged into a single patch.
V3 -> V4: Fixed incorrect sending address and changelog format.
V4 -> V5: Resolved an address conflict with main (commit
f0e729af2eb6bee9eb58c4df1087f14ebaefe26b (HEAD -> md-6.10, tag:
md-6.10-20240502, origin/md-6.10)).
drivers/md/md-bitmap.c | 197 ++++++++++++++++++++++++++++++-----------
drivers/md/md-bitmap.h | 12 ++-
drivers/md/raid5.h | 7 +-
So, you should split the md-bitmap changes out and send them as a separate
patch, and probably test raid1/raid10 as well.
3 files changed, 155 insertions(+), 61 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
...
+ /* initialize bmc locks */
+ num_bmclocks = DIV_ROUND_UP(chunks, BITMAP_COUNTER_LOCK_RATIO);
+
+ new_bmclocks = kvcalloc(num_bmclocks, sizeof(*new_bmclocks), GFP_KERNEL);
Can you give a calculation of the additional memory overhead here,
especially with CONFIG_DEBUG_LOCK_ALLOC and CONFIG_DEBUG_SPINLOCK
enabled/disabled, and mention that in the commit message?
BITMAP_COUNTER_LOCK_RATIO is set to 1, so I'm not sure the overhead is
acceptable; you probably need to choose an acceptable value based on
chunks.
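To make the concern concrete, here is a rough back-of-the-envelope example
(the array size and bitmap chunk size are made up for illustration, not
taken from your setup):

  chunks       = 4 TiB array / 64 MiB bitmap chunk          = 65536
  num_bmclocks = chunks / BITMAP_COUNTER_LOCK_RATIO (== 1)  = 65536

  debug configs disabled: 65536 * 4 bytes (sizeof(spinlock_t)) = 256 KiB
  debug configs enabled:  sizeof(spinlock_t) grows to several tens of
                          bytes, so the same array needs a few MiB just
                          for the counter locks.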
And please notice that if the above configs are disabled, a spinlock is
only 4 bytes, so multiple locks will be packed into the same cacheline.
That makes the fine-grained split largely meaningless: because of false
sharing, contention on any of those locks bounces the whole cacheline, so
they effectively behave like one contended lock.
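A minimal sketch of what I mean, just to illustrate (the struct and field
names are invented here, not a concrete proposal for the patch):

#include <linux/cache.h>
#include <linux/spinlock.h>

/*
 * Hypothetical wrapper: pad each counter lock out to its own cacheline so
 * that two adjacent locks never share a line.  With a 4-byte spinlock_t
 * and 64-byte cachelines, 16 unpadded locks would otherwise share one
 * line, and contending on any of them bounces the whole line.
 */
struct bmc_lock {
	spinlock_t lock;
} ____cacheline_aligned_in_smp;

Of course this multiplies the per-lock memory cost by
cacheline_size / sizeof(spinlock_t), which is exactly why this and the
overhead calculation above have to be weighed together.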
Thanks,
Kuai